Results 1 - 20 of 74
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34913057

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) allows quantitative analysis of gene expression at the level of single cells, beneficial to study cell heterogeneity. The recognition of cell types facilitates the construction of cell atlas in complex tissues or organisms, which is the basis of almost all downstream scRNA-seq data analyses. Using disease-related scRNA-seq data to perform the prediction of disease status can facilitate the specific diagnosis and personalized treatment of disease. Since single-cell gene expression data are high-dimensional and sparse with dropouts, we propose scIAE, an integrative autoencoder-based ensemble classification framework, to firstly perform multiple random projections and apply integrative and devisable autoencoders (integrating stacked, denoising and sparse autoencoders) to obtain compressed representations. Then base classifiers are built on the lower-dimensional representations and the predictions from all base models are integrated. The comparison of scIAE and common feature extraction methods shows that scIAE is effective and robust, independent of the choice of dimension, which is beneficial to subsequent cell classification. By testing scIAE on different types of data and comparing it with existing general and single-cell-specific classification methods, it is proven that scIAE has a great classification power in cell type annotation intradataset, across batches, across platforms and across species, and also disease status prediction. The architecture of scIAE is flexible and devisable, and it is available at https://github.com/JGuan-lab/scIAE.


Subjects
Data Analysis , Single-Cell Analysis , Gene Expression Profiling , RNA-Seq , RNA Sequence Analysis , Single-Cell Analysis/methods , Exome Sequencing
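The ensemble idea in the scIAE abstract above (several random projections, one base classifier per projection, predictions integrated across models) can be illustrated with a minimal sketch. This is not the scIAE implementation: the autoencoder compression step is omitted, a nearest-centroid rule stands in for the base classifiers, and every function name and parameter below is hypothetical.

```python
import random
from collections import Counter

def random_projection(n_features, n_components, rng):
    # Gaussian random matrix (n_features x n_components), scaled by sqrt(n_components).
    scale = n_components ** 0.5
    return [[rng.gauss(0.0, 1.0) / scale for _ in range(n_components)]
            for _ in range(n_features)]

def project(X, R):
    # Multiply each sample (row of X) by the projection matrix R.
    return [[sum(x[i] * R[i][j] for i in range(len(x))) for j in range(len(R[0]))]
            for x in X]

def fit_centroids(Z, y):
    # Stand-in base classifier (assumption): one mean vector per class.
    sums, counts = {}, Counter(y)
    for z, label in zip(Z, y):
        acc = sums.setdefault(label, [0.0] * len(z))
        for j, v in enumerate(z):
            acc[j] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(centroids, z):
    # Assign the class whose centroid is closest in squared Euclidean distance.
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

def ensemble_predict(X_train, y_train, X_test, n_models=11, n_components=3, seed=0):
    # Train one base model per random projection, then take a majority vote.
    rng = random.Random(seed)
    votes = [[] for _ in X_test]
    for _ in range(n_models):
        R = random_projection(len(X_train[0]), n_components, rng)
        centroids = fit_centroids(project(X_train, R), y_train)
        for i, z in enumerate(project(X_test, R)):
            votes[i].append(predict_centroid(centroids, z))
    return [Counter(v).most_common(1)[0][0] for v in votes]
```

Majority voting is only one way to integrate base predictions; the abstract does not say which rule scIAE uses.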
2.
Plant Physiol ; 191(4): 2570-2587, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36682816

ABSTRACT

High-salt stress continues to challenge the growth and survival of many plants. Alternative polyadenylation (APA) produces mRNAs with different 3'-untranslated regions (3' UTRs) to regulate gene expression at the post-transcriptional level. However, the roles of alternative 3' UTRs in response to salt stress remain elusive. Here, we report the function of alternative 3' UTRs in response to high-salt stress in S. alterniflora (Spartina alterniflora), a monocotyledonous halophyte tolerant of high-salt environments. We found that high-salt stress induced global APA dynamics, and ∼42% of APA genes responded to salt stress. High-salt stress led to 3' UTR lengthening of 207 transcripts through increasing the usage of distal poly(A) sites. Transcripts with alternative 3' UTRs were mainly enriched in salt stress-related ion transporters. Alternative 3' UTRs of HIGH-AFFINITY K+ TRANSPORTER 1 (SaHKT1) increased RNA stability and protein synthesis in vivo. Regulatory AU-rich elements were identified in alternative 3' UTRs, boosting the protein level of SaHKT1. RNAi-knock-down experiments revealed that the biogenesis of 3' UTR lengthening in SaHKT1 was controlled by the poly(A) factor CLEAVAGE AND POLYADENYLATION SPECIFICITY FACTOR 30 (SaCPSF30). Over-expression of SaHKT1 with an alternative 3' UTR in rice (Oryza sativa) protoplasts increased mRNA accumulation of salt-tolerance genes in an AU-rich element-dependent manner. These results suggest that mRNA 3' UTR lengthening is a potential mechanism in response to high-salt stress. These results also reveal complex regulatory roles of alternative 3' UTRs coupling APA and regulatory elements at the post-transcriptional level in plants.


Subjects
Oryza , Salt Tolerance , 3' Untranslated Regions/genetics , Salt Tolerance/genetics , Poaceae/genetics , Oryza/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Polyadenylation/genetics
3.
Nucleic Acids Res ; 50(D1): D365-D370, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34508354

ABSTRACT

Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer of eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3'-seq and/or RNA-seq data, and no existing database provides APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species, including human, mouse and several plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usage and APA-mediated gene regulation at the single-cell level across diverse cell types, tissues and species.


Subjects
3' Untranslated Regions , Genetic Databases , Polyadenylation , Messenger RNA/genetics , RNA-Binding Proteins/genetics , User-Computer Interface , Animals , Atlases as Topic , Binding Sites , Cell Lineage/genetics , Chlamydomonas reinhardtii/genetics , Chlamydomonas reinhardtii/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Humans , Internet , Mice , MicroRNAs/classification , MicroRNAs/genetics , MicroRNAs/metabolism , Organ Specificity , Plants/genetics , Plants/metabolism , Protein Binding , Messenger RNA/classification , Messenger RNA/metabolism , RNA-Binding Proteins/classification , RNA-Binding Proteins/metabolism , RNA Sequence Analysis/methods , Single-Cell Analysis/methods
4.
BMC Bioinformatics ; 24(1): 142, 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-37041460

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder that is highly phenotypically and genetically heterogeneous. With the accumulation of biological sequencing data, more and more studies are shifting to a molecular subtype-first approach, which first identifies molecular subtypes based on genetic and molecular data and then links them with clinical manifestation, reducing heterogeneity before phenotypic profiling. RESULTS: In this study, we perform similarity network fusion to integrate gene and gene set expression data of multiple human brain cell types for ASD molecular subtype identification. Then we apply subtype-specific differential gene and gene set expression analyses to study expression patterns specific to molecular subtypes in each cell type. To demonstrate the biological and practical significance, we analyze the molecular subtypes, investigate their correlation with ASD clinical phenotype, and construct ASD molecular subtype prediction models. CONCLUSIONS: The identified molecular subtype-specific gene and gene set expression may be used to differentiate ASD molecular subtypes, facilitating the diagnosis and treatment of ASD. Our method provides an analytical pipeline for the identification of molecular subtypes and even disease subtypes of complex disorders.


Subjects
Autism Spectrum Disorder , Autistic Disorder , Humans , Autistic Disorder/genetics , Autism Spectrum Disorder/genetics , Brain/metabolism
5.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33142319

ABSTRACT

Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on prior genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we propose a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using prior genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.


Subjects
Nucleic Acid Databases , Genome , RNA 3' Polyadenylation Signals , RNA-Seq , Single-Cell Analysis , Software , Molecular Sequence Annotation
6.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34255024

ABSTRACT

The dynamic choice of different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation exhibit microheterogeneity. To date, the conventional determination of polyadenylation site clusters is subjective and arbitrary, leading to inaccurate annotations. Here, we present a weighted density peak clustering method, QuantifyPoly(A), to accurately quantify genome-wide polyadenylation choices. When QuantifyPoly(A) is applied to published 3' end sequencing datasets from both animals and plants, their polyadenylation profiles are reshaped into myriads of novel polyadenylation site clusters. Most of these novel polyadenylation site clusters show significantly dynamic usage across different biological samples or associate with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, and plant clusters generally show higher microheterogeneity than animal clusters. QuantifyPoly(A) is broadly applicable to any type of 3' end sequencing data and species for accurate quantification and construction of the complex and dynamic polyadenylation landscape and enables us to decode alternative polyadenylation events invisible to conventional methods at a much higher resolution.


Subjects
Poly A/metabolism , Animals , Arabidopsis/metabolism , Oryza/metabolism , Polyadenylation
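The QuantifyPoly(A) abstract above names a weighted density peak clustering method without giving its details. As a rough illustration of the general technique only, the sketch below applies generic density peak clustering to one-dimensional cleavage positions and uses read counts as weights. The Gaussian kernel, the `dc` bandwidth and the rule that a site becomes a cluster center when no denser site lies within `dc` are all assumptions, not the published algorithm.

```python
import math

def density_peak_clusters(positions, counts, dc=5.0):
    n = len(positions)
    # Weighted local density: every site contributes its read count via a Gaussian kernel.
    rho = [sum(counts[j] * math.exp(-((positions[i] - positions[j]) / dc) ** 2)
               for j in range(n))
           for i in range(n)]
    # Visit sites from densest to sparsest.
    order = sorted(range(n), key=lambda i: -rho[i])
    labels = [-1] * n
    centers = []
    for rank, i in enumerate(order):
        higher = order[:rank]  # sites with higher density than site i
        if higher:
            nearest = min(higher, key=lambda j: abs(positions[i] - positions[j]))
            delta = abs(positions[i] - positions[nearest])
        else:
            nearest, delta = None, math.inf
        if delta > dc:
            # Far from any denser site: start a new cluster centered here.
            labels[i] = len(centers)
            centers.append(positions[i])
        else:
            # Otherwise inherit the cluster of the nearest denser site.
            labels[i] = labels[nearest]
    return labels, centers
```

For example, sites at positions 100 to 102 and 200 to 201 with a dominant site in each group yield two clusters, each centered on its most heavily supported position.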
7.
BMC Genomics ; 23(1): 782, 2022 Nov 30.
Article in English | MEDLINE | ID: mdl-36451086

ABSTRACT

BACKGROUND: The identification of gene regulatory networks (GRNs) facilitates the understanding of the underlying molecular mechanism of various biological processes and complex diseases. With the availability of single-cell RNA sequencing data, it is essential to infer GRNs from single-cell expression. Although some GRN methods originally developed for bulk expression data are applicable to single-cell data and several single-cell specific GRN algorithms have been developed, recent benchmarking studies have emphasized the need to develop more accurate and robust GRN modeling methods that are compatible with single-cell expression data. RESULTS: We present SRGS, SPLS (sparse partial least squares)-based recursive gene selection, to infer GRNs from bulk or single-cell expression data. SRGS recursively selects and scores the genes that may regulate the target gene under consideration based on SPLS. When dealing with gene expression data with dropouts, we randomly scramble samples, set some values in the expression matrix to zeroes, and generate multiple copies of data through multiple iterations to make SRGS more robust. We test SRGS on different kinds of expression data, including simulated bulk data, simulated single-cell data without and with dropouts, and experimental single-cell data, and compare it with existing GRN methods, including the ones originally developed for bulk data, the ones developed specifically for single-cell data, and even the ones recommended by recent benchmarking studies. CONCLUSIONS: It has been shown that SRGS is competitive with the existing GRN methods and effective in gene regulatory network inference from bulk or single-cell gene expression data. SRGS is available at: https://github.com/JGuan-lab/SRGS.


Subjects
Algorithms , Gene Regulatory Networks , Least-Squares Analysis , Benchmarking , Exome Sequencing
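The robustness step described in the SRGS abstract above (scramble the samples, set some expression values to zero, and repeat to obtain several copies of the data) might look like the following sketch. The function name, the `zero_rate` parameter and the uniform choice of which entries to zero are assumptions; the SPLS-based gene scoring itself is not shown.

```python
import random

def perturbed_copies(expr, n_copies=3, zero_rate=0.1, seed=0):
    """Return perturbed copies of expr, a list of samples (rows) of gene values."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        rows = [row[:] for row in expr]  # copy so the input matrix stays untouched
        rng.shuffle(rows)                # scramble the sample order
        for row in rows:
            for g in range(len(row)):
                # Simulate a dropout with probability zero_rate (hypothetical rate).
                if rng.random() < zero_rate:
                    row[g] = 0.0
        copies.append(rows)
    return copies
```

Each copy could then be passed to the network inference step and the resulting gene scores combined across copies.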
8.
Brief Bioinform ; 21(4): 1261-1276, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31267126

ABSTRACT

Alternative polyadenylation (APA) has been implicated in post-transcriptional regulation through its effects on mRNA abundance, stability, localization and translation, which contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provided a comprehensive overview of the status of currently available computational approaches. We also conducted objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, reflecting that there is still a lot of scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. In particular, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we attempted to give some advice on how to assess the reliability of the obtained results. We also proposed practical recommendations on the appropriate method applicable to diverse scenarios and discussed implications and future directions relevant to profiling APA from RNA-seq data.


Subjects
RNA Sequence Analysis/methods , Animals , Humans , Polyadenylation
9.
Bioinformatics ; 37(16): 2470-2472, 2021 08 25.
Article in English | MEDLINE | ID: mdl-33258917

ABSTRACT

MOTIVATION: Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3' end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. RESULTS: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3' UTR shortening/lengthening events between conditions. APA site switching involving non-3' UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. AVAILABILITY AND IMPLEMENTATION: https://github.com/BMILAB/movAPA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Oryza , Polyadenylation , 3' Untranslated Regions , Animals , Mice , Oryza/genetics , Poly A/metabolism , RNA-Seq , Software
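The movAPA abstract above mentions methods for identifying 3' UTR shortening and lengthening between conditions but does not describe them. The sketch below shows one generic way to express the idea, comparing the fraction of reads that support the distal poly(A) site of a gene in two conditions. It is not one of movAPA's three methods; the function names and the `min_shift` threshold are hypothetical, and movAPA itself is an R package.

```python
def distal_usage(proximal_reads, distal_reads):
    # Fraction of reads supporting the distal poly(A) site of a gene.
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("gene has no reads at either poly(A) site")
    return distal_reads / total

def utr_change(cond_a, cond_b, min_shift=0.2):
    """cond_a, cond_b: (proximal_reads, distal_reads) for one gene in two conditions."""
    shift = distal_usage(*cond_b) - distal_usage(*cond_a)
    if shift >= min_shift:
        return "lengthening", shift  # distal site used more in condition B
    if shift <= -min_shift:
        return "shortening", shift   # proximal site used more in condition B
    return "unchanged", shift
```

A real analysis would add a statistical test on the read counts rather than rely on a fixed threshold alone.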
10.
Int J Mol Sci ; 23(15)2022 Jul 23.
Article in English | MEDLINE | ID: mdl-35897701

ABSTRACT

Alternative polyadenylation (APA) is a key layer of gene expression regulation, and APA choice is finely modulated in cells. Advances in single-cell RNA-seq (scRNA-seq) have provided unprecedented opportunities to study APA in cell populations. However, existing studies that investigated APA in single cells were either confined to a few cells or focused on profiling APA dynamics between cell types or identifying APA sites. The diversity and pattern of APA usage on a genomic scale in single cells remain unappreciated. Here, we proposed an analysis framework based on a Gaussian mixture model, scAPAmod, to identify patterns of APA usage from homogeneous or heterogeneous cell populations at the single-cell level. We systematically evaluated the performance of scAPAmod using simulated data and scRNA-seq data. The results show that scAPAmod can accurately identify different patterns of APA usage at the single-cell level. We analyzed the dynamic changes in the pattern of APA usage using scAPAmod in different cell differentiation and developmental stages during mouse spermatogenesis and found that even the same gene has different patterns of APA usage in different differentiation stages. The preference of usage patterns of APA sites in different genomic regions was also analyzed. We found that patterns of APA usage of the same gene in 3' untranslated regions (3' UTRs) and non-3' UTRs are different. Moreover, we analyzed cell-type-specific APA usage patterns and changes in patterns of APA usage across cell types. Different from the conventional analysis of single-cell heterogeneity based on gene expression profiling, this study profiled the heterogeneous pattern of APA isoforms, which contributes to revealing the heterogeneity of single-cell gene expression with higher resolution.


Subjects
Gene Expression Profiling , Polyadenylation , 3' Untranslated Regions , Animals , Mice , Polyadenylation/genetics , RNA-Seq , RNA Sequence Analysis/methods
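The scAPAmod abstract above builds on a Gaussian mixture model of APA usage across cells. As a minimal sketch of that general technique, the code below fits a two-component one-dimensional mixture by expectation-maximization to per-cell usage ratios of a single gene. The fixed number of components, the min/max initialization and the variance floor are assumptions; how scAPAmod selects components or defines usage is not stated in the abstract.

```python
import math

def fit_gmm_1d(values, n_iter=50, var_floor=1e-6):
    k = 2  # assumed number of usage modes
    mean_all = sum(values) / len(values)
    var_all = max(sum((v - mean_all) ** 2 for v in values) / len(values), var_floor)
    means = [min(values), max(values)]  # simple initialization at the extremes
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [weights[c]
                    * math.exp(-(v - means[c]) ** 2 / (2 * variances[c]))
                    / math.sqrt(2 * math.pi * variances[c])
                    for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate mean, variance and weight of each component.
        for c in range(k):
            nk = sum(r[c] for r in resp)
            if nk == 0:
                continue  # leave an empty component unchanged
            means[c] = sum(r[c] * v for r, v in zip(resp, values)) / nk
            variances[c] = max(
                sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, values)) / nk,
                var_floor)
            weights[c] = nk / len(values)
    return means, variances, weights
```

A gene whose cells split between low and high usage of one site would yield two well-separated component means, whereas a unimodal gene would yield two nearly identical ones.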
11.
Bioinformatics ; 36(3): 789-797, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31392316

ABSTRACT

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes. RESULTS: We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. AVAILABILITY AND IMPLEMENTATION: Freely available for download at https://github.com/BMILAB/scHinter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , RNA-Seq , Reproducibility of Results , Sample Size , RNA Sequence Analysis , Single-Cell Analysis , Software
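The scHinter abstract above refers to the synthetic minority oversampling technique (SMOTE) for random interpolation. The core of SMOTE is to place a synthetic point at a random position on the segment between a sample and one of its nearest neighbors, which can be sketched as follows. Plain Euclidean distance is used here as a stand-in for scHinter's voting-based ensemble distance, and the function names are hypothetical.

```python
import random

def nearest_neighbor(target, others):
    # Closest expression profile by squared Euclidean distance (assumed metric).
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(target, o)))

def smote_sample(target, others, rng):
    # Interpolate between the target cell and its nearest neighbor at a random fraction.
    neighbor = nearest_neighbor(target, others)
    lam = rng.random()
    return [a + lam * (b - a) for a, b in zip(target, neighbor)]
```

How the interpolated profiles are used to fill dropout entries, and how the hierarchical framework is applied, is not described in the abstract and is not shown here.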
12.
Bioinformatics ; 36(4): 1262-1264, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31557285

ABSTRACT

MOTIVATION: Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in mRNA stability and functions in eukaryotes. Single-cell RNA-seq (scRNA-seq) is a powerful tool to discover cellular heterogeneity at the gene expression level. Given the 3' enrichment strategy used in library construction, the most commonly used scRNA-seq protocol, 10x Genomics, enables us to improve the resolution of APA studies to the single-cell level. However, currently there is no computational tool available for investigating APA profiles from scRNA-seq data. RESULTS: Here, we present a package scDAPA for detecting and visualizing dynamic APA from scRNA-seq data. Taking bam/sam files and cell cluster labels as inputs, scDAPA detects APA dynamics using a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking results demonstrated that scDAPA can effectively identify genes with dynamic APA among different cell groups from scRNA-seq data. AVAILABILITY AND IMPLEMENTATION: The scDAPA package is implemented in Shell and R, and is freely available at https://scdapa.sourceforge.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Polyadenylation , RNA-Seq , Gene Expression Profiling , RNA Sequence Analysis , Single-Cell Analysis , Software
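The scDAPA abstract above names the Wilcoxon rank-sum test as its statistical component. The sketch below implements that test with the normal approximation and average ranks for ties, applied to any per-group measurement such as read 3' end positions of one gene in two cell clusters. It omits the tie correction of the variance and the histogram-based step, and the choice of input measurement is an assumption; scDAPA itself is implemented in Shell and R.

```python
import math

def rank_sum_test(x, y):
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in ((0, x), (1, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for a run of tied values
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2                      # Mann-Whitney U for group x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (simplification)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p-value
    return u1, z, p
```

For small groups an exact test is preferable to the normal approximation used here.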
13.
J Transl Med ; 19(1): 20, 2021 01 06.
Article in English | MEDLINE | ID: mdl-33407556

ABSTRACT

BACKGROUND: Genome-wide association studies have identified genetic variants associated with the risk of brain-related diseases, such as neurological and psychiatric disorders, but the causal variants and the specific vulnerable cell types often remain to be identified. Many disease-associated genes are expressed in multiple cell types of human brains, while the pathologic variants affect primarily specific cell types. We hypothesize a model in which what determines the manifestation of a disease in a cell type is the presence of a disease module composed of disease-associated genes, instead of individual genes. Therefore, it is essential to identify the presence/absence of disease gene modules in cells. METHODS: To characterize the cell type-specificity of brain-related diseases, we construct human brain cell type-specific gene interaction networks integrating human brain nucleus gene expression data with a referenced tissue-specific gene interaction network. Then from the cell type-specific gene interaction networks, we identify significant cell type-specific disease gene modules by performing statistical tests. RESULTS: Between neurons and glia cells, the constructed cell type-specific gene networks and their gene functions are distinct. Then we identify cell type-specific disease gene modules associated with autism spectrum disorder and find that different gene modules are formed and distinct gene functions may be dysregulated in different cells. We also study the similarity and dissimilarity in cell type-specific disease gene modules among autism spectrum disorder, schizophrenia and bipolar disorder. The functions of neuron-specific disease gene modules are associated with synapse for all three diseases, while those in glia cells are different. To facilitate the use of our method, we develop an R package, CtsDGM, for the identification of cell type-specific disease gene modules. CONCLUSIONS: The results support our hypothesis that a disease manifests itself in a cell type through forming a statistically significant disease gene module. The identification of cell type-specific disease gene modules can promote the development of more targeted biomarkers and treatments for the disease. Our method can be applied for depicting the cell type heterogeneity of a given disease, and also for studying the similarity and dissimilarity between different disorders, providing new insights into the molecular mechanisms underlying the pathogenesis and progression of diseases.


Subjects
Autism Spectrum Disorder , Gene Regulatory Networks , Autism Spectrum Disorder/genetics , Gene Expression Profiling , Genome-Wide Association Study , Humans , Phenotype
14.
Plant Physiol ; 182(1): 228-242, 2020 01.
Article in English | MEDLINE | ID: mdl-31767692

ABSTRACT

Alternative cleavage and polyadenylation (APA) is increasingly recognized as an important regulatory mechanism in eukaryotic gene expression and is dynamically modulated in a developmental, tissue-specific, or environmentally responsive manner. Given the functional importance of APA and the rapid accumulation of APA sites in plants, a comprehensive and easily accessible APA site database is necessary for improved understanding of APA-mediated gene expression regulation. We present a database called PlantAPAdb that catalogs the most comprehensive APA site data derived from sequences from diverse 3' sequencing protocols and biological samples in plants. Currently, PlantAPAdb contains APA sites in six species, Oryza sativa (japonica and indica), Arabidopsis (Arabidopsis thaliana), Medicago truncatula, Trifolium pratense, Phyllostachys edulis, and Chlamydomonas reinhardtii. APA sites in PlantAPAdb are available for bulk download and can be queried in a Google-like manner. PlantAPAdb provides rich information on the whole-genome APA sites, including genomic locations, heterogeneous cleavage sites, expression levels, and sample information. It also provides comprehensive poly(A) signals for APA sites in different genomic regions according to distinct profiles of cis-elements in plants. In addition, PlantAPAdb contains events of 3' untranslated region shortening/lengthening resulting from APA, which helps to understand the mechanisms underlying systematic changes in 3' untranslated region lengths. Additional information about conservation of APA sites in plants is also available, providing insights into the evolutionary polyadenylation configuration across species. As a user-friendly database, PlantAPAdb is a large and extendable resource for elucidating APA mechanisms, APA conservation, and gene expression regulation.


Subjects
Poly A/metabolism , Polyadenylation/physiology , Arabidopsis/genetics , Arabidopsis/metabolism , Chlamydomonas reinhardtii/genetics , Chlamydomonas reinhardtii/metabolism , Plant Genome/genetics , Medicago truncatula/genetics , Medicago truncatula/metabolism , Oryza/genetics , Oryza/metabolism , Poly A/genetics , Polyadenylation/genetics , Trifolium/genetics , Trifolium/metabolism
15.
J Biomed Inform ; 122: 103899, 2021 10.
Article in English | MEDLINE | ID: mdl-34481921

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) is fast becoming a powerful technology that revolutionizes biomedical studies related to development, immunology and cancer by providing genome-scale transcriptional profiles at unprecedented throughput and resolution. However, due to the low capture rate and frequent drop-out events in the sequencing process, scRNA-seq data suffer from extremely high sparsity and variability, challenging the data analysis. Here we proposed a novel method called scLINE for learning low dimensional representations of scRNA-seq data. scLINE is based on the network embedding model that jointly considers multiple gene-gene interaction networks, facilitating the incorporation of prior biological knowledge for signal extraction. We comprehensively evaluated scLINE on eight single-cell datasets. Results show that scLINE achieved comparable or higher performance than competing methods, including PCA, t-SNE and Isomap, in terms of internal validation metrics and clustering accuracy. The low dimensional representations learned by scLINE are effective for downstream single-cell analysis, such as visualization, clustering and cell typing. We have implemented scLINE as an easy-to-use R package, which can be incorporated in other existing scRNA-seq analysis pipelines or tools for data preprocessing.


Subjects
Gene Regulatory Networks , Single-Cell Analysis , Cluster Analysis , Gene Expression Profiling , RNA-Seq , RNA Sequence Analysis
16.
Plant J ; 99(1): 67-80, 2019 07.
Article in English | MEDLINE | ID: mdl-30844106

ABSTRACT

The post-transcriptional regulation involved in the responses of diatoms to silicon is poorly understood. Using a poly(A)-tag sequencing (PAT-seq) technique that interrogates only the junctions of 3'-untranslated region (UTR) and the poly(A) tails at the transcriptome level, a comprehensive comparison of alternative polyadenylation (APA) was performed to understand the role of post-transcriptional regulation in various silicon-related cellular responses for the marine diatom Thalassiosira pseudonana. In total, 23 701 poly(A) clusters and 6894 APA genes, treated with silicon starvation and replenishment, were identified at nine time points. Significant APA was found in numerous genes (e.g. five cingulin genes) closely associated with the silicon-starvation response, girdle bands and valve synthesis, suggesting that many genes participated in the responses to silicon availability and biosilica formation through changes in transcript isoforms. The poly(A) site usage profiles were distinct during various stages of silicon biomineralization responses. Moreover, a correlation between APA and expression levels of APA switching genes was also discovered. This is an interesting study that presents a genome-wide profile of transcript ends in diatoms, which is distinct from that of higher plants, animals and other microalgae. This work provides an important resource to understand a different aspect of cell-wall synthesis.


Subjects
Diatoms/metabolism , Silicon/metabolism , Diatoms/genetics , Plant Genome/genetics , Polyadenylation
17.
Plant J ; 98(2): 260-276, 2019 04.
Article in English | MEDLINE | ID: mdl-30570805

ABSTRACT

Alternative polyadenylation (APA) is a widespread post-transcriptional mechanism that regulates gene expression through mRNA metabolism, playing a pivotal role in modulating phenotypic traits in rice (Oryza sativa L.). However, little is known about the APA-mediated regulation underlying the distinct characteristics between two major rice subspecies, indica and japonica. Using a poly(A)-tag sequencing approach, polyadenylation (poly(A)) site profiles were investigated and compared pairwise from germination to the mature stage between indica and japonica, and extensive differentiation in APA profiles was detected genome-wide. Genes with subspecies-specific poly(A) sites were found to contribute to subspecies characteristics, particularly in disease resistance of indica and cold-stress tolerance of japonica. In most tissues, differential usage of APA sites exhibited an apparent impact on the gene expression profiles between subspecies, and genes with those APA sites were significantly enriched in quantitative trait loci (QTL) related to yield traits, such as spikelet number and 1000-seed weight. In leaves of the booting stage, APA site-switching genes displayed global shortening of 3' untranslated regions with increased expression in indica compared with japonica, and they were overrepresented in the porphyrin and chlorophyll metabolism pathways. This phenomenon may lead to a higher chlorophyll content and photosynthesis in indica than in japonica, being associated with their differential growth rates and yield potentials. We further constructed an online resource for querying and visualizing the poly(A) atlas in these two rice subspecies. Our results suggest that APA may be largely involved in developmental differentiations between two rice subspecies, especially in leaf characteristics and the stress response, broadening our knowledge of the post-transcriptional genetic basis underlying the divergence of rice traits.


Subjects
Plant Genes/genetics , Oryza/genetics , Oryza/metabolism , Polyadenylation , Acclimatization , Chlorophyll/metabolism , Plant Gene Expression Regulation , Germination , Phenotype , Photosynthesis , Plant Leaves/genetics , Plant Leaves/metabolism , Quantitative Trait Loci , Seeds , Physiological Stress , Transcriptome
18.
Plant Cell Physiol ; 61(5): 882-896, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32044993

ABSTRACT

Spartina alterniflora (Spartina) is the only halophyte in the salt marsh. However, the molecular basis of its high salt tolerance remains elusive. In this study, we used Pacific Biosciences (PacBio) full-length single-molecule long-read sequencing and RNA-seq to elucidate the transcriptome dynamics of high salt tolerance in Spartina by salt gradient experiments. High-quality unigenes, transcription factors, non-coding RNA and Spartina-specific transcripts were identified. Co-expression network analysis found that protein kinase-encoding genes (SaOST1, SaCIPK10 and SaLRRs) are hub genes in the salt tolerance regulatory network. High salt stress induced the expression of transcription factors but repressed the expression of long non-coding RNAs. The Spartina transcriptome is closer to that of rice than to that of Arabidopsis, and a higher proportion of transporter and transcription factor-encoding transcripts was found in Spartina. Transcriptome analysis showed that high salt stress induced the expression of carbohydrate metabolism genes, especially cell-wall biosynthesis-related genes, in Spartina and repressed their expression in rice. Compared with rice, high salt stress strongly induced the expression of stress response, protein modification and redox-related genes and greatly inhibited translation in Spartina. High salt stress also induced alternative splicing in Spartina, while differentially expressed alternative splicing events associated with photosynthesis were overrepresented in Spartina but not in rice. Finally, we built the SAPacBio website for visualizing full-length transcriptome sequences, transcription factors, ncRNAs, salt-tolerant genes and alternative splicing events in Spartina. Overall, this study suggests that the salt tolerance mechanism in Spartina differs from that of rice in many aspects and is far more complex than expected.


Subjects
Poaceae/genetics , Poaceae/physiology , Salt Tolerance/genetics , Salt-Tolerant Plants/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Arabidopsis/genetics , Plant Gene Expression Regulation , Gene Ontology , Gene Regulatory Networks , Plant Genes , Oryza/genetics , Messenger RNA/genetics , Messenger RNA/metabolism , Untranslated RNA/genetics , Untranslated RNA/metabolism , Physiological Stress/genetics , Transcription Factors/metabolism
19.
Bioinformatics ; 35(15): 2654-2656, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30535139

ABSTRACT

SUMMARY: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. AVAILABILITY AND IMPLEMENTATION: AStrap is available for download at https://github.com/BMILAB/AStrap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing , Genome , Humans , Machine Learning , RNA Sequence Analysis , Transcriptome
20.
Cell Mol Life Sci ; 76(11): 2185-2198, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30729254

ABSTRACT

RNA alternative polyadenylation contributes to the complexity of information transfer from genome to phenome, thus amplifying gene function. Here, we report the first X. tropicalis resource with 127,914 alternative polyadenylation (APA) sites derived from embryos and adults. Overall, APA networks play central roles in coordinating the maternal-zygotic transition (MZT) in embryos, sexual dimorphism in adults and longitudinal growth from embryos to adults. APA sites coordinate reprogramming in embryos before the MZT, but developmental events after the MZT due to zygotic genome activation. The APA transcriptomes of young adults are more variable than those of growing adults, and male frog APA transcriptomes are more divergent than those of females. The APA profiles of young females were similar to those of embryos before the MZT. Enriched pathways in developing embryos were distinct across the MZT and noticeably segregated from those of adults. Briefly, our results suggest that the minimal functional units in genomes are alternative transcripts as opposed to genes.


Subjects
Amphibian Proteins/genetics , Genome , Messenger RNA/genetics , Sex Characteristics , Transcriptome , Xenopus/genetics , Amphibian Proteins/metabolism , Animals , Nonmammalian Embryo , Embryonic Development , Female , Gene Expression Profiling , Developmental Gene Expression Regulation , Gene Ontology , Male , Molecular Sequence Annotation , Polyadenylation , Messenger RNA/metabolism , Sex Factors , Exome Sequencing , Xenopus/growth & development , Xenopus/metabolism , Zygote/growth & development , Zygote/metabolism