1.
Cell ; 187(9): 2336-2341.e5, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38582080

ABSTRACT

The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, even though TRs constitute ∼6% of the human genome and are linked to over 50 human diseases. Here, we introduce TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD can differentiate common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.


Subjects
Human Genome, Tandem Repeat Sequences, Humans, Tandem Repeat Sequences/genetics, Whole Genome Sequencing, Genetic Databases, DNA Repeat Expansion/genetics, Genome-Wide Association Study
2.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38485700

ABSTRACT

MOTIVATION: Alternative polyadenylation (APA) is a widespread post-transcriptional regulatory mechanism across all eukaryotes. With the accumulation of genome-wide APA sites, especially those with single-cell resolution, it is imperative to develop easy-to-use visualization tools to guide APA analysis. RESULTS: We developed an R package called vizAPA for visualizing APA dynamics from bulk and single-cell data. vizAPA implements unified data structures for APA data and genome annotations. vizAPA also enables identification of genes with differential APA usage across biological samples and/or cell types. vizAPA provides four unique modules for extensively visualizing APA dynamics across biological samples and at the single-cell level. vizAPA could serve as a plugin in many routine APA analysis pipelines to augment studies for APA dynamics. AVAILABILITY AND IMPLEMENTATION: https://github.com/BMILAB/vizAPA.


Subjects
Gene Expression Regulation, Polyadenylation, Eukaryota, 3' Untranslated Regions
3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37402621

ABSTRACT

SUMMARY: Cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) has emerged as a promising liquid biopsy technology to detect cancers and monitor treatments. While several bioinformatics tools for DNA methylation analysis have been adapted for cfMeDIP-seq data, an end-to-end pipeline and quality control framework specifically for this data type is still lacking. Here, we present MEDIPIPE, which provides a one-stop solution for cfMeDIP-seq data quality control, methylation quantification, and sample aggregation. The major advantages of MEDIPIPE are: (i) ease of implementation and reproducibility, with Snakemake containerized execution environments that are automatically deployed via Conda; (ii) flexibility to handle different experimental settings with a single configuration file; and (iii) computational efficiency for large-scale cfMeDIP-seq profiling data analysis and aggregation. AVAILABILITY AND IMPLEMENTATION: This pipeline is open-source software under the MIT license and is freely available at https://github.com/pughlab/MEDIPIPE.


Subjects
Cell-Free Nucleic Acids, Software, Reproducibility of Results, High-Throughput Nucleotide Sequencing, Immunoprecipitation, Quality Control
4.
Plant Physiol ; 191(4): 2570-2587, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36682816

ABSTRACT

High-salt stress continues to challenge the growth and survival of many plants. Alternative polyadenylation (APA) produces mRNAs with different 3'-untranslated regions (3' UTRs) to regulate gene expression at the post-transcriptional level. However, the roles of alternative 3' UTRs in response to salt stress remain elusive. Here, we report the function of alternative 3' UTRs in response to high-salt stress in S. alterniflora (Spartina alterniflora), a monocotyledonous halophyte tolerant of high-salt environments. We found that high-salt stress induced global APA dynamics, and ∼42% of APA genes responded to salt stress. High-salt stress led to 3' UTR lengthening of 207 transcripts through increasing the usage of distal poly(A) sites. Transcripts with alternative 3' UTRs were mainly enriched in salt stress-related ion transporters. Alternative 3' UTRs of HIGH-AFFINITY K+ TRANSPORTER 1 (SaHKT1) increased RNA stability and protein synthesis in vivo. Regulatory AU-rich elements were identified in alternative 3' UTRs, boosting the protein level of SaHKT1. RNAi-knock-down experiments revealed that the biogenesis of 3' UTR lengthening in SaHKT1 was controlled by the poly(A) factor CLEAVAGE AND POLYADENYLATION SPECIFICITY FACTOR 30 (SaCPSF30). Over-expression of SaHKT1 with an alternative 3' UTR in rice (Oryza sativa) protoplasts increased mRNA accumulation of salt-tolerance genes in an AU-rich element-dependent manner. These results suggest that mRNA 3' UTR lengthening is a potential mechanism in response to high-salt stress. These results also reveal complex regulatory roles of alternative 3' UTRs coupling APA and regulatory elements at the post-transcriptional level in plants.


Subjects
Oryza, Salt Tolerance, 3' Untranslated Regions/genetics, Salt Tolerance/genetics, Poaceae/genetics, Oryza/metabolism, Messenger RNA/genetics, Messenger RNA/metabolism, Polyadenylation/genetics
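
The abstract of this record attributes 3' UTR lengthening to increased usage of distal poly(A) sites. A minimal sketch of how such a shift could be quantified for a gene with one proximal and one distal site is given below. The function names, the two-site simplification, and the read counts are assumptions made for illustration; they are not taken from the paper.

```python
def distal_usage(proximal_reads, distal_reads):
    """Fraction of a gene's poly(A) reads that map to the distal site."""
    total = proximal_reads + distal_reads
    if total == 0:
        raise ValueError("gene has no poly(A) reads")
    return distal_reads / total


def distal_usage_shift(control, treated):
    """Change in distal-site usage between two conditions.

    Each argument is a (proximal_reads, distal_reads) tuple.
    A positive value points towards 3' UTR lengthening, a negative
    value towards shortening.
    """
    return distal_usage(*treated) - distal_usage(*control)


# Hypothetical read counts for one gene: (proximal, distal).
shift = distal_usage_shift(control=(80, 20), treated=(50, 50))
```

In practice a statistical test across replicates would be needed before calling a lengthening event; the difference above only shows the direction of the change.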
5.
Nucleic Acids Res ; 50(D1): D365-D370, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34508354

ABSTRACT

Alternative polyadenylation (APA) is a widespread regulatory mechanism of transcript diversification in eukaryotes, which is increasingly recognized as an important layer for eukaryotic gene expression. Recent studies based on single-cell RNA-seq (scRNA-seq) have revealed cell-to-cell heterogeneity in APA usage and APA dynamics across different cell types in various tissues, biological processes and diseases. However, currently available APA databases were all collected from bulk 3'-seq and/or RNA-seq data, and no existing database has provided APA information at single-cell resolution. Here, we present a user-friendly database called scAPAdb (http://www.bmibig.cn/scAPAdb), which provides a comprehensive and manually curated atlas of poly(A) sites, APA events and poly(A) signals at the single-cell level. Currently, scAPAdb collects APA information from > 360 scRNA-seq experiments, covering six species including human, mouse and several other plant species. scAPAdb also provides batch download of data, and users can query the database through a variety of keywords such as gene identifier, gene function and accession number. scAPAdb would be a valuable and extendable resource for the study of cell-to-cell heterogeneity in APA isoform usages and APA-mediated gene regulation at the single-cell level under diverse cell types, tissues and species.


Subjects
3' Untranslated Regions, Genetic Databases, Polyadenylation, Messenger RNA/genetics, RNA-Binding Proteins/genetics, User-Computer Interface, Animals, Atlases as Topic, Binding Sites, Cell Lineage/genetics, Chlamydomonas reinhardtii/genetics, Chlamydomonas reinhardtii/metabolism, Eukaryotic Cells/cytology, Eukaryotic Cells/metabolism, Humans, Internet, Mice, MicroRNAs/classification, MicroRNAs/genetics, MicroRNAs/metabolism, Organ Specificity, Plants/genetics, Plants/metabolism, Protein Binding, Messenger RNA/classification, Messenger RNA/metabolism, RNA-Binding Proteins/classification, RNA-Binding Proteins/metabolism, RNA Sequence Analysis/methods, Single-Cell Analysis/methods
6.
BMC Pediatr ; 24(1): 193, 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38500150

ABSTRACT

Childhood obesity not only has a negative impact on a child's health but is also a significant risk factor for adult obesity and related metabolic disorders, making it a major global public health concern. Recent studies have revealed the crucial role of gut microbiota in the occurrence and development of obesity, in addition to genetic and lifestyle factors. In this study, we recruited 19 normal-weight children and 47 children with varying degrees of obesity. A questionnaire survey was conducted to inquire about the family background, lifestyle habits and dietary composition of the 66 children. Findings indicate that fathers of obese children tend to be obese themselves, while children with highly educated mothers are more likely to maintain a normal weight. Furthermore, overweight children tend to spend more time on electronic devices and less time on physical activities compared to their normal-weight counterparts. Obese children exhibit significant differences in breakfast and dinner dietary composition when compared to children with normal weight. Additionally, the gut microbiota of these 66 children was analyzed using 16S rRNA sequencing. Analysis of gut microbiota composition showed similar compositions among children with varying degrees of obesity, but significant differences were observed in comparison to normal-weight children. Obese children exhibited a reduced proportion of Bacteroidota and an increased proportion of Firmicutes, resulting in an elevated Firmicutes/Bacteroidota ratio. Moreover, Actinobacteriota were found to be increased in the gut microbiota of children with varying degrees of obesity. PICRUSt analysis indicated significant metabolic differences in the microbiota functions between obese and normal-weight children, suggesting the composition of gut microbiota could be a crucial factor contributing to obesity. These findings provide valuable insights for the treatment of childhood obesity.


Subjects
Gastrointestinal Microbiome, Pediatric Obesity, Female, Adult, Child, Humans, 16S Ribosomal RNA/genetics, Diet, China
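
The Firmicutes/Bacteroidota ratio reported in this record is a simple quotient of phylum-level abundances. The sketch below shows the arithmetic for a single sample. The function name and the read counts are hypothetical and chosen only to illustrate the calculation; they are not data from the study.

```python
def firmicutes_bacteroidota_ratio(abundance):
    """Return the Firmicutes/Bacteroidota ratio for one sample.

    `abundance` maps a phylum name to a read count or relative abundance.
    """
    firmicutes = abundance.get("Firmicutes", 0.0)
    bacteroidota = abundance.get("Bacteroidota", 0.0)
    if bacteroidota <= 0:
        raise ValueError("Bacteroidota abundance must be positive")
    return firmicutes / bacteroidota


# Hypothetical phylum-level read counts, for illustration only.
normal_weight = {"Firmicutes": 450, "Bacteroidota": 450, "Actinobacteriota": 100}
obese = {"Firmicutes": 600, "Bacteroidota": 200, "Actinobacteriota": 200}

ratio_normal = firmicutes_bacteroidota_ratio(normal_weight)
ratio_obese = firmicutes_bacteroidota_ratio(obese)
```

Because the ratio is a quotient, it rises both when Firmicutes increase and when Bacteroidota decrease, which matches the pattern the abstract describes for obese children.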
7.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33142319

ABSTRACT

Alternative polyadenylation (APA) generates diverse mRNA isoforms, which contributes to transcriptome diversity and gene expression regulation by affecting mRNA stability, translation and localization in cells. The rapid development of 3' tag-based single-cell RNA-sequencing (scRNA-seq) technologies, such as CEL-seq and 10x Genomics, has led to the emergence of computational methods for identifying APA sites and profiling APA dynamics at single-cell resolution. However, existing methods fail to detect the precise location of poly(A) sites or sites with low read coverage. Moreover, they rely on prior genome annotation and can only detect poly(A) sites located within or near annotated genes. Here we propose a tool called scAPAtrap for detecting poly(A) sites at the whole genome level in individual cells from 3' tag-based scRNA-seq data. scAPAtrap incorporates peak identification and poly(A) read anchoring, enabling the identification of the precise location of poly(A) sites, even for sites with low read coverage. Moreover, scAPAtrap can identify poly(A) sites without using prior genome annotation, which helps locate novel poly(A) sites in previously overlooked regions and improve genome annotation. We compared scAPAtrap with two recent methods, scAPA and Sierra, using scRNA-seq data from different experimental technologies and species. Results show that scAPAtrap identified poly(A) sites with higher accuracy and sensitivity than competing methods and could be used to explore APA dynamics among cell types or the heterogeneous APA isoform expression in individual cells. scAPAtrap is available at https://github.com/BMILAB/scAPAtrap.


Subjects
Nucleic Acid Databases, Genome, RNA 3' Polyadenylation Signals, RNA-Seq, Single-Cell Analysis, Software, Molecular Sequence Annotation
8.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34255024

ABSTRACT

The dynamic choice of different polyadenylation sites in a gene is referred to as alternative polyadenylation, which functions in many important biological processes. Large-scale messenger RNA 3' end sequencing has revealed that cleavage sites for polyadenylation exhibit microheterogeneity. To date, the conventional determination of polyadenylation site clusters is subjective and arbitrary, leading to inaccurate annotations. Here, we present a weighted density peak clustering method, QuantifyPoly(A), to accurately quantify genome-wide polyadenylation choices. Applying QuantifyPoly(A) to published 3' end sequencing datasets from both animals and plants reshaped their polyadenylation profiles into myriad novel polyadenylation site clusters. Most of these novel polyadenylation site clusters show significantly dynamic usage across different biological samples or associate with binding sites of trans-acting factors. Upstream sequences of these clusters are enriched with polyadenylation signals UGUA, UAAA and/or AAUAAA in a species-dependent manner. Polyadenylation site clusters also exhibit species specificity, and plant clusters generally show higher microheterogeneity than animal clusters. QuantifyPoly(A) is broadly applicable to any type of 3' end sequencing data and species for accurate quantification and construction of the complex and dynamic polyadenylation landscape, and it enables us to decode, at a much higher resolution, alternative polyadenylation events invisible to conventional methods.


Subjects
Poly A/metabolism, Animals, Arabidopsis/metabolism, Oryza/metabolism, Polyadenylation
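
The abstract of this record names weighted density peak clustering but does not give the procedure. The sketch below illustrates the general density-peak idea on one-dimensional cleavage positions, with read counts as weights. It is a simplified illustration under stated assumptions, not the QuantifyPoly(A) implementation: the function name, the fixed `cutoff` window, the tie-breaking rule, and the example data are all hypothetical.

```python
def weighted_density_peaks(positions, counts, cutoff):
    """Group cleavage sites into clusters around weighted density peaks.

    positions: genomic coordinates of cleavage sites on one strand
    counts:    read count supporting each site (used as the weight)
    cutoff:    window in nucleotides for density and for separating peaks
    Returns one label per site: the position of the peak it belongs to.
    """
    n = len(positions)
    # Weighted local density: total reads within `cutoff` nt of each site.
    rho = [
        sum(c for p, c in zip(positions, counts) if abs(p - positions[i]) <= cutoff)
        for i in range(n)
    ]
    # Visit sites from densest to sparsest; ties go to the site with more
    # reads of its own, then to the smaller coordinate.
    order = sorted(range(n), key=lambda i: (-rho[i], -counts[i], positions[i]))
    labels = [None] * n
    for rank, i in enumerate(order):
        denser = order[:rank]
        if not denser:
            labels[i] = positions[i]  # the global density peak
            continue
        nearest = min(denser, key=lambda j: abs(positions[j] - positions[i]))
        if abs(positions[nearest] - positions[i]) > cutoff:
            labels[i] = positions[i]  # far from any denser site: a new peak
        else:
            labels[i] = labels[nearest]  # inherit the neighbour's cluster
    return labels


# Hypothetical cleavage sites: two groups roughly 100 nt apart.
labels = weighted_density_peaks(
    positions=[100, 102, 104, 200, 203],
    counts=[5, 50, 8, 3, 30],
    cutoff=10,
)
```

Here each cluster is labelled by its dominant cleavage position, so scattered low-count sites fold into the nearest well-supported peak instead of being split by an arbitrary fixed-distance rule.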
9.
Entropy (Basel) ; 25(12)2023 Dec 03.
Article in English | MEDLINE | ID: mdl-38136497

ABSTRACT

To address the problem that traditional spectral clustering algorithms cannot obtain the complete structural information of networks, this paper proposes a spectral clustering community detection algorithm, PMIK-SC, based on the point-wise mutual information (PMI) graph kernel. The kernel is constructed according to the point-wise mutual information between nodes, which is then used as a proximity matrix to reconstruct the network and obtain the symmetric normalized Laplacian matrix. Finally, the network is partitioned by the eigendecomposition and eigenvector clustering of the Laplacian matrix. In addition, to determine the number of clusters during spectral clustering, this paper proposes a fast algorithm, BI-CNE, for estimating the number of communities. For a specific network, the algorithm first reconstructs the original network and then runs Monte Carlo sampling to estimate the number of communities by Bayesian inference. Experimental results show that the detection speed and accuracy of the algorithm are superior to other existing algorithms for estimating the number of communities. On this basis, the spectral clustering community detection algorithm PMIK-SC also has high accuracy and stability compared with other community detection algorithms and spectral clustering algorithms.
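
The two quantities this abstract relies on have standard textbook definitions, restated below for reference. How the probabilities p(i), p(j) and p(i, j) are estimated from the network, and how the PMI values are turned into a non-negative symmetric proximity matrix W, is not stated in the abstract, so those symbols are left generic here and should be read as assumptions rather than the paper's exact construction.

```latex
% Point-wise mutual information between nodes i and j
\mathrm{PMI}(i, j) = \log \frac{p(i, j)}{p(i)\, p(j)}

% Degree matrix of the proximity (kernel) matrix W
D_{ii} = \sum_{j} W_{ij}

% Symmetric normalized Laplacian
L_{\mathrm{sym}} = I - D^{-1/2}\, W\, D^{-1/2}
```

In standard spectral clustering, the eigenvectors of the k smallest eigenvalues of L_sym are stacked as columns, and the rows of that matrix are then clustered (commonly with k-means) to give the partition into k communities.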

10.
Brief Bioinform ; 21(4): 1261-1276, 2020 07 15.
Article in English | MEDLINE | ID: mdl-31267126

ABSTRACT

Alternative polyadenylation (APA) has been implicated as playing an important role in post-transcriptional regulation by regulating mRNA abundance, stability, localization and translation, which contributes considerably to transcriptome diversity and gene expression regulation. RNA-seq has become a routine approach for transcriptome profiling, generating unprecedented data that could be used to identify and quantify APA site usage. A number of computational approaches for identifying APA sites and/or dynamic APA events from RNA-seq data have emerged in the literature, which provide valuable yet preliminary results that should be refined to yield credible guidelines for the scientific community. In this review, we provide a comprehensive overview of the status of currently available computational approaches. We also conducted objective benchmarking analysis using RNA-seq data sets from different species (human, mouse and Arabidopsis) and simulated data sets to present a systematic evaluation of 11 representative methods. Our benchmarking study showed that the overall performance of all tools investigated is moderate, indicating that there is still considerable scope to improve the prediction of APA sites or dynamic APA events from RNA-seq data. In particular, prediction results from individual tools differ considerably, and only a limited number of predicted APA sites or genes are common among different tools. Accordingly, we give some advice on how to assess the reliability of the obtained results. We also propose practical recommendations on the appropriate method for diverse scenarios and discuss implications and future directions relevant to profiling APA from RNA-seq data.


Subjects
RNA Sequence Analysis/methods, Animals, Humans, Polyadenylation
11.
Bioinformatics ; 37(16): 2470-2472, 2021 08 25.
Article in English | MEDLINE | ID: mdl-33258917

ABSTRACT

MOTIVATION: Alternative polyadenylation (APA) has been widely recognized as a widespread mechanism modulated dynamically. Studies based on 3' end sequencing and/or RNA-seq have profiled poly(A) sites in various species with diverse pipelines, yet no unified and easy-to-use toolkit is available for comprehensive APA analyses. RESULTS: We developed an R package called movAPA for modeling and visualization of dynamics of alternative polyadenylation across biological samples. movAPA incorporates rich functions for preprocessing, annotation and statistical analyses of poly(A) sites, identification of poly(A) signals, profiling of APA dynamics and visualization. Particularly, seven metrics are provided for measuring the tissue-specificity or usages of APA sites across samples. Three methods are used for identifying 3' UTR shortening/lengthening events between conditions. APA site switching involving non-3' UTR polyadenylation can also be explored. Using poly(A) site data from rice and mouse sperm cells, we demonstrated the high scalability and flexibility of movAPA in profiling APA dynamics across tissues and single cells. AVAILABILITY AND IMPLEMENTATION: https://github.com/BMILAB/movAPA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Oryza, Polyadenylation, 3' Untranslated Regions, Animals, Mice, Oryza/genetics, Poly A/metabolism, RNA-Seq, Software
12.
Bioinformatics ; 36(3): 789-797, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31392316

ABSTRACT

MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes. RESULTS: We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. AVAILABILITY AND IMPLEMENTATION: Freely available for download at https://github.com/BMILAB/scHinter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling, RNA-Seq, Reproducibility of Results, Sample Size, RNA Sequence Analysis, Single-Cell Analysis, Software
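
The abstract of this record mentions the synthetic minority oversampling technique (SMOTE) for random interpolation. The core SMOTE step places a synthetic point at a random position on the line segment between a sample and one of its neighbours. The sketch below shows only that generic step on two expression vectors. It does not reproduce scHinter's ensemble distance or hierarchical framework, and the function name and example values are hypothetical.

```python
import random


def smote_interpolate(cell, neighbor, rng):
    """Create one synthetic expression profile between two cells.

    A single random gap in [0, 1) is drawn and applied to every gene, so
    the synthetic profile lies on the segment from `cell` to `neighbor`.
    """
    if len(cell) != len(neighbor):
        raise ValueError("profiles must cover the same genes")
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(cell, neighbor)]


# Hypothetical expression values for three genes in two similar cells.
rng = random.Random(0)  # seeded so the example is reproducible
synthetic = smote_interpolate([0.0, 4.0, 2.0], [2.0, 0.0, 2.0], rng)
```

Every synthetic value stays between the two observed values for that gene, so the interpolation never extrapolates beyond what was measured in the pair of cells.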
13.
Plant Physiol ; 182(1): 228-242, 2020 01.
Article in English | MEDLINE | ID: mdl-31767692

ABSTRACT

Alternative cleavage and polyadenylation (APA) is increasingly recognized as an important regulatory mechanism in eukaryotic gene expression and is dynamically modulated in a developmental, tissue-specific, or environmentally responsive manner. Given the functional importance of APA and the rapid accumulation of APA sites in plants, a comprehensive and easily accessible APA site database is necessary for improved understanding of APA-mediated gene expression regulation. We present a database called PlantAPAdb that catalogs the most comprehensive APA site data derived from sequences from diverse 3' sequencing protocols and biological samples in plants. Currently, PlantAPAdb contains APA sites in six species: Oryza sativa (japonica and indica), Arabidopsis (Arabidopsis thaliana), Medicago truncatula, Trifolium pratense, Phyllostachys edulis, and Chlamydomonas reinhardtii. APA sites in PlantAPAdb are available for bulk download and can be queried in a Google-like manner. PlantAPAdb provides rich information on the whole-genome APA sites, including genomic locations, heterogeneous cleavage sites, expression levels, and sample information. It also provides comprehensive poly(A) signals for APA sites in different genomic regions according to distinct profiles of cis-elements in plants. In addition, PlantAPAdb contains events of 3' untranslated region shortening/lengthening resulting from APA, which helps to understand the mechanisms underlying systematic changes in 3' untranslated region lengths. Additional information about conservation of APA sites in plants is also available, providing insights into the evolutionary polyadenylation configuration across species. As a user-friendly database, PlantAPAdb is a large and extendable resource for elucidating APA mechanisms, APA conservation, and gene expression regulation.


Subjects
Poly A/metabolism, Polyadenylation/physiology, Arabidopsis/genetics, Arabidopsis/metabolism, Chlamydomonas reinhardtii/genetics, Chlamydomonas reinhardtii/metabolism, Plant Genome/genetics, Medicago truncatula/genetics, Medicago truncatula/metabolism, Oryza/genetics, Oryza/metabolism, Poly A/genetics, Polyadenylation/genetics, Trifolium/genetics, Trifolium/metabolism
14.
Plant J ; 98(2): 260-276, 2019 04.
Article in English | MEDLINE | ID: mdl-30570805

ABSTRACT

Alternative polyadenylation (APA) is a widespread post-transcriptional mechanism that regulates gene expression through mRNA metabolism, playing a pivotal role in modulating phenotypic traits in rice (Oryza sativa L.). However, little is known about the APA-mediated regulation underlying the distinct characteristics between two major rice subspecies, indica and japonica. Using a poly(A)-tag sequencing approach, polyadenylation (poly(A)) site profiles were investigated and compared pairwise from germination to the mature stage between indica and japonica, and extensive differentiation in APA profiles was detected genome-wide. Genes with subspecies-specific poly(A) sites were found to contribute to subspecies characteristics, particularly in disease resistance of indica and cold-stress tolerance of japonica. In most tissues, differential usage of APA sites exhibited an apparent impact on the gene expression profiles between subspecies, and genes with those APA sites were significantly enriched in quantitative trait loci (QTL) related to yield traits, such as spikelet number and 1000-seed weight. In leaves of the booting stage, APA site-switching genes displayed global shortening of 3' untranslated regions with increased expression in indica compared with japonica, and they were overrepresented in the porphyrin and chlorophyll metabolism pathways. This phenomenon may lead to a higher chlorophyll content and photosynthesis in indica than in japonica, being associated with their differential growth rates and yield potentials. We further constructed an online resource for querying and visualizing the poly(A) atlas in these two rice subspecies. Our results suggest that APA may be largely involved in developmental differentiations between two rice subspecies, especially in leaf characteristics and the stress response, broadening our knowledge of the post-transcriptional genetic basis underlying the divergence of rice traits.


Subjects
Plant Genes/genetics, Oryza/genetics, Oryza/metabolism, Polyadenylation, Acclimatization, Chlorophyll/metabolism, Plant Gene Expression Regulation, Germination, Phenotype, Photosynthesis, Plant Leaves/genetics, Plant Leaves/metabolism, Quantitative Trait Loci, Seeds, Physiological Stress, Transcriptome
15.
Plant Cell Physiol ; 61(5): 882-896, 2020 May 01.
Article in English | MEDLINE | ID: mdl-32044993

ABSTRACT

Spartina alterniflora (Spartina) is the only halophyte in the salt marsh. However, the molecular basis of its high salt tolerance remains elusive. In this study, we used Pacific Biosciences (PacBio) full-length single-molecule long-read sequencing and RNA-seq to elucidate the transcriptome dynamics of high salt tolerance in Spartina by salt gradient experiments. High-quality unigenes, transcription factors, non-coding RNA and Spartina-specific transcripts were identified. Co-expression network analysis found that protein kinase-encoding genes (SaOST1, SaCIPK10 and SaLRRs) are hub genes in the salt tolerance regulatory network. High salt stress induced the expression of transcription factors but repressed the expression of long non-coding RNAs. The Spartina transcriptome is closer to rice than Arabidopsis, and a higher proportion of transporter and transcription factor-encoding transcripts have been found in Spartina. Transcriptome analysis showed that high salt stress induced the expression of carbohydrate metabolism, especially cell-wall biosynthesis-related genes in Spartina, and repressed its expression in rice. Compared with rice, high salt stress highly induced the expression of stress response, protein modification and redox-related gene expression and greatly inhibited translation in Spartina. High salt stress also induced alternative splicing in Spartina, while differentially expressed alternative splicing events associated with photosynthesis were overrepresented in Spartina but not in rice. Finally, we built the SAPacBio website for visualizing full-length transcriptome sequences, transcription factors, ncRNAs, salt-tolerant genes and alternative splicing events in Spartina. Overall, this study suggests that the salt tolerance mechanism in Spartina is different from rice in many aspects and is far more complex than expected.


Subjects
Poaceae/genetics, Poaceae/physiology, Salt Tolerance/genetics, Salt-Tolerant Plants/genetics, Transcriptome/genetics, Alternative Splicing/genetics, Arabidopsis/genetics, Plant Gene Expression Regulation, Gene Ontology, Gene Regulatory Networks, Plant Genes, Oryza/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Untranslated RNA/genetics, Untranslated RNA/metabolism, Physiological Stress/genetics, Transcription Factors/metabolism
16.
Bioinformatics ; 35(15): 2654-2656, 2019 08 01.
Article in English | MEDLINE | ID: mdl-30535139

ABSTRACT

SUMMARY: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without using a reference genome. AStrap identifies AS events by extensive pair-wise alignments of transcript sequences and predicts AS types by a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using collected AS events from reference genomes of rice and human as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap can identify many more AS events with comparable or higher accuracy than the competing method. AStrap also possesses a unique feature of predicting AS types, which achieves an overall accuracy of ∼0.87 for different species. Extensive evaluation of AStrap using different parameters, sample sizes and machine-learning models on different species also demonstrates the robustness and flexibility of AStrap. AStrap could be a valuable addition to the community for the study of AS in non-model organisms with limited genetic resources. AVAILABILITY AND IMPLEMENTATION: AStrap is available for download at https://github.com/BMILAB/AStrap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing, Genome, Humans, Machine Learning, RNA Sequence Analysis, Transcriptome
17.
BMC Genomics ; 20(1): 75, 2019 Jan 22.
Artigo em Inglês | MEDLINE | ID: mdl-30669970

RESUMO

BACKGROUND: Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to transcriptome complexity and the dynamics of gene regulation. The current tsunami of whole-genome poly(A) site data from various conditions generated by 3' end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data because they fail to consider the information on poly(A) sites (e.g., location, abundance, and number) within each gene or to measure the association among poly(A) sites between two genes. RESULTS: Here we propose a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and the gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways, including abundance and relative usage, which exploits the advantages of 3' end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely used distance measures under five performance metrics: connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes. CONCLUSIONS: By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3' end sequencing data to address complex biological phenomena.
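
The idea of measuring the association between two genes through the canonical correlation of their poly(A) site profiles can be illustrated with a minimal Python sketch. This is not the PASCCA R package or its API: the function names, the input layout (one list of per-sample abundances per poly(A) site), and the use of 1 minus the first canonical correlation as a distance are all assumptions made for illustration.

```python
import math

def _center(columns):
    """Subtract the mean from each site's abundance profile."""
    return [[v - sum(c) / len(c) for v in c] for c in columns]

def _orthonormalize(columns, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of the given profiles."""
    basis = []
    for c in columns:
        v = list(c)
        for b in basis:
            proj = sum(x * y for x, y in zip(v, b))
            v = [x - proj * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > tol:  # skip constant or linearly dependent profiles
            basis.append([x / norm for x in v])
    return basis

def _top_singular_value(m, iters=500):
    """Largest singular value of m by power iteration on m^T m."""
    q = len(m[0])
    v = [1.0 / math.sqrt(q)] * q
    sigma = 0.0
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(q)) for row in m]
        w = [sum(m[i][j] * u[i] for i in range(len(m))) for j in range(q)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        sigma = math.sqrt(norm)
        v = [x / norm for x in w]
    return sigma

def first_canonical_correlation(gene_a, gene_b):
    """gene_a, gene_b: lists of poly(A) sites, each a list of abundances
    measured over the same samples (hypothetical input layout)."""
    qa = _orthonormalize(_center(gene_a))
    qb = _orthonormalize(_center(gene_b))
    if not qa or not qb:
        return 0.0
    # Canonical correlations are the singular values of Qa^T Qb.
    m = [[sum(x * y for x, y in zip(a, b)) for b in qb] for a in qa]
    return min(1.0, _top_singular_value(m))

def cca_distance(gene_a, gene_b):
    """Assumed distance for clustering: 1 - first canonical correlation."""
    return 1.0 - first_canonical_correlation(gene_a, gene_b)
```

A gene with two poly(A) sites and a gene with one site can be compared directly, for example `cca_distance([[1, 0, 0, 1, 3], [0, 1, 0, 2, 1]], [[1, 1, 0, 3, 4]])`, which is the kind of unequal-site comparison that ordinary per-gene distance measures cannot express.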


Subjects
High-Throughput Nucleotide Sequencing/methods , Polyadenylation , Software , Cluster Analysis , Computational Biology/methods , Correlation of Data , Gene Expression , Oryza/genetics
18.
BMC Genomics ; 20(1): 347, 2019 May 08.
Article in English | MEDLINE | ID: mdl-31068142

ABSTRACT

BACKGROUND: Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses of scRNA-seq data, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing is a critical preliminary step in the analysis of scRNA-seq data. RESULTS: We present scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools. CONCLUSIONS: scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources. scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.
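
Network propagation of the kind mentioned in this abstract is often implemented as a random walk with restart over a gene-gene network. The sketch below shows that generic technique for a single cell's expression vector; it is not scNPF's implementation, and the function name, the restart weight `alpha`, the column normalization, and the self-loop for isolated genes are assumptions chosen for illustration.

```python
def propagate_expression(expr, adjacency, alpha=0.5, iters=50):
    """Smooth one cell's expression vector over a gene-gene network.

    expr:      list of expression values, one per gene
    adjacency: symmetric 0/1 (or weighted) matrix as a list of lists
    alpha:     share of signal taken from network neighbours at each step
    """
    n = len(expr)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    # Column-normalize so each gene spreads its signal across its neighbours.
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        if col_sums[j] == 0:
            w[j][j] = 1.0  # isolated gene: self-loop keeps its own value
        else:
            for i in range(n):
                w[i][j] = adjacency[i][j] / col_sums[j]
    f = list(expr)
    for _ in range(iters):
        # F <- alpha * W F + (1 - alpha) * F0
        f = [
            alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * expr[i]
            for i in range(n)
        ]
    return f

# Toy example: gene 1 is a dropout (0.0) but is linked to expressed gene 0;
# gene 2 has no neighbours in the network.
toy_expr = [4.0, 0.0, 2.0]
toy_adj = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
smoothed = propagate_expression(toy_expr, toy_adj)
```

In the toy example the dropout gene receives a nonzero value from its expressed neighbour, while the isolated gene is left unchanged, which is the intuition behind using a prior interaction network to recover lost expression.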


Subjects
Gene Expression Regulation , Gene Regulatory Networks , Human Genome , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Software , Transcriptome , Algorithms , Gene Expression Profiling , Humans
19.
Bioinformatics ; 34(12): 2123-2125, 2018 Jun 15.
Article in English | MEDLINE | ID: mdl-29385403

ABSTRACT

Summary: Alternative polyadenylation (APA) is emerging as a widespread mechanism that is modulated in a tissue-specific manner, which highlights the need to define tissue-specific poly(A) sites for profiling APA dynamics across tissues. We have developed an R package called TSAPA, based on a machine learning model, for identifying tissue-specific poly(A) sites in plants. A feature space of more than 200 features was assembled to specifically characterize poly(A) sites in plants. The classification model in TSAPA can be customized by selecting desired features or classifiers. TSAPA is also capable of predicting tissue-specific poly(A) sites in unannotated intergenic regions. TSAPA will be a valuable addition to the community for studying the dynamics of APA in plants. Availability and implementation: https://github.com/BMILAB/TSAPA. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Plants/metabolism , Polyadenylation , Software , Machine Learning , Poly A , DNA Sequence Analysis , RNA Sequence Analysis
20.
Opt Express ; 27(6): 8566-8577, 2019 Mar 18.
Article in English | MEDLINE | ID: mdl-31052671

ABSTRACT

The division-of-focal-plane (DoFP) polarimeter is widely used in polarization imaging sensors. The periodically arranged micro-polarizers integrated on the focal plane ensure its outstanding real-time performance, but they reduce the spatial resolution of the output images and in turn affect the calculation of polarization parameters. In this paper, a four-layer, end-to-end fully convolutional neural network called Fork-Net is proposed. It aims to directly improve the imaging quality of three polarization properties: intensity (i.e., S0), degree of linear polarization (DoLP), and angle of polarization (AoP), rather than focusing on reducing the interpolation error of intensity images of different polarization orientations. Fork-Net accepts raw mosaic images as input and directly outputs S0, DoLP, and AoP. It is trained with a customized loss function. The experimental results show that, compared with existing methods, the proposed one achieves the highest peak signal-to-noise ratio (PSNR) and prominent visual quality on output images.
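
For readers unfamiliar with the three output quantities, the sketch below computes S0, DoLP, and AoP for a single pixel from intensities behind 0°, 45°, 90°, and 135° micro-polarizers, using the standard linear Stokes-parameter definitions. It illustrates the conventional calculation that interpolation-based pipelines perform after demosaicking, not the Fork-Net network itself; the function name and the four-orientation layout are assumptions.

```python
import math

def stokes_properties(i0, i45, i90, i135):
    """Return (S0, DoLP, AoP in radians) from four polarizer intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical component
    s2 = i45 - i135                     # +45 deg vs. -45 deg component
    # Degree of linear polarization; guard against a dark pixel.
    dolp = math.hypot(s1, s2) / s0 if s0 > 0 else 0.0
    # Angle of polarization, in the range (-pi/2, pi/2].
    aop = 0.5 * math.atan2(s2, s1)
    return s0, dolp, aop

# Fully horizontally polarized light, light polarized at 45 degrees,
# and unpolarized light.
horizontal = stokes_properties(1.0, 0.5, 0.0, 0.5)
diagonal = stokes_properties(0.5, 1.0, 0.5, 0.0)
unpolarized = stokes_properties(0.5, 0.5, 0.5, 0.5)
```

Because DoLP and AoP are nonlinear functions of the four intensities, small interpolation errors in the intensity images can be amplified in them, which is the motivation the abstract gives for predicting the three properties directly.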
