Results 1 - 20 of 79
1.
Cell ; 187(9): 2336-2341.e5, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38582080

ABSTRACT

The Genome Aggregation Database (gnomAD), widely recognized as the gold-standard reference map of human genetic variation, has largely overlooked tandem repeat (TR) expansions, despite the fact that TRs constitute ∼6% of our genome and are linked to over 50 human diseases. Here, we introduce the TR-gnomAD (https://wlcb.oit.uci.edu/TRgnomAD), a biobank-scale reference of 0.86 million TRs derived from 338,963 whole-genome sequencing (WGS) samples of diverse ancestries (39.5% non-European samples). TR-gnomAD offers critical insights into ancestry-specific disease prevalence using disparities in TR unit number frequencies among ancestries. Moreover, TR-gnomAD is able to differentiate between common, presumably benign TR expansions, which are prevalent in TR-gnomAD, from those potentially pathogenic TR expansions, which are found more frequently in disease groups than within TR-gnomAD. Together, TR-gnomAD is an invaluable resource for researchers and physicians to interpret TR expansions in individuals with genetic diseases.


Subjects
Genome, Human; Tandem Repeat Sequences; Humans; Tandem Repeat Sequences/genetics; Whole Genome Sequencing; Databases, Genetic; DNA Repeat Expansion/genetics; Genome-Wide Association Study
2.
Cell ; 173(4): 1014-1030.e17, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727661

ABSTRACT

Tools to understand how the spliceosome functions in vivo have lagged behind advances in the structural biology of the spliceosome. Here, methods are described to globally profile spliceosome-bound pre-mRNA, intermediates, and spliced mRNA at nucleotide resolution. These tools are applied to three yeast species that span 600 million years of evolution. The sensitivity of the approach enables the detection of canonical and non-canonical events, including interrupted, recursive, and nested splicing. This application of statistical modeling uncovers independent roles for the size and position of the intron and the number of introns per transcript in substrate progression through the two catalytic stages. These include species-specific inputs suggestive of spliceosome-transcriptome coevolution. Further investigations reveal the ATP-dependent discard of numerous endogenous substrates after spliceosome assembly in vivo and connect this discard to intron retention, a form of splicing regulation. Spliceosome profiling is a quantitative, generalizable global technology used to investigate an RNP central to eukaryotic gene expression.


Subjects
Ribonucleoproteins, Small Nuclear/metabolism; Saccharomyces cerevisiae Proteins/metabolism; Spliceosomes/metabolism; Adenosine Triphosphate/metabolism; Bayes Theorem; DEAD-box RNA Helicases/genetics; DEAD-box RNA Helicases/metabolism; Immunoprecipitation; RNA Precursors/metabolism; RNA Splicing; RNA Splicing Factors/genetics; RNA Splicing Factors/metabolism; RNA, Fungal/metabolism; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Telomerase/genetics; Telomerase/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism
3.
Proc Natl Acad Sci U S A ; 120(6): e2202584120, 2023 02 07.
Article in English | MEDLINE | ID: mdl-36730203

ABSTRACT

Model organisms are instrumental substitutes for human studies to expedite basic, translational, and clinical research. Despite their indispensable role in mechanistic investigation and drug development, molecular congruence of animal models to humans has long been questioned and debated. Little effort has been made for an objective quantification and mechanistic exploration of a model organism's resemblance to humans in terms of molecular response under disease or drug treatment. We hereby propose a framework, namely Congruence Analysis for Model Organisms (CAMO), for transcriptomic response analysis by developing threshold-free differential expression analysis, quantitative concordance/discordance scores incorporating data variabilities, pathway-centric downstream investigation, knowledge retrieval by text mining, and topological gene module detection for hypothesis generation. Instead of a genome-wide vague and dichotomous answer of "poorly" or "greatly" mimicking humans, CAMO assists researchers to numerically quantify congruence, to dissect true cross-species differences from unwanted biological or cohort variabilities, and to visually identify molecular mechanisms and pathway subnetworks that are best or least mimicked by model organisms, which altogether provides foundations for hypothesis generation and subsequent translational decisions.


Subjects
Gene Expression Profiling; Transcriptome; Animals; Humans; Genome; Proteomics; Models, Animal
4.
RNA ; 28(6): 808-831, 2022 06.
Article in English | MEDLINE | ID: mdl-35273099

ABSTRACT

Neurons provide a rich setting for studying post-transcriptional control. Here, we investigate the landscape of translational control in neurons and search for mRNA features that explain differences in translational efficiency (TE), considering the interplay between TE, mRNA poly(A)-tail lengths, microRNAs, and neuronal activation. In neurons and brain tissues, TE correlates with tail length, and a few dozen mRNAs appear to undergo cytoplasmic polyadenylation upon light or chemical stimulation. However, the correlation between TE and tail length is modest, explaining <5% of TE variance, and even this modest relationship diminishes when accounting for other mRNA features. Thus, tail length appears to affect TE only minimally. Accordingly, miRNAs, which accelerate deadenylation of their mRNA targets, primarily influence target mRNA levels, with no detectable effect on either steady-state tail lengths or TE. Larger correlates with TE include codon composition and predicted mRNA folding energy. When combined in a model, the identified correlates explain 38%-45% of TE variance. These results provide a framework for considering the relative impact of factors that contribute to translational control in neurons. They indicate that when examined in bulk, translational control in neurons largely resembles that of other types of post-embryonic cells. Thus, detection of more specialized control might require analyses that can distinguish translation occurring in neuronal processes from that occurring in cell bodies.


Subjects
MicroRNAs; Gene Expression Regulation; MicroRNAs/genetics; MicroRNAs/metabolism; Neurons/metabolism; Poly A/genetics; Poly A/metabolism; Polyadenylation; Protein Biosynthesis; RNA, Messenger/metabolism
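
As a small worked check on the abstract of entry 4: if poly(A)-tail length is treated as the only predictor in a simple linear fit (an assumption made here for illustration, not a description of the authors' model), the fraction of variance explained equals the squared Pearson correlation. Under that assumption, "<5% of TE variance" corresponds to an absolute correlation below roughly 0.22. The function name is hypothetical.

```python
import math


def max_abs_correlation(variance_explained: float) -> float:
    """Largest |r| consistent with a given R^2 in a one-predictor linear fit.

    Assumes R^2 = r^2, which holds for simple linear regression only.
    """
    if not 0.0 <= variance_explained <= 1.0:
        raise ValueError("variance_explained must lie in [0, 1]")
    return math.sqrt(variance_explained)


# Upper bound on |r| implied by "<5% of variance explained"
print(round(max_abs_correlation(0.05), 3))  # 0.224
```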
5.
Bioinformatics ; 38(11): 3126-3127, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35426898

ABSTRACT

SUMMARY: The number of cells measured in single-cell transcriptomic data has grown fast in recent years. For such large-scale data, subsampling is a powerful and often necessary tool for exploratory data analysis. However, the easiest random subsampling is not ideal from the perspective of preserving rare cell types. Therefore, diversity-preserving subsampling is required for fast exploration of cell types in a large-scale dataset. Here, we propose scSampler, an algorithm for fast diversity-preserving subsampling of single-cell transcriptomic data. AVAILABILITY AND IMPLEMENTATION: scSampler is implemented in Python and is published under the MIT source license. It can be installed by "pip install scsampler" and used with the Scanpy pipeline. The code is available on GitHub: https://github.com/SONGDONGYUAN1994/scsampler. An R interface is available at: https://github.com/SONGDONGYUAN1994/rscsampler. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Transcriptome; Algorithms; Data Analysis
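
To illustrate why diversity-preserving subsampling (entry 5) retains rare cell types that uniform random subsampling tends to miss, here is a minimal sketch using greedy farthest-point (maximin) selection. This is a generic stand-in technique chosen for brevity; it is not scSampler's algorithm and does not use the scsampler package API. The function name and the tuple-of-coordinates input are assumptions for the example.

```python
import math
import random


def farthest_point_subsample(points, n_keep, seed=0):
    """Return indices of n_keep points chosen by greedy maximin selection.

    points: sequence of equal-length coordinate tuples (e.g. cells in a
    low-dimensional embedding). Each step adds the point that is farthest
    from everything already chosen, so isolated groups are picked early.
    """
    if not 0 < n_keep <= len(points):
        raise ValueError("n_keep must be between 1 and len(points)")
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # Distance from every point to its nearest already-chosen point
    min_dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < n_keep:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    return chosen
```

In a toy dataset with 500 points in one cluster and 3 points in a distant one, this selection picks at least one of the 3 rare points within the first two steps, whereas a uniform random draw of 10 points would usually miss them. The cost is O(n × n_keep) distance evaluations, so a sketch like this is only practical for modest sizes.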
6.
Bioinformatics ; 38(16): 3927-3934, 2022 08 10.
Article in English | MEDLINE | ID: mdl-35758616

ABSTRACT

MOTIVATION: Modeling single-cell gene expression trends along cell pseudotime is a crucial analysis for exploring biological processes. Most existing methods rely on nonparametric regression models for their flexibility; however, nonparametric models often provide trends too complex to interpret. Other existing methods use interpretable but restrictive models. Since model interpretability and flexibility are both indispensable for understanding biological processes, the single-cell field needs a model that improves the interpretability and largely maintains the flexibility of nonparametric regression models. RESULTS: Here, we propose the single-cell generalized trend model (scGTM) for capturing a gene's expression trend, which may be monotone, hill-shaped or valley-shaped, along cell pseudotime. The scGTM has three advantages: (i) it can capture non-monotonic trends that are easy to interpret, (ii) its parameters are biologically interpretable and trend informative, and (iii) it can flexibly accommodate common distributions for modeling gene expression counts. To tackle the complex optimization problems, we use the particle swarm optimization algorithm to find the constrained maximum likelihood estimates for the scGTM parameters. As an application, we analyze several single-cell gene expression datasets using the scGTM and show that scGTM can capture interpretable gene expression trends along cell pseudotime and reveal molecular insights underlying biological processes. AVAILABILITY AND IMPLEMENTATION: The Python package scGTM is open-access and available at https://github.com/ElvisCuiHan/scGTM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Single-Cell Analysis; Software; Single-Cell Analysis/methods; Algorithms; Likelihood Functions; Gene Expression
7.
Genome Res ; 29(12): 2056-2072, 2019 12.
Article in English | MEDLINE | ID: mdl-31694868

ABSTRACT

Genome-wide accurate identification and quantification of full-length mRNA isoforms is crucial for investigating transcriptional and posttranscriptional regulatory mechanisms of biological phenomena. Despite continuing efforts in developing effective computational tools to identify or assemble full-length mRNA isoforms from second-generation RNA-seq data, it remains a challenge to accurately identify mRNA isoforms from short sequence reads owing to the substantial information loss in RNA-seq experiments. Here, we introduce a novel statistical method, annotation-assisted isoform discovery (AIDE), the first approach that directly controls false isoform discoveries by implementing the testing-based model selection principle. Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. We evaluate the performance of AIDE based on multiple simulated and real RNA-seq data sets followed by PCR-Sanger sequencing validation. Our results show that AIDE effectively leverages the annotation information to compensate the information loss owing to short read lengths. AIDE achieves the highest precision in isoform discovery and the lowest error rates in isoform abundance estimation, compared with three state-of-the-art methods Cufflinks, SLIDE, and StringTie. As a robust bioinformatics tool for transcriptome analysis, AIDE enables researchers to discover novel transcripts with high confidence.


Subjects
Gene Expression Profiling; Gene Expression Regulation; High-Throughput Nucleotide Sequencing; Molecular Sequence Annotation; RNA Isoforms; RNA, Messenger; Sequence Analysis, RNA; Humans; RNA Isoforms/biosynthesis; RNA Isoforms/genetics; RNA, Messenger/biosynthesis; RNA, Messenger/genetics
8.
Bioinformatics ; 37(9): 1225-1233, 2021 06 09.
Article in English | MEDLINE | ID: mdl-32814973

ABSTRACT

MOTIVATION: Gene clustering is a widely used technique that has enabled computational prediction of unknown gene functions within a species. However, it remains a challenge to refine gene function prediction by leveraging evolutionarily conserved genes in another species. This challenge calls for a new computational algorithm to identify gene co-clusters in two species, so that genes in each co-cluster exhibit similar expression levels in each species and strong conservation between the species. RESULTS: Here, we develop the bipartite tight spectral clustering (BiTSC) algorithm, which identifies gene co-clusters in two species based on gene orthology information and gene expression data. BiTSC novelly implements a formulation that encodes gene orthology as a bipartite network and gene expression data as node covariates. This formulation allows BiTSC to adopt and combine the advantages of multiple unsupervised learning techniques: kernel enhancement, bipartite spectral clustering, consensus clustering, tight clustering and hierarchical clustering. As a result, BiTSC is a flexible and robust algorithm capable of identifying informative gene co-clusters without forcing all genes into co-clusters. Another advantage of BiTSC is that it does not rely on any distributional assumptions. Beyond cross-species gene co-clustering, BiTSC also has wide applications as a general algorithm for identifying tight node co-clusters in any bipartite network with node covariates. We demonstrate the accuracy and robustness of BiTSC through comprehensive simulation studies. In a real data example, we use BiTSC to identify conserved gene co-clusters of Drosophila melanogaster and Caenorhabditis elegans, and we perform a series of downstream analyses to both validate BiTSC and verify the biological significance of the identified co-clusters. AVAILABILITY AND IMPLEMENTATION: The Python package BiTSC is open-access and available at https://github.com/edensunyidan/BiTSC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Drosophila melanogaster; Gene Expression Profiling; Algorithms; Animals; Cluster Analysis; Gene Expression
9.
Bioinformatics ; 37(Suppl_1): i358-i366, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252925

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) captures whole transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for a closer study. Then, a question is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low costs, high sensitivity and extra (e.g. spatial) information; however, they typically can only measure up to a few hundred genes. Then another challenging question is how to select genes for targeted gene profiling based on existing scRNA-seq data. RESULTS: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, its selected informative genes can better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space to facilitate the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms the state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation on targeted gene profiling data. AVAILABILITY AND IMPLEMENTATION: The R package is open-access and available at https://github.com/JSB-UCLA/scPNMF. The data used in this work are available at Zenodo: https://doi.org/10.5281/zenodo.4797997. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling; Single-Cell Analysis; Algorithms; Sequence Analysis, RNA; Software
10.
Bioinformatics ; 37(17): 2741-2743, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33532827

ABSTRACT

SUMMARY: With the advance of genomic sequencing techniques, chromatin accessible regions, transcription factor binding sites and epigenetic modifications can be identified at genome-wide scale. Conventional analyses focus on gene regulation at proximal regions; distal regions usually receive less attention, largely due to the lack of reliable tools to link these regions to coding genes. In this study, we introduce RAD (Region Associated Differentially expressed genes), a user-friendly web tool to identify both proximal and distal region associated differentially expressed genes (DEGs). With DEGs and genomic regions of interest (gROI) as input, RAD maps the up- and down-regulated genes associated with any gROI and helps researchers to infer the regulatory function of these regions based on the distance of gROI to differentially expressed genes. RAD includes visualization of the results and statistical inference for significance. AVAILABILITY AND IMPLEMENTATION: RAD is implemented with Python 3.7 and runs on an Nginx server. RAD is freely available at https://labw.org/rad as an online web service. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.
PLoS Comput Biol ; 17(6): e1009095, 2021 06.
Article in English | MEDLINE | ID: mdl-34166361

ABSTRACT

The effectiveness of immune responses depends on the precision of stimulus-responsive gene expression programs. Cells specify which genes to express by activating stimulus-specific combinations of stimulus-induced transcription factors (TFs). Their activities are decoded by a gene regulatory strategy (GRS) associated with each response gene. Here, we examined whether the GRSs of target genes may be inferred from stimulus-response (input-output) datasets, which remains an unresolved model-identifiability challenge. We developed a mechanistic modeling framework and computational workflow to determine the identifiability of all possible combinations of synergistic (AND) or non-synergistic (OR) GRSs involving three transcription factors. Considering different sets of perturbations for stimulus-response studies, we found that two thirds of GRSs are easily distinguishable but that substantially more quantitative data is required to distinguish the remaining third. To enhance the accuracy of the inference with timecourse experimental data, we developed an advanced error model that avoids error overestimates by distinguishing between value and temporal error. Incorporating this error model into a Bayesian framework, we show that GRS models can be identified for individual genes by considering multiple datasets. Our analysis rationalizes the allocation of experimental resources by identifying the most informative TF stimulation conditions. Applying this computational workflow to experimental data of immune response genes in macrophages, we found that a much greater fraction of genes are combinatorially controlled than previously reported by considering compensation among transcription factors. Specifically, we revealed that a group of known NFκB target genes may also be regulated by IRF3, which is supported by chromatin immuno-precipitation analysis. Our study provides a computational workflow for designing and interpreting stimulus-response gene expression studies to identify underlying gene regulatory strategies and further a mechanistic understanding.


Subjects
Gene Regulatory Networks; Models, Biological; Transcription Factors/genetics; Transcription Factors/metabolism; Animals; Bayes Theorem; Cells, Cultured; Chromatin Immunoprecipitation Sequencing; Computational Biology; Computer Simulation; Gene Expression Profiling; Immunity/genetics; Likelihood Functions; Macrophages/metabolism; Mice; Models, Genetic; RNA-Seq
12.
Stat Sci ; 36(1): 89-108, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34305304

ABSTRACT

The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.

13.
Nucleic Acids Res ; 47(13): e77, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31045217

ABSTRACT

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that novelly incorporates varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on the real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.


Subjects
Chromatin/ultrastructure; Computational Biology/methods; Epigenomics/methods; Sequence Alignment; Algorithms; Base Sequence; Brain Chemistry; Chromatin/genetics; DNA Methylation; Databases, Genetic; Datasets as Topic; Gene Ontology; Humans; Nerve Tissue Proteins/biosynthesis; Nerve Tissue Proteins/chemistry; Nerve Tissue Proteins/genetics; Software
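
The abstract of entry 13 describes EpiAlign as a dynamic-programming local alignment over chromatin state sequences. The sketch below shows the general idea with a textbook Smith-Waterman recurrence applied to lists of state labels. It is not EpiAlign's scoring scheme: the fixed match, mismatch, and gap scores are assumptions, the weighting by state length and frequency mentioned in the abstract is omitted, and the state labels in the example are illustrative only.

```python
def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score between two state sequences.

    a, b: sequences of hashable chromatin-state labels.
    Scores are illustrative constants, not EpiAlign's parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Two toy state sequences sharing the run Enh -> TssA -> Tx
region_1 = ["Quies", "Enh", "TssA", "Tx", "Quies"]
region_2 = ["Het", "Enh", "TssA", "Tx", "Het"]
print(local_align_score(region_1, region_2))  # 6
```

Clamping each cell at zero is what makes the alignment local: a shared run of states scores highly even when the flanking regions differ.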
14.
Proc Natl Acad Sci U S A ; 115(5): E1069-E1074, 2018 01 30.
Article in English | MEDLINE | ID: mdl-29339507

ABSTRACT

Genome-wide characterization by next-generation sequencing has greatly improved our understanding of the landscape of epigenetic modifications. Since 2008, whole-genome bisulfite sequencing (WGBS) has become the gold standard for DNA methylation analysis, and a tremendous amount of WGBS data has been generated by the research community. However, the systematic comparison of DNA methylation profiles to identify regulatory mechanisms has yet to be fully explored. Here we reprocessed the raw data of over 500 publicly available Arabidopsis WGBS libraries from various mutant backgrounds, tissue types, and stress treatments and also filtered them based on sequencing depth and efficiency of bisulfite conversion. This enabled us to identify high-confidence differentially methylated regions (hcDMRs) by comparing each test library to over 50 high-quality wild-type controls. We developed statistical and quantitative measurements to analyze the overlapping of DMRs and to cluster libraries based on their effect on DNA methylation. In addition to confirming existing relationships, we revealed unanticipated connections between well-known genes. For instance, MET1 and CMT3 were found to be required for the maintenance of asymmetric CHH methylation at nonoverlapping regions of CMT2 targeted heterochromatin. Our comparative methylome approach has established a framework for extracting biological insights via large-scale comparison of methylomes and can also be adopted for other genomics datasets.


Subjects
Arabidopsis/genetics; DNA Methylation; Epigenomics; Gene Expression Regulation, Plant; Cluster Analysis; Computational Biology; CpG Islands; Epigenesis, Genetic; Gene Library; Genome, Plant; Heterochromatin/chemistry; High-Throughput Nucleotide Sequencing; Plants, Genetically Modified; Sequence Analysis, DNA; Sequence Analysis, RNA; Software
15.
Bioinformatics ; 35(14): i41-i50, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510652

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological sciences by revealing genome-wide gene expression levels within individual cells. However, a critical challenge faced by researchers is how to optimize the choices of sequencing platforms, sequencing depths and cell numbers in designing scRNA-seq experiments, so as to balance the exploration of the depth and breadth of transcriptome information. RESULTS: Here we present a flexible and robust simulator, scDesign, the first statistical framework for researchers to quantitatively assess practical scRNA-seq experimental design in the context of differential gene expression analysis. In addition to experimental design, scDesign also assists computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. In an evaluation based on 17 cell types and 6 different protocols, scDesign outperformed four state-of-the-art scRNA-seq simulation methods and led to rational experimental design. In addition, scDesign demonstrates reproducibility across biological replicates and independent studies. We also discuss the performance of multiple differential expression and dimension reduction methods based on the protocol-dependent scRNA-seq data generated by scDesign. scDesign is expected to be an effective bioinformatic tool that assists rational scRNA-seq experimental design and comparison of scRNA-seq computational methods based on specific research goals. AVAILABILITY AND IMPLEMENTATION: We have implemented our method in the R package scDesign, which is freely available at https://github.com/Vivianstats/scDesign. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling; RNA, Small Cytoplasmic; Single-Cell Analysis; Reproducibility of Results; Research Design; Sequence Analysis, RNA; Software
16.
Nature ; 512(7515): 445-8, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164755

ABSTRACT

The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.


Subjects
Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Gene Expression Profiling; Transcriptome/genetics; Animals; Caenorhabditis elegans/embryology; Caenorhabditis elegans/growth & development; Chromatin/genetics; Cluster Analysis; Drosophila melanogaster/growth & development; Gene Expression Regulation, Developmental/genetics; Histones/metabolism; Humans; Larva/genetics; Larva/growth & development; Models, Genetic; Molecular Sequence Annotation; Promoter Regions, Genetic/genetics; Pupa/genetics; Pupa/growth & development; RNA, Untranslated/genetics; Sequence Analysis, RNA
17.
Nature ; 512(7515): 453-6, 2014 Aug 28.
Article in English | MEDLINE | ID: mdl-25164757

ABSTRACT

Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.


Subjects
Caenorhabditis elegans/genetics; Drosophila melanogaster/genetics; Evolution, Molecular; Gene Expression Regulation/genetics; Gene Regulatory Networks/genetics; Transcription Factors/metabolism; Animals; Binding Sites; Caenorhabditis elegans/growth & development; Chromatin Immunoprecipitation; Conserved Sequence/genetics; Drosophila melanogaster/growth & development; Gene Expression Regulation, Developmental/genetics; Genome/genetics; Humans; Molecular Sequence Annotation; Nucleotide Motifs/genetics; Organ Specificity/genetics; Transcription Factors/genetics
18.
Nucleic Acids Res ; 45(20): 11821-11836, 2017 Nov 16.
Article in English | MEDLINE | ID: mdl-29040683

ABSTRACT

Translation rate per mRNA molecule correlates positively with mRNA abundance. As a result, protein levels do not scale linearly with mRNA levels, but instead scale with the abundance of mRNA raised to the power of an 'amplification exponent'. Here we show that to quantitate translational control, the translation rate must be decomposed into two components. One, TRmD, depends on the mRNA level and defines the amplification exponent. The other, TRmIND, is independent of mRNA amount and impacts the correlation coefficient between protein and mRNA levels. We show that in Saccharomyces cerevisiae TRmD represents ∼20% of the variance in translation and directs an amplification exponent of 1.20 with a 95% confidence interval [1.14, 1.26]. TRmIND constitutes the remaining ∼80% of the variance in translation and explains ∼5% of the variance in protein expression. We also find that TRmD and TRmIND are preferentially determined by different mRNA sequence features: TRmIND by the length of the open reading frame and TRmD both by a ∼60 nucleotide element that spans the initiating AUG and by codon and amino acid frequency. Our work provides more appropriate estimates of translational control and implies that TRmIND is under different evolutionary selective pressures than TRmD.


Subjects
Gene Expression Regulation, Fungal , Protein Biosynthesis/genetics , RNA, Messenger/genetics , Saccharomyces cerevisiae/genetics , Algorithms , Base Sequence , Codon/genetics , Codon, Initiator/genetics , Models, Genetic , Open Reading Frames/genetics , Peptide Chain Initiation, Translational/genetics , RNA, Messenger/metabolism , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism
19.
Nucleic Acids Res ; 45(4): 1657-1672, 2017 Feb 28.
Article in English | MEDLINE | ID: mdl-27980097

ABSTRACT

Distinguishing cell states based only on gene expression data remains a challenging task. This is true even for analyses within a species. In cross-species comparisons, the results obtained by different groups have varied widely. Here, we integrate RNA-seq data from more than 40 cell and tissue types of four mammalian species to identify sets of associated genes as indicators for specific cell states in each species. We employ a statistical method, TROM, to identify both protein-coding and non-coding indicators. Next, we map the cell states within each species and also between species using these indicator genes. We recapitulate known phenotypic similarity between related cell and tissue types and reveal molecular basis for their similarity. We also report novel associations between several tissues and cell types with functional support. Moreover, our identified conserved associated genes are found to be a good resource for studying cell differentiation and reprogramming. Lastly, long non-coding RNAs can serve well as associated genes to indicate cell states. We further infer the biological functions of those non-coding associated genes based on their co-expressed protein-coding genes. This study demonstrates that combining statistical modeling with public RNA-seq data can be powerful for improving our understanding of cell identity control.


Subjects
Contig Mapping , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation , Mammals/genetics , Transcriptome , Algorithms , Animals , Cluster Analysis , Computational Biology/methods , Gene Expression Regulation, Developmental , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Mice , Molecular Sequence Annotation , Multigene Family , Organ Specificity
20.
BMC Genomics ; 18(1): 234, 2017 Mar 16.
Article in English | MEDLINE | ID: mdl-28302059

ABSTRACT

BACKGROUND: We report a statistical study to find correspondence of D. melanogaster and C. elegans developmental stages based on alternative splicing (AS) characteristics of conserved cassette exons using modENCODE RNA-seq data. We identify "stage-associated exons" to capture the AS characteristics of each stage and use these exons to map pairwise stages within and between the two species by an overlap test. RESULTS: Within fly and worm, adjacent developmental stages are mapped to each other, i.e., a strong diagonal pattern is observed as expected, supporting the validity of our approach. Between fly and worm, two parallel mapping patterns are observed between fly early embryos to early larvae and worm life cycle, and between fly late larvae to adults and worm late embryos to adults. We also apply this approach to compare tissues and cells from fly and worm. Findings include the high similarity between fly/worm adults and fly/worm embryos, groupings of fly cell lines, and strong mappings of fly head tissues to worm late embryos and male adults. Gene ontology and KEGG enrichment analyses provide a detailed functional annotation of the identified stage-associated exons, as well as a functional explanation of the observed correspondence map between fly and worm developmental stages. CONCLUSIONS: Our results suggest that AS dynamics of the exon pairs that share similar DNA sequences are informative for finding transcriptomic similarity of biological samples. Our study is innovative in two aspects. First, to our knowledge, our study is the first comprehensive study of AS events in fly and worm developmental stages, tissues, and cells. AS events provide an alternative perspective of transcriptome dynamics, compared to gene expression events. Second, our results do not entirely rely on the information of orthologous genes. Interesting results are also observed for fly and worm cassette exon pairs with DNA sequence similarity but not in orthologous gene pairs.
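
The abstract mentions mapping stages "by an overlap test" without giving its form. As a rough sketch only, the code below assumes a one-sided hypergeometric test on the number of shared stage-associated exon pairs, which is a common choice for this kind of comparison but is not confirmed by the abstract. The function name `overlap_p_value` and all counts in the usage lines are hypothetical.

```python
from math import comb


def overlap_p_value(population: int, set_a: int, set_b: int, overlap: int) -> float:
    """One-sided hypergeometric p-value: P(X >= overlap).

    population: total number of conserved exon pairs considered (assumed universe)
    set_a:      stage-associated exons for one stage (e.g. a fly stage)
    set_b:      stage-associated exons for another stage (e.g. a worm stage)
    overlap:    number of exon pairs associated with both stages
    """
    if not (0 <= set_a <= population and 0 <= set_b <= population):
        raise ValueError("set sizes must lie between 0 and population")
    largest = min(set_a, set_b)
    if overlap > largest:
        return 0.0
    # The overlap cannot be smaller than set_a + set_b - population.
    smallest = max(overlap, 0, set_a + set_b - population)
    total = comb(population, set_b)
    tail = sum(
        comb(set_a, k) * comb(population - set_a, set_b - k)
        for k in range(smallest, largest + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = overlap_p_value(population=1000, set_a=50, set_b=40, overlap=10)
print(f"p-value for the hypothetical overlap: {p:.3g}")
```

A small p-value for a pair of stages would indicate more shared stage-associated exons than expected by chance; computing it for every pair of stages would give a matrix like the correspondence map the abstract describes.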


Subjects
Alternative Splicing , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Exons , Gene Expression Regulation, Developmental , Animals , Caenorhabditis elegans/growth & development , Cluster Analysis , Computational Biology/methods , Drosophila melanogaster/growth & development , Evolution, Molecular , Gene Expression Profiling , Gene Ontology , Genome , Genomics/methods , Life Cycle Stages/genetics , Molecular Sequence Annotation , Organ Specificity/genetics , Transcriptome