ABSTRACT
Neuropsychiatric disorders classically lack defining brain pathologies, but recent work has demonstrated dysregulation at the molecular level, characterized by transcriptomic and epigenetic alterations [1-3]. In autism spectrum disorder (ASD), this molecular pathology involves the upregulation of microglial, astrocyte and neural-immune genes, the downregulation of synaptic genes, and attenuation of gene-expression gradients in cortex [1,2,4-6]. However, whether these changes are limited to cortical association regions or are more widespread remains unknown. To address this issue, we performed RNA-sequencing analysis of 725 brain samples spanning 11 cortical areas from 112 post-mortem samples from individuals with ASD and neurotypical controls. We find widespread transcriptomic changes across the cortex in ASD, exhibiting an anterior-to-posterior gradient, with the greatest differences in primary visual cortex, coincident with an attenuation of the typical transcriptomic differences between cortical regions. Single-nucleus RNA-sequencing and methylation profiling demonstrate that this robust molecular signature reflects changes in cell-type-specific gene expression, particularly affecting excitatory neurons and glia. Both rare and common ASD-associated genetic variation converge within a downregulated co-expression module involving synaptic signalling, and common variation alone is enriched within a module of upregulated protein chaperone genes. These results highlight widespread molecular changes across the cerebral cortex in ASD, extending beyond association cortex to broadly involve primary sensory regions.
Subjects
Autism Spectrum Disorder , Cerebral Cortex , Genetic Variation , Transcriptome , Humans , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/metabolism , Autism Spectrum Disorder/pathology , Cerebral Cortex/metabolism , Cerebral Cortex/pathology , Neurons/metabolism , RNA/analysis , RNA/genetics , Transcriptome/genetics , Autopsy , Sequence Analysis, RNA , Primary Visual Cortex/metabolism , Neuroglia/metabolism
ABSTRACT
Neuroinflammation and immune dysregulation play a key role in Alzheimer's disease (AD) and are also associated with severe Covid-19 and neurological symptoms. In addition, genome-wide association studies have found many risk single nucleotide polymorphisms (SNPs) for AD and Covid-19. However, our understanding of the underlying gene regulatory mechanisms from risk SNPs to AD, Covid-19 and phenotypes is still limited. To this end, we performed an integrative multi-omics analysis to predict gene regulatory networks for major brain regions from population data in AD. Our networks linked transcription factors (TFs) to TF binding sites (TFBSs) on regulatory elements to target genes. Comparative network analyses revealed cross-region-conserved and region-specific regulatory networks, in which many immunological genes are present. Furthermore, we identified a list of AD-Covid genes using our networks involving known AD and Covid-19 genes. Our machine learning analysis prioritized 36 AD-Covid candidate genes for predicting Covid severity. Our independent validation analyses found that these genes outperform known genes for classifying Covid-19 severity and AD. Finally, we mapped genome-wide association study SNPs of AD and severe Covid that interrupt TFBSs on our regulatory networks, revealing potential mechanistic insights into those disease risk variants. Our analyses and results are available as open source, providing an AD-Covid functional genomic resource at the brain region level.
Subjects
Alzheimer Disease , COVID-19 , Humans , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Gene Regulatory Networks , Genome-Wide Association Study , Multiomics , COVID-19/genetics , Brain/metabolism , Phenotype
ABSTRACT
Alzheimer's disease (AD) has a strong genetic predisposition. However, its risk genes remain incompletely identified. We developed an Alzheimer's brain gene network-based approach to predict AD-associated genes by leveraging the functional pattern of known AD-associated genes. Our constructed network outperformed existing networks in predicting AD genes. We then systematically validated the predictions using independent genetic, transcriptomic, proteomic, neuropathological and clinical data. First, top-ranked genes were enriched in AD-associated pathways. Second, using external gene expression data from the Mount Sinai Brain Bank study, we found that the top-ranked genes were significantly associated with neuropathological and clinical traits, including the Consortium to Establish a Registry for Alzheimer's Disease score, Braak stage score and clinical dementia rating. The analysis of Alzheimer's brain single-cell RNA-seq data revealed cell-type-specific association of predicted genes with early pathology of AD. Third, by interrogating proteomic data in the Religious Orders Study and Memory and Aging Project and the Baltimore Longitudinal Study of Aging, we observed a significant association of protein expression level with cognitive function and AD clinical severity. The network, method and predictions could become a valuable resource to advance the identification of risk genes for AD.
Subjects
Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Brain/metabolism , Gene Regulatory Networks , Genetic Predisposition to Disease , Aging/genetics , Gene Expression Profiling , Humans , Longitudinal Studies , Memory , Proteomics , RNA-Seq , Transcriptome
ABSTRACT
Recent advances in consortium-scale genome-wide association studies (GWAS) have highlighted the involvement of common genetic variants in autism spectrum disorder (ASD), but our understanding of their etiologic roles, especially the interplay with rare variants, is incomplete. In this work, we introduce an analytical framework to quantify the transmission disequilibrium of genetically regulated gene expression from parents to offspring. We applied this framework to conduct a transcriptome-wide association study (TWAS) on 7,805 ASD proband-parent trios, and replicated our findings using 35,740 independent samples. We identified 31 associations at the transcriptome-wide significance level. In particular, we identified POU3F2 (p = 2.1E-7), a transcription factor mainly expressed in developmental brain. Gene targets regulated by POU3F2 showed a 2.7-fold enrichment for known ASD genes (p = 2.0E-5) and a 2.7-fold enrichment for loss-of-function de novo mutations in ASD probands (p = 7.1E-5). These results provide a novel connection between rare and common variants, whereby ASD genes affected by very rare mutations are regulated by an unlinked transcription factor affected by common genetic variations.
Subjects
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Hippocampus/metabolism , Homeodomain Proteins/genetics , POU Domain Factors/genetics , Transcriptome/genetics , Alleles , Databases, Genetic , Gene Expression Profiling , Humans , Mutation , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Risk Factors , Spatio-Temporal Analysis
ABSTRACT
Dysregulation of gene expression in Alzheimer's disease (AD) remains elusive, especially at the cell type level. Gene regulatory networks, a key molecular mechanism linking transcription factors (TFs) and regulatory elements to govern gene expression, can change across cell types in the human brain and thus serve as a model for studying gene dysregulation in AD. However, AD-induced regulatory changes across brain cell types remain uncharted. To address this, we integrated single-cell multi-omics datasets to predict the gene regulatory networks of four major cell types, excitatory and inhibitory neurons, microglia and oligodendrocytes, in control and AD brains. Importantly, we analyzed and compared the structural and topological features of networks across cell types and examined changes in AD. Our analysis shows that hub TFs are largely common across cell types and AD-related changes are relatively more prominent in some cell types (e.g., microglia). The regulatory logics of enriched network motifs (e.g., feed-forward loops) further uncover cell type-specific TF-TF cooperativities in gene regulation. The cell type networks are also highly modular, and several network modules with cell-type-specific expression changes in AD pathology are enriched with AD-risk genes. Further disease-module-drug association analysis suggests cell-type candidate drugs and their potential target genes. Finally, our network-based machine learning analysis systematically prioritized cell type risk genes likely involved in AD. Our strategy is validated using an independent dataset, which showed that top-ranked genes can predict clinical phenotypes (e.g., cognitive impairment) of AD with reasonable accuracy. Overall, this single-cell network biology analysis provides a comprehensive map linking genes, regulatory networks, cell types and drug targets and reveals cell-type gene dysregulation in AD.
Subjects
Alzheimer Disease , Alzheimer Disease/metabolism , Biology , Drug Repositioning , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans , Phenotype
ABSTRACT
BACKGROUND: Fragile X syndrome (FXS), the most prevalent inherited intellectual disability and one of the most common monogenic forms of autism, is caused by a loss of fragile X messenger ribonucleoprotein 1 (FMR1). We have previously shown that FMR1 represses the levels and activities of ubiquitin ligase MDM2 in young adult FMR1-deficient mice, and treatment with the MDM2 inhibitor Nutlin-3 rescues both hippocampal neurogenic and cognitive deficits in FMR1-deficient mice when analyzed shortly after the administration. However, it is unknown whether Nutlin-3 treatment can have long-lasting therapeutic effects. METHODS: We treated 2-month-old young adult FMR1-deficient mice with Nutlin-3 for 10 days and then assessed the persistent effect of Nutlin-3 on both cognitive functions and adult neurogenesis when mice were 6-month-old mature adults. To investigate the mechanisms underlying the persistent effects of Nutlin-3, we analyzed the proliferation and differentiation of neural stem/progenitor cells isolated from these mice and assessed the transcriptome of the hippocampal tissues of treated mice. RESULTS: We found that transient treatment with Nutlin-3 of 2-month-old young adult FMR1-deficient mice prevents the emergence of neurogenic and cognitive deficits in mature adult FXS mice at 6 months of age. We further found that the long-lasting restoration of neurogenesis and cognitive function might not be mediated by changing intrinsic properties of adult neural stem cells. Transcriptomic analysis of the hippocampal tissue demonstrated that transient Nutlin-3 treatment leads to significant expression changes in genes related to the extracellular matrix, secreted factors, and cell membrane proteins in the FMR1-deficient hippocampus. CONCLUSIONS: Our data indicate that transient Nutlin-3 treatment in young adults leads to long-lasting neurogenic and behavioral changes, likely by modulating the adult neurogenic niche, which impacts adult neural stem cells.
Our results demonstrate that cognitive impairments in FXS may be prevented by an early intervention through Nutlin-3 treatment.
Subjects
Cognitive Dysfunction , Fragile X Syndrome , Animals , Cognition , Cognitive Dysfunction/drug therapy , Crisis Intervention , Disease Models, Animal , Fragile X Mental Retardation Protein/genetics , Fragile X Mental Retardation Protein/metabolism , Fragile X Syndrome/drug therapy , Fragile X Syndrome/genetics , Fragile X Syndrome/metabolism , Hippocampus/metabolism , Imidazoles , Mice , Mice, Knockout , Neurogenesis , Piperazines
ABSTRACT
SUMMARY: Population studies such as genome-wide association study have identified a variety of genomic variants associated with human diseases. To further understand potential mechanisms of disease variants, recent statistical methods associate functional omic data (e.g. gene expression) with genotype and phenotype and link variants to individual genes. However, how to interpret molecular mechanisms from such associations, especially across omics, is still challenging. To address this problem, we developed an interpretable deep learning method, Varmole, to simultaneously reveal genomic functions and mechanisms while predicting phenotype from genotype. In particular, Varmole embeds multi-omic networks into a deep neural network architecture and prioritizes variants, genes and regulatory linkages via biological drop-connect without needing prior feature selections. AVAILABILITY AND IMPLEMENTATION: Varmole is available as a Python tool on GitHub at https://github.com/daifengwanglab/Varmole. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Genome-Wide Association Study , Neural Networks, Computer , Genome , Genomics , Genotype , Humans
ABSTRACT
MOTIVATION: Gene expression and regulation, a key molecular mechanism driving human disease development, remains elusive, especially at early stages. Integrating the increasing amount of population-level genomic data and understanding gene regulatory mechanisms in disease development are still challenging. Machine learning has emerged to solve this, but many machine learning methods were typically limited to building an accurate prediction model as a 'black box', barely providing biological and clinical interpretability from the box. RESULTS: To address these challenges, we developed an interpretable and scalable machine learning model, ECMarker, to predict gene expression biomarkers for disease phenotypes and simultaneously reveal underlying regulatory mechanisms. Particularly, ECMarker is built on the integration of semi- and discriminative-restricted Boltzmann machines, a neural network model for classification allowing lateral connections at the input gene layer. This interpretable model is scalable without needing any prior feature selection and enables directly modeling and prioritizing genes and revealing potential gene networks (from lateral connections) for the phenotypes. With application to the gene expression data of non-small-cell lung cancer patients, we found that ECMarker not only achieved a relatively high accuracy for predicting cancer stages but also identified the biomarker genes and gene networks implying the regulatory mechanisms in the lung cancer development. In addition, ECMarker demonstrates clinical interpretability as its prioritized biomarker genes can predict survival rates of early lung cancer patients (P-value < 0.005). Finally, we identified a number of drugs currently in clinical use for late stages or other cancers with effects on these early lung cancer biomarkers, suggesting potential novel candidates on early cancer medicine. 
AVAILABILITY AND IMPLEMENTATION: ECMarker is open source as a general-purpose tool at https://github.com/daifengwanglab/ECMarker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Biomarkers , Gene Expression , Humans , Lung Neoplasms/genetics , Machine Learning
ABSTRACT
Parvalbumin interneurons (PVIs) are affected in many psychiatric disorders including schizophrenia (SCZ); however, the mechanism remains unclear. FXR1, a high-confidence risk gene for SCZ, is indispensable, but its role in the brain is largely unknown. We show that deleting FXR1 from PVIs of the medial prefrontal cortex (mPFC) leads to reduced PVI excitability, impaired mPFC gamma oscillation, and SCZ-like behaviors. PVI-specific translational profiling reveals that FXR1 regulates the expression of Cacna1h/Cav3.2, a T-type calcium channel implicated in autism and epilepsy. Inhibition of Cav3.2 in PVIs of mPFC phenocopies, whereas elevation of Cav3.2 in PVIs of mPFC rescues, the behavioral deficits resulting from FXR1 deficiency. Stimulation of PVIs using a gamma oscillation-enhancing light flicker rescues behavioral abnormalities caused by FXR1 deficiency in PVIs. This work unveils the function of a newly identified SCZ risk gene in SCZ-relevant neurons and identifies a therapeutic target and a potential noninvasive treatment for psychiatric disorders.
Subjects
Parvalbumins , Schizophrenia , Humans , Interneurons/metabolism , Neurons/metabolism , Parvalbumins/metabolism , Prefrontal Cortex/metabolism , RNA-Binding Proteins/metabolism , Schizophrenia/genetics , Schizophrenia/metabolism
ABSTRACT
The molecular mechanisms and functions in complex biological systems currently remain elusive. Recent high-throughput techniques, such as next-generation sequencing, have generated a wide variety of multiomics datasets that enable the identification of biological functions and mechanisms via multiple facets. However, integrating these large-scale multiomics data and discovering functional insights are, nevertheless, challenging tasks. To address these challenges, machine learning has been broadly applied to analyze multiomics. This review introduces multiview learning, an emerging machine learning field, and envisions its potentially powerful applications to multiomics. In particular, multiview learning is more effective than previous integrative methods for learning data's heterogeneity and revealing cross-talk patterns. Although it has been applied to various contexts, such as computer vision and speech recognition, multiview learning has not yet been widely applied to biological data, specifically multiomics data. Therefore, this paper firstly reviews recent multiview learning methods and unifies them in a framework called multiview empirical risk minimization (MV-ERM). We further discuss the potential applications of each method to multiomics, including genomics, transcriptomics, and epigenomics, with the aim of discovering the functional and mechanistic interpretations across omics. Secondly, we explore possible applications to different biological systems, including human diseases (e.g., brain disorders and cancers), plants, and single-cell analysis, and discuss both the benefits and caveats of using multiview learning to discover the molecular mechanisms and functions of these systems.
Subjects
Computational Biology/methods , Genomics/methods , Machine Learning , Proteomics/methods , Algorithms , Alzheimer Disease/physiopathology , Brain/physiology , Brain/physiopathology , Chlamydomonas reinhardtii , Cluster Analysis , DNA , Data Interpretation, Statistical , High-Throughput Nucleotide Sequencing , Humans , Metabolomics/methods , Single-Cell Analysis , Software
ABSTRACT
The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.
Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Gene Expression Profiling , Transcriptome/genetics , Animals , Caenorhabditis elegans/embryology , Caenorhabditis elegans/growth & development , Chromatin/genetics , Cluster Analysis , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Histones/metabolism , Humans , Larva/genetics , Larva/growth & development , Models, Genetic , Molecular Sequence Annotation , Promoter Regions, Genetic/genetics , Pupa/genetics , Pupa/growth & development , RNA, Untranslated/genetics , Sequence Analysis, RNA
ABSTRACT
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
Subjects
Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , Transcription Factors/metabolism , Animals , Binding Sites , Caenorhabditis elegans/growth & development , Chromatin Immunoprecipitation , Conserved Sequence/genetics , Drosophila melanogaster/growth & development , Gene Expression Regulation, Developmental/genetics , Genome/genetics , Humans , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Organ Specificity/genetics , Transcription Factors/genetics
ABSTRACT
BACKGROUND: The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS: We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value < 2.2 × 10^-16). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS: ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster.
Subjects
Algorithms , Gene Regulatory Networks/genetics , Genomics/methods , Biological Evolution , Cluster Analysis , Machine Learning , Nonlinear Dynamics , Phenotype , Software , Transcriptome/genetics
ABSTRACT
The emergence of collective creative enterprises such as large scientific consortia is a unique feature of modern scientific research. We analyzed the temporal co-authorship network structures of the ENCODE and modENCODE consortia. Our analysis revealed that the consortium members work closely as a community, whereas non-members collaborate on the scale of a few laboratories. We also identified a few brokers who play an important role in facilitating collaborations with outside researchers.
Subjects
Cooperative Behavior , Peer Review, Research/trends , Humans
ABSTRACT
Bioenergetic requirements of hematopoietic stem cells and pluripotent stem cells (PSCs) vary with lineage fate, and cellular adaptations rely largely on substrate (glucose/glutamine) availability and mitochondrial function to balance tricarboxylic acid (TCA)-derived anabolic and redox-regulated antioxidant functions. Heme synthesis and degradation converge in a linear pathway that utilizes TCA cycle-derived carbon in cataplerotic reactions of tetrapyrrole biosynthesis, terminated by NAD(P)H-dependent biliverdin reductases (IXα, BLVRA and IXβ, BLVRB) that lead to bilirubin generation and cellular antioxidant functions. We now demonstrate that PSCs with targeted deletion of BLVRB display physiologically defective antioxidant activity and cellular viability, associated with a glutamine-restricted defect in TCA entry that was computationally predicted using gene/metabolite topological network analysis and subsequently validated by bioenergetic and isotopomeric studies. Defective BLVRB-regulated glutamine utilization was accompanied by exaggerated glycolytic accumulation of the rate-limiting hexokinase reaction product glucose-6-phosphate. BLVRB-deficient embryoid body formation (a critical size parameter of early lineage fate potential) demonstrated enhanced sensitivity to the pentose phosphate pathway (PPP) inhibitor 6-aminonicotinamide with no differences in the glycolytic pathway inhibitor 2-deoxyglucose. These collective data place heme catabolism in a crucial pathway of glutamine-regulated bioenergetic metabolism and suggest that early stages of lineage fate potential require glutamine anaplerotic functions and an intact PPP, which are, in part, regulated by BLVRB activity. In principle, BLVRB inhibition represents an alternative strategy for modulating cellular glutamine utilization with consequences for cancer and hematopoietic metabolism.
Subjects
Embryonic Stem Cells/metabolism , Glutamine/metabolism , Oxidoreductases Acting on CH-CH Group Donors/physiology , Cells, Cultured , Energy Metabolism/genetics , Gene Knock-In Techniques , Glucose/metabolism , Glycolysis/genetics , Heme/metabolism , Humans , Oxidoreductases Acting on CH-CH Group Donors/genetics , Pentose Phosphate Pathway/genetics , Substrate Specificity
ABSTRACT
The global measurement of assembly and turnover of protein containing complexes within cells has advanced with the development of pulse stable isotope labelled amino acid approaches. Stable isotope labeling with amino acids in cell culture (SILAC) allows the incorporation of "light" 12-carbon amino acids or "heavy" 13-carbon amino acids into cells or organisms and the quantitation of proteins and peptides containing these amino acid tags using mass spectrometry. The use of these labels in pulse or pulse-chase scenarios has enabled measurements of macromolecular dynamics in cells, on time scales of several hours. Here we review advances with this approach and alternative or parallel strategies. We also examine the statistical considerations impacting datasets detailing mitochondrial assembly, to highlight key parameters in establishing significance and reproducibility.
Subjects
Amino Acids/chemistry , Cell Culture Techniques , Isotope Labeling , Mass Spectrometry , Proteins/analysis , Reproducibility of Results
ABSTRACT
Transcriptome data sets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by a lack of metadata or differences in annotation styles of different labs. In this study, we carefully selected and integrated 6057 Arabidopsis microarray expression samples from 304 experiments deposited to the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI). Metadata such as tissue type, growth conditions and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated data set and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even in different growth conditions or developmental stages. Root has the most distinct transcriptome, compared with aerial tissues, but the transcriptome of cultured root is more similar to the transcriptome of aerial tissues, as the cultured root samples lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door for automatic metadata extraction and facilitating the re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions with sample metadata provided by the authors.
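The abstract does not say which "simple computational classification method" was used. As a purely illustrative stand-in, the sketch below uses a nearest-centroid classifier based on Pearson correlation; the function names, the four-gene toy profiles and the tissue labels are all hypothetical and are not taken from the study.

```python
import numpy as np

def fit_centroids(X, labels):
    """Average the expression profiles of each tissue into one centroid."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.vstack([X[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each sample to the tissue whose centroid it correlates with best."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Cc = centroids - centroids.mean(axis=1, keepdims=True)
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Cn = Cc / np.linalg.norm(Cc, axis=1, keepdims=True)
    corr = Xn @ Cn.T  # Pearson correlation matrix, samples x tissues
    return [classes[i] for i in corr.argmax(axis=1)]

# Toy data (hypothetical): rows are samples, columns are four genes.
train_X = np.array([
    [5.0, 1.0, 1.0, 1.2],  # root
    [4.8, 1.1, 0.9, 1.0],  # root
    [1.0, 5.0, 1.1, 1.0],  # leaf
    [0.9, 4.9, 1.0, 1.2],  # leaf
])
train_y = ["root", "root", "leaf", "leaf"]
classes, centroids = fit_centroids(train_X, train_y)

# Two unlabeled samples whose tissue type is to be predicted.
new_X = np.array([[4.5, 1.3, 1.0, 0.8], [1.2, 4.7, 0.9, 1.1]])
predicted = predict(new_X, classes, centroids)
print(predicted)
```

Correlation is used here rather than Euclidean distance so that samples differing only in overall intensity still match the same tissue, which is one plausible way to handle data pooled from many labs.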
Subjects
Arabidopsis/genetics , Oligonucleotide Array Sequence Analysis/methods , Arabidopsis Proteins/genetics , Gene Expression Regulation, Plant/genetics
ABSTRACT
BACKGROUND: The epithelial to mesenchymal transition (EMT) plays a key role in lung cancer progression and drug resistance. The dynamics and stability of gene expression patterns as cancer cells transition from E to M at a systems level, and their relevance to patient outcomes, are unknown. METHODS: Using comparative network and clustering analysis, we systematically analyzed time-series gene expression data from lung cancer cell lines H358 and A549 that were induced to undergo EMT. We also predicted the putative regulatory networks controlling EMT expression dynamics, especially for the EMT-dynamic genes, and related these patterns to patient outcomes using data from TCGA. Example EMT hub regulatory genes were validated using RNAi. RESULTS: We identified several novel genes distinct from the static states of E or M that exhibited temporal expression patterns or 'periods' during the EMT process that were shared in different lung cancer cell lines. For example, cell cycle and metabolic genes were found to be similarly down-regulated, whereas immune-associated genes were up-regulated after middle EMT stages. The presence of EMT-dynamic gene expression patterns supports the presence of differential activation and repression timings at the transcriptional level for various pathways and functions during EMT that are not detected in pure E or M cells. Importantly, the EMT-dynamic genes identified in cell lines were found to be present in lung cancer patient tissues and associated with patient outcomes. CONCLUSIONS: Our study suggests that EMT-dynamic genes identified in vitro capture elements of EMT gene expression dynamics at the patient level. Measurement of EMT-dynamic genes, as opposed to E or M only, is potentially useful in future efforts aimed at classifying patients' responses to treatments based on the EMT dynamics in the tissue.
Subjects
Epithelial-Mesenchymal Transition/genetics; Gene Expression Regulation, Neoplastic/genetics; Gene Regulatory Networks/genetics; Lung Neoplasms/genetics; Cell Line, Tumor; Computational Biology; Gene Expression Profiling; Humans; Transcriptome/genetics
ABSTRACT
Gene expression is controlled by the combinatorial effects of regulatory factors from different biological subsystems such as general transcription factors (TFs), cellular growth factors and microRNAs. A subsystem's gene expression may be controlled by its internal regulatory factors, exclusively, or by external subsystems, or by both. It is thus useful to distinguish the degree to which a subsystem is regulated internally or externally; for example, how non-conserved, species-specific TFs affect the expression of conserved, cross-species genes during evolution. We developed a computational method (DREISS, dreiss.gerteinlab.org) for analyzing the Dynamics of gene expression driven by Regulatory networks, both External and Internal, based on State Space models. Given a subsystem, the "state" and "control" in the model refer to its own (internal) and another subsystem's (external) gene expression levels. The state at a given time is determined by the state and control at a previous time. Because typical time-series data do not have enough samples to fully estimate the model's parameters, DREISS uses dimensionality reduction, and identifies canonical temporal expression trajectories (e.g., degradation, growth and oscillation) representing the regulatory effects emanating from various subsystems. To demonstrate the capabilities of DREISS, we study the regulatory effects of evolutionarily conserved vs. divergent TFs across distant species. In particular, we applied DREISS to the time-series gene expression datasets of C. elegans and D. melanogaster during their embryonic development. We analyzed the expression dynamics of the conserved, orthologous genes (orthologs), seeing the degree to which these can be accounted for by orthologous (internal) versus species-specific (external) TFs. We found that between the two species, the orthologs have matched, internally driven expression patterns but very different externally driven ones. This is particularly true for genes with evolutionarily ancient functions (e.g., the ribosomal proteins), in contrast to those with more recently evolved functions (e.g., cell-cell communication). This suggests that despite striking morphological differences, some fundamental embryonic-developmental processes are still controlled by ancient regulatory systems.
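The state and control relationship described in the abstract above can be illustrated with a small sketch. It assumes a linear, discrete-time update of the form x[t+1] = A x[t] + B u[t]; the abstract only says that the state depends on the previous state and control, so the linear form is an assumption here. The matrices A and B, the helper names (matvec, step, simulate), and all numeric values are hypothetical and are not taken from DREISS. The sketch does not attempt the dimensionality reduction or parameter estimation that DREISS performs.

```python
# Minimal sketch of a discrete-time state-space update (assumed linear form):
#   x[t+1] = A x[t] + B u[t]
# x: expression levels of the subsystem itself (the "state", internal)
# u: expression levels of another subsystem (the "control", external)
# All names and values below are hypothetical illustrations, not DREISS output.

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def step(A, B, x, u):
    """One time step: next state from the current state and control."""
    return [a + b for a, b in zip(matvec(A, x), matvec(B, u))]

def simulate(A, B, x0, controls):
    """Return the trajectory [x0, x1, ...] driven by a sequence of controls."""
    states = [list(x0)]
    for u in controls:
        states.append(step(A, B, states[-1], u))
    return states

if __name__ == "__main__":
    A = [[0.5, 0.0], [0.0, 1.0]]      # hypothetical internal regulation
    B = [[1.0], [0.0]]                # hypothetical external regulation
    controls = [[0.0], [0.0], [1.0]]  # external input switched on at t = 2
    for t, x in enumerate(simulate(A, B, [1.0, 2.0], controls)):
        print(t, x)
```

In this toy setting, a diagonal entry of A below 1 makes the corresponding gene decay toward zero when no external input is present, which loosely corresponds to the "degradation" type of trajectory mentioned in the abstract.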
Subjects
Algorithms; Gene Expression Regulation/physiology; Gene Regulatory Networks/physiology; Models, Biological; Proteome/metabolism; Software; Animals; Computer Simulation; Feedback, Physiological/physiology; Humans
ABSTRACT
Pseudogenes are degraded fossil copies of genes. Here, we report a comparison of pseudogenes spanning three phyla, leveraging the completed annotations of the human, worm, and fly genomes, which we make available as an online resource. We find that pseudogenes are lineage specific, much more so than protein-coding genes, reflecting the different remodeling processes marking each organism's genome evolution. The majority of human pseudogenes are processed, resulting from a retrotranspositional burst at the dawn of the primate lineage. This burst can be seen in the largely uniform distribution of pseudogenes across the genome, their preservation in areas with low recombination rates, and their preponderance in highly expressed gene families. In contrast, worm and fly pseudogenes tell a story of numerous duplication events. In worm, these duplications have been preserved through selective sweeps, so we see a large number of pseudogenes associated with highly duplicated families such as chemoreceptors. However, in fly, the large effective population size and high deletion rate resulted in a depletion of the pseudogene complement. Despite large variations between these species, we also find notable similarities. Overall, we identify a broad spectrum of biochemical activity for pseudogenes, with the majority in each organism exhibiting varying degrees of partial activity. In particular, we identify a consistent amount of transcription (~15%) across all species, suggesting a uniform degradation process. Also, we see a uniform decay of pseudogene promoter activity relative to their coding counterparts and identify a number of pseudogenes with conserved upstream sequences and activity, hinting at potential regulatory roles.