ABSTRACT
Understanding how each person's unique genotype influences their individual patterns of gene regulation has the potential to improve our understanding of human health and development, and to refine genotype-specific disease risk assessments and treatments. However, the effects of genetic variants are not typically considered when constructing gene regulatory networks, despite the fact that many disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding. We developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network for each individual in a study population. EGRET begins by constructing a genotype-informed TF-gene prior network derived using TF motif predictions, expression quantitative trait locus (eQTL) data, individual genotypes, and the predicted effects of genetic variants on TF binding. It then uses a technique known as message passing to integrate this prior network with gene expression and TF protein-protein interaction data to produce a refined, genotype-specific regulatory network. We used EGRET to infer gene regulatory networks for two blood-derived cell lines and identified genotype-associated, cell line-specific regulatory differences that we subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential ChIP-seq TF binding. We also inferred EGRET networks for three cell types from each of 119 individuals and identified cell type-specific regulatory differences associated with diseases related to those cell types. EGRET is, to our knowledge, the first method that infers networks reflective of individual genetic variation in a way that provides insight into the genetic regulatory associations driving complex phenotypes.
Subjects
Gene Regulatory Networks, Transcription Factors, Chromatin, Chromatin Immunoprecipitation, Genotype, Humans, Transcription Factors/genetics, Transcription Factors/metabolism
ABSTRACT
The increasing quantity of multi-omic data, such as methylomic and transcriptomic profiles collected on the same specimen or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics 'layers.' In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome - methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
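The core idea behind a Gaussian Graphical Model, that an edge reflects a partial correlation rather than a marginal one, can be sketched for three variables. This is not DRAGON's estimator (its layer-specific parameter calibration is not shown); the data and variable roles below are hypothetical, and only the Python standard library is assumed.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)
# Hypothetical data: a methylation-like signal z drives two expression-like signals.
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + rng.gauss(0, 1) for v in z]
y = [v + rng.gauss(0, 1) for v in z]

marginal = pearson(x, y)             # sizeable: x and y share the driver z
conditional = partial_corr(x, y, z)  # near zero: no direct x-y edge in a GGM
```

In a correlation network x and y would be linked; in a GGM the link disappears once z is accounted for, leaving only the x-z and y-z edges.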
Subjects
Multiomics, Neoplasms, Humans, Software, Computer Simulation, Transcriptome, Neoplasms/genetics, Gene Regulatory Networks
ABSTRACT
Gene regulation plays a fundamental role in shaping tissue identity, function, and response to perturbation. Regulatory processes are controlled by complex networks of interacting elements, including transcription factors, miRNAs and their target genes. The structure of these networks helps to determine phenotypes and can ultimately influence the development of disease or response to therapy. We developed GRAND (https://grand.networkmedicine.org) as a database for computationally inferred, context-specific gene regulatory network models that can be compared between biological states, or used to predict which drugs produce changes in regulatory network structure. The database includes 12 468 genome-scale networks covering 36 human tissues, 28 cancers, 1378 unperturbed cell lines, as well as 173 013 TF and gene targeting scores for 2858 small molecule-induced cell line perturbations paired with phenotypic information. GRAND allows the networks to be queried using phenotypic information and visualized using a variety of interactive tools. In addition, it includes a web application that matches disease states to potentially therapeutic small molecule drugs using regulatory network properties.
Subjects
Genetic Databases, Pharmaceutical Databases, Gene Regulatory Networks/genetics, Software, Gene Expression Regulation/genetics, Human Genome/genetics, Humans, MicroRNAs/classification, MicroRNAs/genetics, Transcription Factors/classification, Transcription Factors/genetics
ABSTRACT
BACKGROUND: Potato is the third most consumed crop in the world. Breeding for traits such as yield, product quality and pathogen resistance is a main priority. Identifying molecular signatures of these and other important traits is important in future breeding efforts. In this study, a progeny population from a cross between a breeding line, SW93-1015, and a cultivar, Désirée, was studied by trait analysis and RNA-seq in order to develop understanding of segregating traits at the molecular level and identify transcripts whose expression correlates with these traits. Transcript markers with predictive value for field performance applicable under controlled environments would be of great value for plant breeding. RESULTS: A total of 34 progeny lines from SW93-1015 and Désirée were phenotyped for 17 different traits in a field under Nordic climate conditions and in controlled climate settings. A master transcriptome was constructed with all 34 progeny lines and the parents through a de novo assembly of RNA-seq reads. Gene expression data obtained in a controlled environment from the 34 lines were correlated to traits by different similarity indices, including Pearson and Spearman, as well as DUO, which calculates the co-occurrence between high and low values for gene expression and trait. Our study linked transcripts to traits such as yield, growth rate, high laying tubers, late and tuber blight, tuber greening and early flowering. We found several transcripts associated with late blight resistance, and transcripts encoding receptors were associated with Dickeya solani susceptibility. Transcript levels of a UBX-domain protein were negatively associated with yield and a GLABRA2 expression modulator was negatively associated with growth rate. CONCLUSION: In our study, we identify hundreds of transcripts putatively linked, based on expression, to 17 traits of potato, representing both well-known and novel associations. This approach can be used to link the transcriptome to traits. We explore the possibility of associating the level of transcript expression from controlled, optimal environments to traits in a progeny population with different methods, introducing the application of DUO for the first time on transcriptome data. We verify the expression pattern for five of the putative transcript markers in another progeny population.
Subjects
Life History Traits, Phenotype, Solanum tuberosum/genetics, Transcriptome, Tetraploidy
ABSTRACT
Metagenomic data from Obsidian Pool (Yellowstone National Park, USA) and 13 genome sequences were used to reassess genus-wide biodiversity for the extremely thermophilic Caldicellulosiruptor. The updated core genome contains 1,401 ortholog groups (average genome size for 13 species = 2,516 genes). The pangenome, which remains open with a revised total of 3,493 ortholog groups, encodes a variety of multidomain glycoside hydrolases (GHs). These include three cellulases with GH48 domains that are colocated in the glucan degradation locus (GDL) and are specific determinants for microcrystalline cellulose utilization. Three recently sequenced species, Caldicellulosiruptor sp. strain Rt8.B8 (renamed here Caldicellulosiruptor morganii), Thermoanaerobacter cellulolyticus strain NA10 (renamed here Caldicellulosiruptor naganoensis), and Caldicellulosiruptor sp. strain Wai35.B1 (renamed here Caldicellulosiruptor danielii), degraded Avicel and lignocellulose (switchgrass). C. morganii was more efficient than Caldicellulosiruptor bescii in this regard and differed from the other 12 species examined, both based on genome content and organization and in the specific domain features of conserved GHs. Metagenomic analysis of lignocellulose-enriched samples from Obsidian Pool revealed limited new information on genus biodiversity. Enrichments yielded genomic signatures closely related to that of Caldicellulosiruptor obsidiansis, but there was also evidence for other thermophilic fermentative anaerobes (Caldanaerobacter, Fervidobacterium, Caloramator, and Clostridium). One enrichment, containing 89.8% Caldicellulosiruptor and 9.7% Caloramator, had a capacity for switchgrass solubilization comparable to that of C. bescii. These results refine the known biodiversity of Caldicellulosiruptor and indicate that microcrystalline cellulose degradation at temperatures above 70°C, based on current information, is limited to certain members of this genus that produce GH48 domain-containing enzymes. IMPORTANCE: The genus Caldicellulosiruptor contains the most thermophilic bacteria capable of lignocellulose deconstruction, which are promising candidates for consolidated bioprocessing for the production of biofuels and bio-based chemicals. The focus here is on the extant capability of this genus for plant biomass degradation and the extent to which this can be inferred from the core and pangenomes, based on analysis of 13 species and metagenomic sequence information from environmental samples. Key to microcrystalline hydrolysis is the content of the glucan degradation locus (GDL), a set of genes encoding glycoside hydrolases (GHs), several of which have GH48 and family 3 carbohydrate binding module domains, that function as primary cellulases. Resolving the relationship between the GDL and lignocellulose degradation will inform efforts to identify more prolific members of the genus and to develop metabolic engineering strategies to improve this characteristic.
Subjects
Firmicutes/genetics, Firmicutes/metabolism, Bacterial Genome, Lignin/metabolism, Metagenome, Cellulose/metabolism, Firmicutes/classification, Genomics, Metagenomics
ABSTRACT
We present and develop the theory of 3-way networks, a type of hypergraph in which each edge models a relationship between a triplet of objects rather than between a pair of objects, as in standard network models. We explore approaches for pruning these 3-way networks, illustrate their utility in comparative genomics, and demonstrate, using a phylogenomic dataset of 211 bacterial genomes, how they find relationships that would be missed by standard 2-way network models.
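A toy sketch can make the difference between pairwise and triplet edges concrete. The weighting used here, a Jaccard index generalized to three sets, is an assumption chosen for illustration and is not necessarily the measure used by the authors; the genomes, gene families and pruning threshold are hypothetical.

```python
from itertools import combinations

def jaccard(*sets):
    """Shared items divided by all items, for any number of sets."""
    union = set().union(*sets)
    if not union:
        return 0.0
    return len(set.intersection(*map(set, sets))) / len(union)

# Hypothetical genomes described by the gene families they contain.
genomes = {
    "A": {"f1", "f2"},
    "B": {"f2", "f3"},
    "C": {"f1", "f3"},
    "D": {"f1", "f2", "f4"},
}

# Standard 2-way network: one weighted edge per pair of genomes.
two_way = {pair: jaccard(*(genomes[g] for g in pair))
           for pair in combinations(sorted(genomes), 2)}

# 3-way network: one weighted hyperedge per triplet of genomes.
three_way = {trip: jaccard(*(genomes[g] for g in trip))
             for trip in combinations(sorted(genomes), 3)}

# Pruning by a (hypothetical) weight threshold keeps only the stronger triplets.
pruned = {trip: w for trip, w in three_way.items() if w >= 0.25}
```

Genomes A, B and C are pairwise similar to the same degree, yet no family is common to all three, so the triplet (A, B, C) has weight zero while (A, B, D) does not. That distinction is invisible in the pairwise edges alone.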
Subjects
Bacteria, Genomics/methods, Genetic Models, Bacteria/classification, Bacteria/genetics, Bacterial Genome, Phylogeny
ABSTRACT
BACKGROUND: Induced resistance (IR) can be part of a sustainable plant protection strategy against important plant diseases. β-aminobutyric acid (BABA) can induce resistance in a wide range of plants against several types of pathogens, including potato infected with Phytophthora infestans. However, the molecular mechanisms behind this are unclear and seem to be dependent on the system studied. To elucidate the defence responses activated by BABA in potato, a genome-wide transcript microarray analysis in combination with label-free quantitative proteomics analysis of the apoplast secretome was performed two days after treatment of the leaf canopy with BABA at two concentrations, 1 and 10 mM. RESULTS: Over 5000 transcripts were differentially expressed and over 90 secretome proteins changed in abundance, indicating a massive activation of defence mechanisms with 10 mM BABA, the concentration effective against late blight disease. To aid analysis, we present a more comprehensive functional annotation of the microarray probes and gene models by retrieving information from orthologous gene families across 26 sequenced plant genomes. The new annotation provided GO terms to 8616 previously un-annotated probes. CONCLUSIONS: BABA at 10 mM affected several processes related to plant hormones and amino acid metabolism. A major accumulation of PR proteins was also evident, and in the mevalonate pathway, genes involved in sterol biosynthesis were down-regulated, whereas several enzymes involved in the sesquiterpene phytoalexin biosynthesis were up-regulated. Interestingly, abscisic acid (ABA) responsive genes were not as clearly regulated by BABA in potato as previously reported in Arabidopsis. Together these findings provide candidates and markers for improved resistance in potato, one of the most important crops in the world.
Subjects
Proteomics, Solanum tuberosum/metabolism, Transcriptome, Phytophthora/pathogenicity, Solanum tuberosum/genetics, Solanum tuberosum/microbiology
ABSTRACT
Profiling of whole transcriptomes has become a cornerstone of molecular biology and an invaluable tool for the characterization of clinical phenotypes and the identification of disease subtypes. Analyses of these data are becoming ever more sophisticated as we move beyond simple comparisons to consider networks of higher-order interactions and associations. Gene regulatory networks (GRNs) model the regulatory relationships of transcription factors and genes and have allowed the identification of differentially regulated processes in disease systems. In this perspective, we discuss gene targeting scores, which measure changes in inferred regulatory network interactions, and their use in identifying disease-relevant processes. In addition, we present an example analysis for pancreatic ductal adenocarcinoma (PDAC), demonstrating the power of gene targeting scores to identify differential processes between complex phenotypes, processes that would have been missed by only performing differential expression analysis. This example demonstrates that gene targeting scores are an invaluable addition to gene expression analysis in the characterization of diseases and other complex phenotypes.
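As a rough illustration, a gene targeting score is commonly computed as the sum of the weights of all regulatory edges pointing at a gene. That definition is assumed here rather than taken from the abstract, and the TF names, gene names and edge weights are made up.

```python
from collections import defaultdict

def targeting_scores(edges):
    """Sum the weights of all TF -> gene edges arriving at each gene."""
    scores = defaultdict(float)
    for (_tf, gene), weight in edges.items():
        scores[gene] += weight
    return dict(scores)

# Hypothetical edge weights for two phenotypes (names and values are made up).
control = {("TF1", "geneA"): 0.5, ("TF2", "geneA"): 0.25, ("TF1", "geneB"): 1.0}
disease = {("TF1", "geneA"): 1.5, ("TF2", "geneA"): 0.75, ("TF1", "geneB"): 1.0}

ctrl = targeting_scores(control)
dis = targeting_scores(disease)

# Differential targeting: how much more (or less) a gene is targeted in disease.
differential = {gene: dis[gene] - ctrl[gene] for gene in ctrl}
```

In this toy case geneA gains targeting in the disease network while geneB is unchanged, a difference that could exist even if the expression of geneA were identical in both conditions.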
ABSTRACT
Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool aiding in the discovery of the molecular mechanisms driving diverse diseases, including cancer. However, only noisy observations of the projections of these regulatory networks are typically assayed. In an effort to better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER's solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise than the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.
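The idea of fitting a bipartite matrix W to its two observed projections can be sketched with plain gradient descent. The objective below, ||WW^T - P||^2 + ||W^T W - C||^2 with P and C standing for the two projections, is a simplified assumption and omits any weighting or regularization terms the published method may use; the matrices, step size and iteration count are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def frob2(a):
    """Squared Frobenius norm."""
    return sum(x * x for row in a for x in row)

def loss(w, p, c):
    """Mismatch between W's two projections and the observed P and C."""
    wt = transpose(w)
    return frob2(sub(matmul(w, wt), p)) + frob2(sub(matmul(wt, w), c))

def grad(w, p, c):
    """Gradient of loss for symmetric P and C: 4(WW^T - P)W + 4W(W^T W - C)."""
    wt = transpose(w)
    left = matmul(sub(matmul(w, wt), p), w)
    right = matmul(w, sub(matmul(wt, w), c))
    return [[4 * (l + r) for l, r in zip(rl, rr)] for rl, rr in zip(left, right)]

def descend(w, p, c, step=0.01, iters=200):
    """Fixed-step gradient descent starting from an initial guess w."""
    for _ in range(iters):
        g = grad(w, p, c)
        w = [[x - step * gx for x, gx in zip(rw, rg)] for rw, rg in zip(w, g)]
    return w

# Hypothetical ground truth: 2 TFs (rows) by 3 genes (columns).
w_true = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
p = matmul(w_true, transpose(w_true))   # TF-by-TF projection
c = matmul(transpose(w_true), w_true)   # gene-by-gene projection

w_init = [[0.8, 0.1, 0.4], [0.1, 0.9, 0.6]]   # perturbed starting point
w_fit = descend(w_init, p, c)
```

Here the projections are noise-free, so the fit can only be judged by how far the loss falls from its starting value; with real data P and C would be noisy measurements.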
ABSTRACT
We demonstrate a selection of network and machine learning techniques useful in the analysis of complex datasets, including 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We demonstrate each of these techniques on the Populus trichocarpa gene expression atlas.
Subjects
Databases as Topic, Gene Regulatory Networks, Populus/genetics, Algorithms, Cluster Analysis, Plant Gene Expression Regulation, Software
ABSTRACT
Populus trichocarpa is an important biofuel feedstock that has been the target of extensive research and is emerging as a model organism for plants, especially woody perennials. This research has generated several large 'omics datasets. However, only a few studies in Populus have attempted to integrate various data types. This review will summarize various 'omics data layers, focusing on their application in Populus species. Subsequently, network and signal processing techniques for the integration and analysis of these data types will be discussed, with particular reference to examples in Populus.
ABSTRACT
Various 'omics data types have been generated for Populus trichocarpa, each providing a layer of information which can be represented as a density signal across a chromosome. We make use of genome sequence data, variants data across a population as well as methylation data across 10 different tissues, combined with wavelet-based signal processing to perform a comprehensive analysis of the signature of the centromere in these different data signals, and successfully identify putative centromeric regions in P. trichocarpa from these signals. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and identify a new CENH3 candidate in P. trichocarpa.
ABSTRACT
Plant root-associated microbial symbionts comprise the plant rhizobiome. These microbes function in provisioning nutrients and water to their hosts, impacting plant health and disease. The plant microbiome is shaped by plant species, plant genotype, soil and environmental conditions, but the contributions of these variables are hard to disentangle from each other in natural systems. We used bioassay common garden experiments to decouple plant genotype and soil property impacts on fungal and bacterial community structure in the Populus rhizobiome. High throughput amplification and sequencing of 16S, ITS, 28S and 18S rDNA was accomplished through 454 pyrosequencing. Co-association patterns of fungal and bacterial taxa were assessed with 16S and ITS datasets. Community bipartite fungal-bacterial networks and PERMANOVA results attribute significant difference in fungal or bacterial communities to soil origin, soil chemical properties and plant genotype. Indicator species analysis identified a common set of root bacteria as well as endophytic and ectomycorrhizal fungi associated with Populus in different soils. However, no single taxon, or consortium of microbes, was indicative of a particular Populus genotype. Fungal-bacterial networks were over-represented in arbuscular mycorrhizal, endophytic, and ectomycorrhizal fungi, as well as bacteria belonging to the orders Rhizobiales, Chitinophagales, Cytophagales, and Burkholderiales. These results demonstrate the importance of soil and plant genotype on fungal-bacterial networks in the belowground plant microbiome.
ABSTRACT
Various patterns of multi-phenotype associations (MPAs) exist in the results of Genome Wide Association Studies (GWAS) involving different topologies of single nucleotide polymorphism (SNP)-phenotype associations. These can provide interesting information about the different impacts of a gene on closely related phenotypes or disparate phenotypes (pleiotropy). In this work we present MPA Decomposition, a new network-based approach which decomposes the results of a multi-phenotype GWAS study into three bipartite networks, which, when used together, unravel the multi-phenotype signatures of genes on a genome-wide scale. The decomposition involves the construction of a phenotype powerset space, and subsequent mapping of genes into this new space. Clustering of genes in this powerset space groups genes based on their detailed MPA signatures. We show that this method allows us to find multiple different MPA and pleiotropic signatures within individual genes and to classify and cluster genes based on these SNP-phenotype association topologies. We demonstrate the use of this approach on a GWAS analysis of a large population of 882 Populus trichocarpa genotypes using untargeted metabolomics phenotypes. This method should prove invaluable in the interpretation of large GWAS datasets and aid in future synthetic biology efforts designed to optimize phenotypes of interest.
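The mapping of genes into a phenotype powerset space can be illustrated with a small sketch. This is one plausible reading of the description, not the authors' implementation, and it does not reproduce their three bipartite networks; the gene, SNP and phenotype labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical significant GWAS hits: (gene, SNP, phenotype).
hits = [
    ("g1", "s1", "m1"), ("g1", "s1", "m2"),   # one SNP, two phenotypes
    ("g2", "s2", "m1"), ("g2", "s3", "m2"),   # two SNPs, one phenotype each
    ("g3", "s4", "m1"), ("g3", "s4", "m2"),   # same pattern as g1
]

snp_phenos = defaultdict(set)
gene_snps = defaultdict(set)
for gene, snp, pheno in hits:
    snp_phenos[snp].add(pheno)
    gene_snps[gene].add(snp)

# Each SNP maps to one element of the phenotype powerset; a gene's signature
# is the set of powerset elements reached by its SNPs.
signature = {gene: frozenset(frozenset(snp_phenos[s]) for s in snps)
             for gene, snps in gene_snps.items()}

# Genes with identical signatures fall into the same cluster.
clusters = defaultdict(list)
for gene, sig in sorted(signature.items()):
    clusters[sig].append(gene)
```

Genes g1 and g2 are associated with the same two phenotypes, but g1 reaches both through a single SNP while g2 reaches each through a separate SNP. The powerset signatures keep those two association topologies apart, so g1 clusters with g3 and not with g2.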
ABSTRACT
Understanding the regulatory network controlling cell wall biosynthesis is of great interest in Populus trichocarpa, both because of its status as a model woody perennial and its importance for lignocellulosic products. We searched for genes with putatively unknown roles in regulating cell wall biosynthesis using an extended network-based Lines of Evidence (LOE) pipeline to combine multiple omics data sets in P. trichocarpa, including gene coexpression, gene comethylation, population level pairwise SNP correlations, and two distinct SNP-metabolite Genome Wide Association Study (GWAS) layers. By incorporating validation, ranking, and filtering approaches we produced a list of nine high priority gene candidates for involvement in the regulation of cell wall biosynthesis. We subsequently performed a detailed investigation of candidate gene GROWTH-REGULATING FACTOR 9 (PtGRF9). To investigate the role of PtGRF9 in regulating cell wall biosynthesis, we assessed the genome-wide connections of PtGRF9 and a paralog across data layers with functional enrichment analyses, predictive transcription factor binding site analysis, and an independent comparison to eQTN data. Our findings indicate that PtGRF9 likely affects the cell wall by directly repressing genes involved in cell wall biosynthesis, such as PtCCoAOMT and PtMYB.41, and indirectly by regulating homeobox genes. Furthermore, evidence suggests that PtGRF9 paralogs may act as transcriptional co-regulators that direct the global energy usage of the plant. Using our extended pipeline, we show multiple lines of evidence implicating the involvement of these genes in cell wall regulatory functions and demonstrate the value of this method for prioritizing candidate genes for experimental validation.
ABSTRACT
Polyglutamine (polyQ) stretches have been reported to occur in proteins across many organisms including animals, fungi and plants. Expansion of these repeats has attracted much attention due to their associations with numerous human diseases including Huntington's and other neurological maladies. This suggests that the relative length of polyQ stretches is an important modulator of their function. Here, we report the identification of a Populus C-terminus binding protein (CtBP) ANGUSTIFOLIA (PtAN1) which contains a polyQ stretch whose functional relevance had not been established. Analysis of 917 resequenced Populus trichocarpa genotypes revealed three allelic variants at this locus encoding 11-, 13- and 15-glutamine residues. Transient expression assays using Populus leaf mesophyll protoplasts revealed that the 11Q variant exhibited strong nuclear localization whereas the 15Q variant was only found in the cytosol, with the 13Q variant exhibiting localization in both subcellular compartments. We assessed functional implications by evaluating expression changes of putative PtAN1 targets in response to overexpression of the three allelic variants and observed allele-specific differences in expression levels of putative targets. Our results provide evidence that variation in polyQ length modulates PtAN1 function by altering subcellular localization.
Subjects
Cell Nucleus/metabolism, DNA-Binding Proteins/metabolism, Peptides/chemistry, Plant Proteins/metabolism, Populus/genetics, Cell Nucleus Active Transport, Alleles, DNA-Binding Proteins/chemistry, Plant Proteins/chemistry, Populus/metabolism, Protein Sorting Signals
ABSTRACT
A characteristic feature of plant cells is the ability to form callus from parenchyma cells in response to biotic and abiotic stimuli. Tissue culture propagation of recalcitrant plant species and genetic engineering for desired phenotypes typically depend on efficient in vitro callus generation. Callus formation is under genetic regulation, and consequently, a molecular understanding of this process underlies successful generation of propagation materials and/or introduction of genetic elements in experimental or industrial applications. Herein, we identified 11 genetic loci significantly associated with callus formation in Populus trichocarpa using a genome-wide association study (GWAS) approach. Eight of the 11 significant gene associations were consistent across biological replications, exceeding a chromosome-wide, Bonferroni-adjusted significance threshold of -log10(p) = 4.46 (p = 3.47E-05). These eight genes were used as hub genes in a high-resolution co-expression network analysis to gain insight into the genome-wide basis of callus formation. A network of positively and negatively co-expressed genes, including several transcription factors, was identified. As proof-of-principle, a transient protoplast assay confirmed the negative regulation of a Chloroplast Nucleoid DNA-binding-related gene (Potri.018G014800) by the LEC2 transcription factor. Many of the candidate genes and co-expressed genes 1) were linked to cell division and cell cycling in plants and 2) showed homology to tumor and cancer-related genes in humans. The GWAS approach, based on a high-resolution marker set and the ability to manipulate target genes in vitro, provided a catalog of high-confidence genes linked to callus formation that can serve as an important resource for successful manipulation of model and non-model plant species, and likewise suggests a robust method of discovering common homologous functions across organisms.
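The two forms of the reported threshold are related by a simple conversion, sketched below. The helper names are hypothetical, and the example p-values are made up to show one association on each side of the cutoff.

```python
import math

def p_from_neg_log10(score):
    """Convert a -log10(p) score back to a p-value."""
    return 10 ** (-score)

def passes(p_value, score_threshold=4.46):
    """True when an association's -log10(p) exceeds the threshold."""
    return -math.log10(p_value) > score_threshold

threshold_p = p_from_neg_log10(4.46)   # about 3.47E-05, matching the abstract

strong_hit = passes(1e-6)   # -log10(p) = 6, above the threshold
weak_hit = passes(1e-4)     # -log10(p) = 4, below the threshold
```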
Subjects
Bony Callus/growth & development, Populus/genetics, Transcription Factors/genetics, Plant Gene Expression Regulation, Genome-Wide Association Study, Phenotype, Populus/growth & development
ABSTRACT
We explore the use of a network meta-modeling approach to compare the effects of similarity metrics used to construct biological networks on the topology of the resulting networks. This work reviews various similarity metrics for the construction of networks and various topology measures for the characterization of resulting network topology, demonstrating the use of these metrics in the construction and comparison of phylogenomic and transcriptomic networks.