2.
Nucleic Acids Res ; 51(3): e15, 2023 02 22.
Article in English | MEDLINE | ID: mdl-36533448

ABSTRACT

The increasing quantity of multi-omic data, such as methylomic and transcriptomic profiles collected on the same specimen or even on the same cell, provides a unique opportunity to explore the complex interactions that define cell phenotype and govern cellular responses to perturbations. We propose a network approach based on Gaussian Graphical Models (GGMs) that facilitates the joint analysis of paired omics data. This method, called DRAGON (Determining Regulatory Associations using Graphical models on multi-Omic Networks), calibrates its parameters to achieve an optimal trade-off between the network's complexity and estimation accuracy, while explicitly accounting for the characteristics of each of the assessed omics 'layers.' In simulation studies, we show that DRAGON adapts to edge density and feature size differences between omics layers, improving model inference and edge recovery compared to state-of-the-art methods. We further demonstrate in an analysis of joint transcriptome - methylome data from TCGA breast cancer specimens that DRAGON can identify key molecular mechanisms such as gene regulation via promoter methylation. In particular, we identify Transcription Factor AP-2 Beta (TFAP2B) as a potential multi-omic biomarker for basal-type breast cancer. DRAGON is available as open-source code in Python through the Network Zoo package (netZooPy v0.8; netzoo.github.io).
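The idea of estimating a Gaussian graphical model with layer-aware regularization can be illustrated with a small sketch. It is not the netZooPy implementation. It assumes one plausible scheme in which each omics layer gets its own shrinkage intensity, the covariance is shrunk toward its diagonal, and partial correlations are read off the inverse. The function names, the toy covariance matrix, and the lambda values are all hypothetical.

```python
import math

def shrink_covariance(S, layer_of, lambdas):
    """Shrink off-diagonal covariances toward zero, one lambda per layer (assumed scheme)."""
    n = len(S)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                out[i][j] = S[i][j]  # variances are left unchanged
            else:
                # damping depends on the layers of both variables
                w = math.sqrt((1 - lambdas[layer_of[i]]) * (1 - lambdas[layer_of[j]]))
                out[i][j] = w * S[i][j]
    return out

def invert(M):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(M)
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

def partial_correlations(S):
    """Partial correlation matrix from the precision matrix (inverse covariance)."""
    theta = invert(S)
    n = len(S)
    return [[1.0 if i == j else -theta[i][j] / math.sqrt(theta[i][i] * theta[j][j])
             for j in range(n)] for i in range(n)]

# Toy example: variables 0 and 1 belong to layer 0, variable 2 to layer 1.
S = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
layer_of = [0, 0, 1]
pc = partial_correlations(shrink_covariance(S, layer_of, [0.1, 0.4]))
```

In this toy matrix the marginal correlation between variables 0 and 2 is fully explained by variable 1, so without shrinkage their partial correlation is zero. That is the kind of indirect edge a graphical model is meant to remove.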


Subject(s)
Multiomics , Neoplasms , Humans , Software , Computer Simulation , Transcriptome , Neoplasms/genetics , Gene Regulatory Networks
4.
Genome Res ; 32(3): 524-533, 2022 03.
Article in English | MEDLINE | ID: mdl-35193937

ABSTRACT

Understanding how each person's unique genotype influences their individual patterns of gene regulation has the potential to improve our understanding of human health and development, and to refine genotype-specific disease risk assessments and treatments. However, the effects of genetic variants are not typically considered when constructing gene regulatory networks, despite the fact that many disease-associated genetic variants are thought to have regulatory effects, including the disruption of transcription factor (TF) binding. We developed EGRET (Estimating the Genetic Regulatory Effect on TFs), which infers a genotype-specific gene regulatory network for each individual in a study population. EGRET begins by constructing a genotype-informed TF-gene prior network derived using TF motif predictions, expression quantitative trait locus (eQTL) data, individual genotypes, and the predicted effects of genetic variants on TF binding. It then uses a technique known as message passing to integrate this prior network with gene expression and TF protein-protein interaction data to produce a refined, genotype-specific regulatory network. We used EGRET to infer gene regulatory networks for two blood-derived cell lines and identified genotype-associated, cell line-specific regulatory differences that we subsequently validated using allele-specific expression, chromatin accessibility QTLs, and differential ChIP-seq TF binding. We also inferred EGRET networks for three cell types from each of 119 individuals and identified cell type-specific regulatory differences associated with diseases related to those cell types. EGRET is, to our knowledge, the first method that infers networks reflective of individual genetic variation in a way that provides insight into the genetic regulatory associations driving complex phenotypes.


Subject(s)
Gene Regulatory Networks , Transcription Factors , Chromatin , Chromatin Immunoprecipitation , Genotype , Humans , Transcription Factors/genetics , Transcription Factors/metabolism
5.
Nucleic Acids Res ; 50(D1): D610-D621, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34508353

ABSTRACT

Gene regulation plays a fundamental role in shaping tissue identity, function, and response to perturbation. Regulatory processes are controlled by complex networks of interacting elements, including transcription factors, miRNAs and their target genes. The structure of these networks helps to determine phenotypes and can ultimately influence the development of disease or response to therapy. We developed GRAND (https://grand.networkmedicine.org) as a database for computationally-inferred, context-specific gene regulatory network models that can be compared between biological states, or used to predict which drugs produce changes in regulatory network structure. The database includes 12 468 genome-scale networks covering 36 human tissues, 28 cancers, 1378 unperturbed cell lines, as well as 173 013 TF and gene targeting scores for 2858 small molecule-induced cell line perturbations paired with phenotypic information. GRAND allows the networks to be queried using phenotypic information and visualized using a variety of interactive tools. In addition, it includes a web application that matches disease states to potentially therapeutic small molecule drugs using regulatory network properties.


Subject(s)
Databases, Genetic , Databases, Pharmaceutical , Gene Regulatory Networks/genetics , Software , Gene Expression Regulation/genetics , Genome, Human/genetics , Humans , MicroRNAs/classification , MicroRNAs/genetics , Transcription Factors/classification , Transcription Factors/genetics
6.
Proc AAAI Conf Artif Intell ; 35(11): 10263-10272, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34707916

ABSTRACT

Bipartite network inference is a ubiquitous problem across disciplines. One important example in the field of molecular biology is gene regulatory network inference. Gene regulatory networks are an instrumental tool aiding in the discovery of the molecular mechanisms driving diverse diseases, including cancer. However, only noisy observations of the projections of these regulatory networks are typically assayed. In an effort to better estimate regulatory networks from their noisy projections, we formulate a non-convex but analytically tractable optimization problem called OTTER. This problem can be interpreted as relaxed graph matching between the two projections of the bipartite network. OTTER's solutions can be derived explicitly and inspire a spectral algorithm, for which we provide network recovery guarantees. We also provide an alternative approach based on gradient descent that is more robust to noise compared to the spectral algorithm. Interestingly, this gradient descent approach resembles the message passing equations of an established gene regulatory network inference method, PANDA. Using three cancer-related data sets, we show that OTTER outperforms state-of-the-art inference methods in predicting transcription factor binding to gene regulatory regions. To encourage new graph matching applications to this problem, we have made all networks and validation data publicly available.
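The gradient descent idea in this abstract can be sketched on a toy problem. The sketch assumes a simplified objective in which a bipartite weight matrix W (TFs by genes) is fitted so that its two projections, W W^T and W^T W, match observed matrices P and C. The exact objective, regularization, and optimizer of the published method are not reproduced. All names, the mixing weight `lam`, the step size, and the toy data are hypothetical.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sq_norm(A):
    return sum(v * v for row in A for v in row)

def loss(W, P, C, lam):
    """Assumed objective: weighted mismatch of both projections of W."""
    Wt = transpose(W)
    return ((1 - lam) / 4) * sq_norm(sub(matmul(W, Wt), P)) \
        + (lam / 4) * sq_norm(sub(matmul(Wt, W), C))

def gradient(W, P, C, lam):
    Wt = transpose(W)
    g_rows = matmul(sub(matmul(W, Wt), P), W)  # gradient of 1/4 ||W W^T - P||^2 (P symmetric)
    g_cols = matmul(W, sub(matmul(Wt, W), C))  # gradient of 1/4 ||W^T W - C||^2 (C symmetric)
    return [[(1 - lam) * a + lam * b for a, b in zip(ra, rb)]
            for ra, rb in zip(g_rows, g_cols)]

def fit(W, P, C, lam=0.5, step=0.01, n_iter=500):
    """Plain gradient descent with a fixed step size."""
    for _ in range(n_iter):
        G = gradient(W, P, C, lam)
        W = [[w - step * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
    return W

# Toy data: projections generated from a known 2 x 3 bipartite network.
W_true = [[1.0, 0.0, 1.0],
          [0.0, 1.0, 1.0]]
P = matmul(W_true, transpose(W_true))   # TF-by-TF projection
C = matmul(transpose(W_true), W_true)   # gene-by-gene projection
W0 = [[0.9, 0.2, 0.7],
      [0.1, 0.8, 0.6]]
W_fit = fit(W0, P, C)
```

Because the objective is non-convex, the result depends on the starting point. The sketch only shows that descent reduces the mismatch between the fitted and observed projections.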

7.
Front Genet ; 12: 649942, 2021.
Article in English | MEDLINE | ID: mdl-33968133

ABSTRACT

Profiling of whole transcriptomes has become a cornerstone of molecular biology and an invaluable tool for the characterization of clinical phenotypes and the identification of disease subtypes. Analyses of these data are becoming ever more sophisticated as we move beyond simple comparisons to consider networks of higher-order interactions and associations. Gene regulatory networks (GRNs) model the regulatory relationships of transcription factors and genes and have allowed the identification of differentially regulated processes in disease systems. In this perspective, we discuss gene targeting scores, which measure changes in inferred regulatory network interactions, and their use in identifying disease-relevant processes. In addition, we present an example analysis for pancreatic ductal adenocarcinoma (PDAC), demonstrating the power of gene targeting scores to identify differential processes between complex phenotypes, processes that would have been missed by only performing differential expression analysis. This example demonstrates that gene targeting scores are an invaluable addition to gene expression analysis in the characterization of diseases and other complex phenotypes.
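A gene targeting score can be sketched as follows. The sketch assumes one common definition, the sum of the weights of all regulatory edges pointing at a gene, and compares it between two conditions. The edge weights and the TF and gene names are made up for illustration.

```python
from collections import defaultdict

def targeting_scores(edges):
    """Sum incoming edge weights per gene; edges maps (tf, gene) -> weight."""
    scores = defaultdict(float)
    for (tf, gene), weight in edges.items():
        scores[gene] += weight
    return dict(scores)

def differential_targeting(case_edges, control_edges):
    """Difference in targeting score (case minus control) for every gene seen."""
    case = targeting_scores(case_edges)
    control = targeting_scores(control_edges)
    genes = set(case) | set(control)
    return {g: case.get(g, 0.0) - control.get(g, 0.0) for g in genes}

# Hypothetical networks for two phenotypes.
control_net = {("TF_A", "gene_1"): 1.0, ("TF_B", "gene_1"): 0.5, ("TF_A", "gene_2"): 2.0}
case_net = {("TF_A", "gene_1"): 2.0, ("TF_B", "gene_1"): 1.5, ("TF_A", "gene_2"): 2.0}
diff = differential_targeting(case_net, control_net)
```

Here `gene_1` gains targeting in the case network while `gene_2` is unchanged. Such a change can occur even when the gene's own expression does not differ between conditions.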

8.
Methods Mol Biol ; 2096: 197-215, 2020.
Article in English | MEDLINE | ID: mdl-32720156

ABSTRACT

We demonstrate a selection of network and machine learning techniques useful in the analysis of complex datasets, including 2-way similarity networks, Markov clustering, enrichment statistical networks, FCROS differential analysis, and random forests. We demonstrate each of these techniques on the Populus trichocarpa gene expression atlas.


Subject(s)
Databases as Topic , Gene Regulatory Networks , Populus/genetics , Algorithms , Cluster Analysis , Gene Expression Regulation, Plant , Software
9.
BMC Plant Biol ; 20(1): 120, 2020 Mar 18.
Article in English | MEDLINE | ID: mdl-32183694

ABSTRACT

BACKGROUND: Potato is the third most consumed crop in the world. Breeding for traits such as yield, product quality and pathogen resistance is a main priority. Identifying molecular signatures of these and other important traits is important for future breeding efforts. In this study, a progeny population from a cross between a breeding line, SW93-1015, and a cultivar, Désirée, was studied by trait analysis and RNA-seq in order to develop understanding of segregating traits at the molecular level and identify transcripts with expressional correlation to these traits. Transcript markers with predictive value for field performance applicable under controlled environments would be of great value for plant breeding. RESULTS: A total of 34 progeny lines from SW93-1015 and Désirée were phenotyped for 17 different traits in a field in Nordic climate conditions and controlled climate settings. A master transcriptome was constructed with all 34 progeny lines and the parents through a de novo assembly of RNA-seq reads. Gene expression data obtained in a controlled environment from the 34 lines was correlated to traits by different similarity indices, including Pearson and Spearman, as well as DUO, which calculates the co-occurrence between high and low values for gene expression and trait. Our study linked transcripts to traits such as yield, growth rate, high laying tubers, late and tuber blight, tuber greening and early flowering. We found several transcripts associated with late blight resistance, and transcripts encoding receptors were associated with Dickeya solani susceptibility. Transcript levels of a UBX-domain protein were negatively associated with yield and a GLABRA2 expression modulator was negatively associated with growth rate. CONCLUSION: In our study, we identify hundreds of transcripts, putatively linked based on expression with 17 traits of potato, representing both well-known and novel associations. This approach can be used to link the transcriptome to traits. We explore the possibility of associating the level of transcript expression from controlled, optimal environments to traits in a progeny population with different methods, introducing the application of DUO for the first time on transcriptome data. We verify the expression pattern for five of the putative transcript markers in another progeny population.
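Two of the similarity indices named in this abstract, Pearson and Spearman correlation, can be illustrated with a short sketch. It correlates one transcript's expression with one trait across progeny lines. The data are invented, and the DUO index is not reproduced because its exact definition is not given here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical expression of one transcript and one trait across five lines.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
trait = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(expression, trait)
r_spearman = spearman(expression, trait)
```

The relationship in this toy data is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly lower. This is one reason to compare several indices.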


Subject(s)
Life History Traits , Phenotype , Solanum tuberosum/genetics , Transcriptome , Tetraploidy
10.
Front Plant Sci ; 10: 1249, 2019.
Article in English | MEDLINE | ID: mdl-31649710

ABSTRACT

Understanding the regulatory network controlling cell wall biosynthesis is of great interest in Populus trichocarpa, both because of its status as a model woody perennial and its importance for lignocellulosic products. We searched for genes with putatively unknown roles in regulating cell wall biosynthesis using an extended network-based Lines of Evidence (LOE) pipeline to combine multiple omics data sets in P. trichocarpa, including gene coexpression, gene comethylation, population level pairwise SNP correlations, and two distinct SNP-metabolite Genome Wide Association Study (GWAS) layers. By incorporating validation, ranking, and filtering approaches we produced a list of nine high priority gene candidates for involvement in the regulation of cell wall biosynthesis. We subsequently performed a detailed investigation of candidate gene GROWTH-REGULATING FACTOR 9 (PtGRF9). To investigate the role of PtGRF9 in regulating cell wall biosynthesis, we assessed the genome-wide connections of PtGRF9 and a paralog across data layers with functional enrichment analyses, predictive transcription factor binding site analysis, and an independent comparison to eQTN data. Our findings indicate that PtGRF9 likely affects the cell wall by directly repressing genes involved in cell wall biosynthesis, such as PtCCoAOMT and PtMYB.41, and indirectly by regulating homeobox genes. Furthermore, evidence suggests that PtGRF9 paralogs may act as transcriptional co-regulators that direct the global energy usage of the plant. Using our extended pipeline, we show multiple lines of evidence implicating the involvement of these genes in cell wall regulatory functions and demonstrate the value of this method for prioritizing candidate genes for experimental validation.

11.
Front Genet ; 10: 874, 2019.
Article in English | MEDLINE | ID: mdl-31608114

ABSTRACT

Populus trichocarpa is an important biofuel feedstock that has been the target of extensive research and is emerging as a model organism for plants, especially woody perennials. This research has generated several large 'omics datasets. However, only a few studies in Populus have attempted to integrate various data types. This review will summarize various 'omics data layers, focusing on their application in Populus species. Subsequently, network and signal processing techniques for the integration and analysis of these data types will be discussed, with particular reference to examples in Populus.

12.
Front Genet ; 10: 487, 2019.
Article in English | MEDLINE | ID: mdl-31214244

ABSTRACT

Various 'omics data types have been generated for Populus trichocarpa, each providing a layer of information which can be represented as a density signal across a chromosome. We make use of genome sequence data, variants data across a population as well as methylation data across 10 different tissues, combined with wavelet-based signal processing to perform a comprehensive analysis of the signature of the centromere in these different data signals, and successfully identify putative centromeric regions in P. trichocarpa from these signals. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and identify a new CENH3 candidate in P. trichocarpa.

13.
Front Genet ; 10: 417, 2019.
Article in English | MEDLINE | ID: mdl-31134130

ABSTRACT

Various patterns of multi-phenotype associations (MPAs) exist in the results of Genome Wide Association Studies (GWAS) involving different topologies of single nucleotide polymorphism (SNP)-phenotype associations. These can provide interesting information about the different impacts of a gene on closely related phenotypes or disparate phenotypes (pleiotropy). In this work we present MPA Decomposition, a new network-based approach which decomposes the results of a multi-phenotype GWAS study into three bipartite networks, which, when used together, unravel the multi-phenotype signatures of genes on a genome-wide scale. The decomposition involves the construction of a phenotype powerset space, and subsequent mapping of genes into this new space. Clustering of genes in this powerset space groups genes based on their detailed MPA signatures. We show that this method allows us to find multiple different MPA and pleiotropic signatures within individual genes and to classify and cluster genes based on these SNP-phenotype association topologies. We demonstrate the use of this approach on a GWAS analysis of a large population of 882 Populus trichocarpa genotypes using untargeted metabolomics phenotypes. This method should prove invaluable in the interpretation of large GWAS datasets and aid in future synthetic biology efforts designed to optimize phenotypes of interest.

14.
Front Microbiol ; 10: 481, 2019.
Article in English | MEDLINE | ID: mdl-30984119

ABSTRACT

Plant root-associated microbial symbionts comprise the plant rhizobiome. These microbes function in provisioning nutrients and water to their hosts, impacting plant health and disease. The plant microbiome is shaped by plant species, plant genotype, soil and environmental conditions, but the contributions of these variables are hard to disentangle from each other in natural systems. We used bioassay common garden experiments to decouple plant genotype and soil property impacts on fungal and bacterial community structure in the Populus rhizobiome. High throughput amplification and sequencing of 16S, ITS, 28S and 18S rDNA was accomplished through 454 pyrosequencing. Co-association patterns of fungal and bacterial taxa were assessed with 16S and ITS datasets. Community bipartite fungal-bacterial networks and PERMANOVA results attribute significant differences in fungal or bacterial communities to soil origin, soil chemical properties and plant genotype. Indicator species analysis identified a common set of root bacteria as well as endophytic and ectomycorrhizal fungi associated with Populus in different soils. However, no single taxon, or consortium of microbes, was indicative of a particular Populus genotype. Fungal-bacterial networks were over-represented in arbuscular mycorrhizal, endophytic, and ectomycorrhizal fungi, as well as bacteria belonging to the orders Rhizobiales, Chitinophagales, Cytophagales, and Burkholderiales. These results demonstrate the importance of soil and plant genotype on fungal-bacterial networks in the belowground plant microbiome.

15.
PLoS One ; 13(8): e0202519, 2018.
Article in English | MEDLINE | ID: mdl-30118526

ABSTRACT

A characteristic feature of plant cells is the ability to form callus from parenchyma cells in response to biotic and abiotic stimuli. Tissue culture propagation of recalcitrant plant species and genetic engineering for desired phenotypes typically depend on efficient in vitro callus generation. Callus formation is under genetic regulation, and consequently, a molecular understanding of this process underlies successful generation for propagation materials and/or introduction of genetic elements in experimental or industrial applications. Herein, we identified 11 genetic loci significantly associated with callus formation in Populus trichocarpa using a genome-wide association study (GWAS) approach. Eight of the 11 significant gene associations were consistent across biological replications, exceeding a chromosome-wide -log10(p) = 4.46 [p = 3.47E-05] Bonferroni-adjusted significance threshold. These eight genes were used as hub genes in a high-resolution co-expression network analysis to gain insight into the genome-wide basis of callus formation. A network of positively and negatively co-expressed genes, including several transcription factors, was identified. As proof-of-principle, a transient protoplast assay confirmed the negative regulation of a Chloroplast Nucleoid DNA-binding-related gene (Potri.018G014800) by the LEC2 transcription factor. Many of the candidate genes and co-expressed genes were 1) linked to cell division and cell cycling in plants and 2) showed homology to tumor and cancer-related genes in humans. The GWAS approach based on a high-resolution marker set, and the ability to manipulate target genes in vitro, provided a catalog of high-confidence genes linked to callus formation that can serve as an important resource for successful manipulation of model and non-model plant species, and likewise, suggests a robust method of discovering common homologous functions across organisms.
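The two forms of the significance threshold quoted in this abstract are related by a simple conversion, which the sketch below checks. The family-wise error rate of 0.05 used to back out an implied number of tests is an assumption; the abstract does not state alpha or the test count.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests

def neg_log10(p):
    """Convert a p-value to the -log10 scale used in GWAS plots."""
    return -math.log10(p)

p_cutoff = 3.47e-05                       # value quoted in the abstract
score_cutoff = neg_log10(p_cutoff)        # approximately 4.46
implied_tests = round(0.05 / p_cutoff)    # assumes alpha = 0.05 (not stated in the source)
```

Under that assumed alpha, the quoted cutoff corresponds to roughly 1441 tests, and `bonferroni_threshold(0.05, 1441)` gives back a value close to 3.47E-05.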


Subject(s)
Bony Callus/growth & development , Populus/genetics , Transcription Factors/genetics , Gene Expression Regulation, Plant , Genome-Wide Association Study , Phenotype , Populus/growth & development
16.
G3 (Bethesda) ; 8(8): 2631-2641, 2018 07 31.
Article in English | MEDLINE | ID: mdl-29884614

ABSTRACT

Polyglutamine (polyQ) stretches have been reported to occur in proteins across many organisms including animals, fungi and plants. Expansion of these repeats has attracted much attention due to their associations with numerous human diseases including Huntington's and other neurological maladies. This suggests that the relative length of polyQ stretches is an important modulator of their function. Here, we report the identification of a Populus C-terminus binding protein (CtBP) ANGUSTIFOLIA (PtAN1) which contains a polyQ stretch whose functional relevance had not been established. Analysis of 917 resequenced Populus trichocarpa genotypes revealed three allelic variants at this locus encoding 11-, 13- and 15-glutamine residues. Transient expression assays using Populus leaf mesophyll protoplasts revealed that the 11Q variant exhibited strong nuclear localization whereas the 15Q variant was only found in the cytosol, with the 13Q variant exhibiting localization in both subcellular compartments. We assessed functional implications by evaluating expression changes of putative PtAN1 targets in response to overexpression of the three allelic variants and observed allele-specific differences in expression levels of putative targets. Our results provide evidence that variation in polyQ length modulates PtAN1 function by altering subcellular localization.


Subject(s)
Cell Nucleus/metabolism , DNA-Binding Proteins/metabolism , Peptides/chemistry , Plant Proteins/metabolism , Populus/genetics , Active Transport, Cell Nucleus , Alleles , DNA-Binding Proteins/chemistry , Plant Proteins/chemistry , Populus/metabolism , Protein Sorting Signals
17.
Appl Environ Microbiol ; 84(9)2018 05 01.
Article in English | MEDLINE | ID: mdl-29475869

ABSTRACT

Metagenomic data from Obsidian Pool (Yellowstone National Park, USA) and 13 genome sequences were used to reassess genus-wide biodiversity for the extremely thermophilic Caldicellulosiruptor. The updated core genome contains 1,401 ortholog groups (average genome size for 13 species = 2,516 genes). The pangenome, which remains open with a revised total of 3,493 ortholog groups, encodes a variety of multidomain glycoside hydrolases (GHs). These include three cellulases with GH48 domains that are colocated in the glucan degradation locus (GDL) and are specific determinants for microcrystalline cellulose utilization. Three recently sequenced species, Caldicellulosiruptor sp. strain Rt8.B8 (renamed here Caldicellulosiruptor morganii), Thermoanaerobacter cellulolyticus strain NA10 (renamed here Caldicellulosiruptor naganoensis), and Caldicellulosiruptor sp. strain Wai35.B1 (renamed here Caldicellulosiruptor danielii), degraded Avicel and lignocellulose (switchgrass). C. morganii was more efficient than Caldicellulosiruptor bescii in this regard and differed from the other 12 species examined, both based on genome content and organization and in the specific domain features of conserved GHs. Metagenomic analysis of lignocellulose-enriched samples from Obsidian Pool revealed limited new information on genus biodiversity. Enrichments yielded genomic signatures closely related to that of Caldicellulosiruptor obsidiansis, but there was also evidence for other thermophilic fermentative anaerobes (Caldanaerobacter, Fervidobacterium, Caloramator, and Clostridium). One enrichment, containing 89.8% Caldicellulosiruptor and 9.7% Caloramator, had a capacity for switchgrass solubilization comparable to that of C. bescii. These results refine the known biodiversity of Caldicellulosiruptor and indicate that microcrystalline cellulose degradation at temperatures above 70°C, based on current information, is limited to certain members of this genus that produce GH48 domain-containing enzymes. IMPORTANCE: The genus Caldicellulosiruptor contains the most thermophilic bacteria capable of lignocellulose deconstruction, which are promising candidates for consolidated bioprocessing for the production of biofuels and bio-based chemicals. The focus here is on the extant capability of this genus for plant biomass degradation and the extent to which this can be inferred from the core and pangenomes, based on analysis of 13 species and metagenomic sequence information from environmental samples. Key to microcrystalline hydrolysis is the content of the glucan degradation locus (GDL), a set of genes encoding glycoside hydrolases (GHs), several of which have GH48 and family 3 carbohydrate binding module domains, that function as primary cellulases. Resolving the relationship between the GDL and lignocellulose degradation will inform efforts to identify more prolific members of the genus and to develop metabolic engineering strategies to improve this characteristic.
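The distinction between a core genome and a pangenome can be sketched with set operations, assuming each genome is represented by the ortholog groups it contains. The core genome is then the intersection across genomes and the pangenome is the union. The genome labels and ortholog group identifiers are invented, and ortholog group assignment itself is outside this sketch.

```python
def core_and_pan(genomes):
    """Return (core, pan) ortholog-group sets for a dict of genome -> groups."""
    sets = [set(groups) for groups in genomes.values()]
    core = set.intersection(*sets)  # groups present in every genome
    pan = set.union(*sets)          # groups present in at least one genome
    return core, pan

# Hypothetical ortholog-group content of three genomes.
genomes = {
    "genome_A": {"OG1", "OG2", "OG3", "OG4"},
    "genome_B": {"OG1", "OG2", "OG5"},
    "genome_C": {"OG1", "OG2", "OG3", "OG6"},
}
core, pan = core_and_pan(genomes)
```

A pangenome is described as open when adding further genomes keeps enlarging the union. In this representation that would appear as `pan` growing with each new entry while `core` shrinks or stays the same.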


Subject(s)
Firmicutes/genetics , Firmicutes/metabolism , Genome, Bacterial , Lignin/metabolism , Metagenome , Cellulose/metabolism , Firmicutes/classification , Genomics , Metagenomics
18.
Nat Commun ; 8(1): 1899, 2017 12 01.
Article in English | MEDLINE | ID: mdl-29196618

ABSTRACT

Crassulacean acid metabolism (CAM) is a water-use efficient adaptation of photosynthesis that has evolved independently many times in diverse lineages of flowering plants. We hypothesize that convergent evolution of protein sequence and temporal gene expression underpins the independent emergences of CAM from C3 photosynthesis. To test this hypothesis, we generate a de novo genome assembly and genome-wide transcript expression data for Kalanchoë fedtschenkoi, an obligate CAM species within the core eudicots with a relatively small genome (~260 Mb). Our comparative analyses identify signatures of convergence in protein sequence and re-scheduling of diel transcript expression of genes involved in nocturnal CO2 fixation, stomatal movement, heat tolerance, circadian clock, and carbohydrate metabolism in K. fedtschenkoi and other CAM species in comparison with non-CAM species. These findings provide new insights into molecular convergence and building blocks of CAM and will facilitate CAM-into-C3 photosynthesis engineering to enhance water-use efficiency in crops.


Subject(s)
Acids/metabolism , Evolution, Molecular , Genome, Plant , Kalanchoe/genetics , Carbon Dioxide/metabolism , Gene Duplication , Kalanchoe/classification , Kalanchoe/metabolism , Photosynthesis , Phylogeny , Plants/classification , Plants/genetics , Plants/metabolism , Water/metabolism
19.
Adv Biochem Eng Biotechnol ; 160: 143-183, 2017.
Article in English | MEDLINE | ID: mdl-28070594

ABSTRACT

We explore the use of a network meta-modeling approach to compare the effects of similarity metrics used to construct biological networks on the topology of the resulting networks. This work reviews various similarity metrics for the construction of networks and various topology measures for the characterization of resulting network topology, demonstrating the use of these metrics in the construction and comparison of phylogenomic and transcriptomic networks.


Subject(s)
Metabolic Networks and Pathways/physiology , Models, Biological , Signal Transduction/physiology , Transcription Factors/metabolism , Transcriptome/physiology , Algorithms
20.
Nat Plants ; 2: 16178, 2016 11 21.
Article in English | MEDLINE | ID: mdl-27869799

ABSTRACT

Already a proven mechanism for drought resilience, crassulacean acid metabolism (CAM) is a specialized type of photosynthesis that maximizes water-use efficiency by means of an inverse (compared to C3 and C4 photosynthesis) day/night pattern of stomatal closure/opening to shift CO2 uptake to the night, when evapotranspiration rates are low. A systems-level understanding of temporal molecular and metabolic controls is needed to define the cellular behaviour underpinning CAM. Here, we report high-resolution temporal behaviours of transcript, protein and metabolite abundances across a CAM diel cycle and, where applicable, compare the observations to the well-established C3 model plant Arabidopsis. A mechanistic finding that emerged is that CAM operates with a diel redox poise that is shifted relative to that in Arabidopsis. Moreover, we identify widespread rescheduled expression of genes associated with signal transduction mechanisms that regulate stomatal opening/closing. Controlled production and degradation of transcripts and proteins represents a timing mechanism by which to regulate cellular function, yet how this molecular timekeeping regulates CAM is unknown. Here, we provide new insights into complex post-transcriptional and -translational hierarchies that govern CAM in Agave. These data sets provide a resource to inform efforts to engineer more efficient CAM traits into economically valuable C3 crops.


Subject(s)
Agave/genetics , Gene Expression Regulation, Plant , Gene Regulatory Networks , Plant Proteins/genetics , Circadian Rhythm , Darkness , Light , Plant Proteins/metabolism