ABSTRACT
Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways that are targeted by such variants and the cell type context in which regulatory variants operate are typically unknown. Cell-type-specific long-range regulatory interactions that occur between a distal regulatory sequence and a gene offer a powerful framework for examining the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available only for a handful of cell types. Furthermore, identifying specific gene subnetworks or pathways that are targeted by a set of variants is a significant challenge. We have developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). We applied our approach to predict interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory single nucleotide polymorphisms (SNPs) in the NHGRI-EBI GWAS catalogue. Using our approach, we performed an in-depth characterization of fifteen different phenotypes including schizophrenia, coronary artery disease (CAD) and Crohn's disease. We found differentially wired subnetworks consisting of known as well as novel gene targets of regulatory SNPs. Taken together, our compendium of interactions and the associated network-based analysis pipeline leverage long-range regulatory interactions to examine the context-specific impact of regulatory variation in complex phenotypes.
Subject(s)
Epigenome , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Gene Regulatory Networks/genetics , Genome , Epigenomics , Polymorphism, Single Nucleotide/genetics , Genetic Predisposition to Disease
ABSTRACT
Recent advances in consortium-scale genome-wide association studies (GWAS) have highlighted the involvement of common genetic variants in autism spectrum disorder (ASD), but our understanding of their etiologic roles, especially the interplay with rare variants, is incomplete. In this work, we introduce an analytical framework to quantify the transmission disequilibrium of genetically regulated gene expression from parents to offspring. We applied this framework to conduct a transcriptome-wide association study (TWAS) on 7,805 ASD proband-parent trios, and replicated our findings using 35,740 independent samples. We identified 31 associations at the transcriptome-wide significance level. In particular, we identified POU3F2 (p = 2.1E-7), a transcription factor mainly expressed in developmental brain. Gene targets regulated by POU3F2 showed a 2.7-fold enrichment for known ASD genes (p = 2.0E-5) and a 2.7-fold enrichment for loss-of-function de novo mutations in ASD probands (p = 7.1E-5). These results provide a novel connection between rare and common variants, whereby ASD genes affected by very rare mutations are regulated by an unlinked transcription factor affected by common genetic variations.
Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Hippocampus/metabolism , Homeodomain Proteins/genetics , POU Domain Factors/genetics , Transcriptome/genetics , Alleles , Databases, Genetic , Gene Expression Profiling , Humans , Mutation , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Risk Factors , Spatio-Temporal Analysis
ABSTRACT
RNA-binding proteins (RNA-BPs) play critical roles in development and disease to regulate gene expression. However, genome-wide identification of their targets in primary human cells has been challenging. Here, we applied a modified CLIP-seq strategy to identify genome-wide targets of the FMRP translational regulator 1 (FMR1), a brain-enriched RNA-BP, whose deficiency leads to Fragile X Syndrome (FXS), the most prevalent inherited intellectual disability. We identified FMR1 targets in human dorsal and ventral forebrain neural progenitors and excitatory and inhibitory neurons differentiated from human pluripotent stem cells. In parallel, we measured the transcriptomes of the same four cell types upon FMR1 gene deletion. We discovered that FMR1 preferentially binds long transcripts in human neural cells. FMR1 targets include genes unique to human neural cells and associated with clinical phenotypes of FXS and autism. Integrative network analysis using graph diffusion and multitask clustering of FMR1 CLIP-seq and transcriptional targets reveals critical pathways regulated by FMR1 in human neural development. Our results demonstrate that FMR1 regulates a common set of targets among different neural cell types but also operates in a cell type-specific manner targeting distinct sets of genes in human excitatory and inhibitory neural progenitors and neurons. By defining molecular subnetworks and validating specific high-priority genes, we identify novel components of the FMR1 regulation program. Our results provide new insights into gene regulation by a critical neuronal RNA-BP in human neurodevelopment.
Subject(s)
Fragile X Mental Retardation Protein/metabolism , Neural Stem Cells/metabolism , Neurons/metabolism , Autistic Disorder/genetics , Cell Line , Chromatin Immunoprecipitation Sequencing , Fragile X Mental Retardation Protein/genetics , Fragile X Syndrome/genetics , Gene Deletion , Gene Regulatory Networks , Humans , Male , Neural Stem Cells/cytology , Neurogenesis , Pluripotent Stem Cells/cytology , Prosencephalon/cytology , Prosencephalon/metabolism , Transcriptome
ABSTRACT
Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare the protein levels across multiple plant species. Globally we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.
Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Plant Proteins/metabolism , Plants/metabolism , Proteome/metabolism , Proteomics/methods , Chromatography, Liquid/methods , Evolution, Molecular , Gene Expression Regulation, Plant , Gene Ontology , Genomics/methods , Phylogeny , Plant Proteins/genetics , Plants/classification , Plants/genetics , Proteome/genetics , Species Specificity , Tandem Mass Spectrometry/methods , Transcriptome/genetics
ABSTRACT
Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.
Subject(s)
Disease/etiology , Protein Domains , Protein Interaction Maps , Animals , Animals, Genetically Modified , Computational Biology , Coronary Artery Disease/etiology , Coronary Artery Disease/genetics , Coronary Artery Disease/metabolism , Disease/genetics , Genetic Predisposition to Disease , Genetic Variation , Genome-Wide Association Study , Humans , Models, Animal , Models, Biological , Mutation , Polymorphism, Single Nucleotide , Protein Domains/genetics , Protein Interaction Mapping , Protein Interaction Maps/genetics , Zebrafish/genetics
ABSTRACT
Arabidopsis thaliana is a reference plant that has been studied intensively for several decades. Recent advances in high-throughput experimental technology have enabled the generation of an unprecedented amount of data from A. thaliana, which has facilitated data-driven approaches to unravel the genetic organization of plant phenotypes. We previously published a description of a genome-scale functional gene network for A. thaliana, AraNet, which was constructed by integrating multiple co-functional gene networks inferred from diverse data types, and we demonstrated the predictive power of this network for complex phenotypes. More recently, we have observed significant growth in the availability of omics data for A. thaliana as well as improvements in data analysis methods that we anticipate will further enhance the integrated database of co-functional networks. Here, we present an updated co-functional gene network for A. thaliana, AraNet v2 (available at http://www.inetbio.org/aranet), which covers approximately 84% of the coding genome. We demonstrate significant improvements in both genome coverage and accuracy. To enhance the usability of the network, we implemented an AraNet v2 web server, which generates functional predictions for A. thaliana and 27 nonmodel plant species using an orthology-based projection of nonmodel plant genes on the A. thaliana gene network.
Subject(s)
Arabidopsis/genetics , Databases, Genetic , Gene Expression Regulation, Plant , Gene Regulatory Networks , Arabidopsis/metabolism , Genome, Plant , Internet , Phenotype
ABSTRACT
Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We demonstrate that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.
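The "network direct neighborhood" method named above is not specified in the abstract. A minimal sketch of one common reading follows: each candidate gene is scored by the summed weight of its links to a set of seed genes. The function name, the edge-list format and the toy gene identifiers are all assumptions made for illustration, not RiceNet's actual implementation.

```python
from collections import defaultdict


def direct_neighborhood_scores(edges, seeds):
    """Rank non-seed genes by the summed weight of their links to seed genes.

    `edges` is an iterable of (gene_a, gene_b, weight) tuples for an
    undirected network (hypothetical input format).
    """
    seeds = set(seeds)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # An edge contributes only when exactly one endpoint is a seed.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network, not real RiceNet links.
toy_edges = [
    ("g1", "g2", 2.0),
    ("g1", "g3", 1.5),
    ("g2", "g3", 1.0),
    ("g2", "g4", 0.7),
    ("g3", "g4", 0.5),
    ("g4", "g5", 3.0),
]
ranking = direct_neighborhood_scores(toy_edges, seeds=["g1", "g2"])
print(ranking)
```

Genes with no link to any seed (here `g5`) receive no score at all, which is the main limitation that diffusion-style methods try to address.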
Subject(s)
Genes, Plant , Oryza/genetics , Software , Arabidopsis/genetics , Gene Expression Regulation, Plant , Gene Regulatory Networks , Internet
ABSTRACT
Drosophila melanogaster (fruit fly) has been a popular model organism in animal genetics due to the high accessibility of reverse-genetics tools. In addition, the close relationship between the Drosophila and human genomes rationalizes the use of Drosophila as an invertebrate model for human neurobiology and disease research. A platform technology for predicting candidate genes or functions would further enhance the usefulness of this long-established model organism for gene-to-phenotype mapping. Recently, the power of network prioritization for gene-to-phenotype mapping has been demonstrated in many organisms. Here we present a network prioritization server dedicated to Drosophila that covers ~95% of the coding genome. This server, dubbed FlyNet, has several distinctive features, including (i) prioritization for both genes and functions; (ii) two complementary network algorithms: direct neighborhood and network diffusion; (iii) spatiotemporal-specific networks as an additional prioritization strategy for traits associated with a specific developmental stage or tissue and (iv) prioritization for human disease genes. FlyNet is expected to serve as a versatile hypothesis-generation platform for genes and functions in the study of basic animal genetics, developmental biology and human disease. FlyNet is available for free at http://www.inetbio.org/flynet.
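The abstract does not say which diffusion algorithm FlyNet uses. As an illustration of the general idea of network diffusion, the sketch below implements random walk with restart, a widely used variant, in plain Python. The function name, the restart probability of 0.5 and the four-gene path network are assumptions chosen for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, iterations=100):
    """Diffuse seed labels over an undirected weighted network.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalised adjacency matrix and p0 is uniform over the seeds.
    `adjacency` maps gene -> {neighbour: weight} and must be symmetric,
    with every gene having at least one neighbour.
    """
    genes = sorted(adjacency)
    degree = {g: sum(adjacency[g].values()) for g in genes}
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in genes}
    p = dict(p0)
    for _ in range(iterations):
        updated = {}
        for g in genes:
            # Probability mass arriving at g from each neighbour n,
            # in proportion to the edge weight over n's total degree.
            inflow = sum(p[n] * w / degree[n] for n, w in adjacency[g].items())
            updated[g] = (1.0 - restart) * inflow + restart * p0[g]
        p = updated
    return p


# Hypothetical path network a - b - c - d with unit weights.
toy_network = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(toy_network, seeds={"a"})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

Unlike direct-neighborhood scoring, genes two or more steps from a seed (`c` and `d` here) still receive a nonzero score, which decays with distance from the seed.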
Subject(s)
Drosophila melanogaster/genetics , Gene Regulatory Networks , Software , Algorithms , Animals , Disease/genetics , Disease Models, Animal , Genes, Insect , Humans , Internet
ABSTRACT
Saccharomyces cerevisiae, i.e. baker's yeast, is a widely studied model organism in eukaryote genetics because of its simple protocols for genetic manipulation and phenotype profiling. The high abundance of publicly available data that has been generated through diverse 'omics' approaches has led to the use of yeast for many systems biology studies, including large-scale gene network modeling to better understand the molecular basis of the cellular phenotype. We have previously developed a genome-scale gene network for yeast, YeastNet v2, which has been used for various genetics and systems biology studies. Here, we present an updated version, YeastNet v3 (available at http://www.inetbio.org/yeastnet/), that significantly improves the prediction of gene-phenotype associations. The extended genome in YeastNet v3 covers up to 5818 genes (~99% of the coding genome) wired by 362 512 functional links. YeastNet v3 provides a new web interface to run the tools for network-guided hypothesis generation. YeastNet v3 also provides edge information for all data-specific networks (~2 million functional links) as well as the integrated networks. Therefore, users can construct alternative versions of the integrated network by applying their own data integration algorithm to the same data-specific links.
Subject(s)
Databases, Genetic , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Saccharomyces cerevisiae/genetics , Internet , Phenotype , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism
ABSTRACT
High-throughput experimental technologies gradually shift the paradigm of biological research from hypothesis-validation toward hypothesis-generation science. Translating diverse types of large-scale experimental data into testable hypotheses, however, remains a daunting task. We previously demonstrated that heterogeneous genomics data can be integrated into a single genome-scale gene network with high prediction power for ribonucleic acid interference (RNAi) phenotypes in Caenorhabditis elegans, a popular metazoan model in the study of developmental biology, neurobiology and genetics. Here, we present WormNet version 3 (v3), which is a new network-assisted hypothesis-generating server for C. elegans. WormNet v3 includes major updates to the base gene network, which substantially improved predictions of RNAi phenotypes. The server generates various gene network-based hypotheses using three complementary network methods: (i) a phenotype-centric approach to 'find new members for a pathway'; (ii) a gene-centric approach to 'infer functions from network neighbors' and (iii) a context-centric approach to 'find context-associated hub genes', which is a new method to identify key genes that mediate physiology within a specific context. For example, we demonstrated that the context-centric approach can be used to identify potential molecular targets of toxic chemicals. WormNet v3 is freely accessible at http://www.inetbio.org/wormnet.
Subject(s)
Caenorhabditis elegans/genetics , Software , Animals , Caenorhabditis elegans/drug effects , Dichlorvos/toxicity , Gene Regulatory Networks , Genes, Helminth , Insecticides/toxicity , Internet , Phenotype , RNA Interference
ABSTRACT
Most phenotypes are genetically complex, with contributions from mutations in many different genes. Mutations in more than one gene can combine synergistically to cause phenotypic change, and systematic studies in model organisms show that these genetic interactions are pervasive. However, in human association studies such nonadditive genetic interactions are very difficult to identify because of a lack of statistical power: simply put, the number of potential interactions is too vast. One approach to resolve this is to predict candidate modifier interactions between loci, and then to specifically test these for associations with the phenotype. Here, we describe a general method for predicting genetic interactions based on the use of integrated functional gene networks. We show that in both Saccharomyces cerevisiae and Caenorhabditis elegans a single high-coverage, high-quality functional network can successfully predict genetic modifiers for the majority of genes. For C. elegans we also describe the construction of a new, improved, and expanded functional network, WormNet 2. Using this network we demonstrate how it is possible to rapidly expand the number of modifier loci known for a gene, predicting and validating new genetic interactions for each of three signal transduction genes. We propose that this approach, termed network-guided modifier screening, provides a general strategy for predicting genetic interactions. This work thus suggests that a high-quality integrated human gene network will provide a powerful resource for modifier locus discovery in many different diseases.
Subject(s)
Gene Regulatory Networks , Genetic Loci , Models, Genetic , Sequence Analysis, DNA/methods , Animals , Caenorhabditis elegans/genetics , Mutation , Saccharomyces cerevisiae/genetics , Signal Transduction/genetics
ABSTRACT
Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into the transcriptional programs of different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks, and a number of methods with different learning frameworks have been developed to address this problem. Here, we present an expanded benchmarking study of eleven recent network inference methods on seven published scRNA-seq datasets in human, mouse, and yeast considering different types of gold standard networks and evaluation metrics. We evaluate methods based on their computing requirements as well as on their ability to recover the network structure. We find that, while most methods have a modest recovery of experimentally derived interactions based on global metrics such as Area Under the Precision Recall curve, methods are able to capture targets of regulators that are relevant to the system under study. Among the top performing methods that use only expression were SCENIC, PIDC, MERLIN and Correlation. Addition of prior biological knowledge and the estimation of transcription factor activities resulted in the best overall performance, with the Inferelator and MERLIN methods that use prior knowledge outperforming methods that use expression alone. We found that imputation for network inference did not improve network inference accuracy and could be detrimental. Comparisons of inferred networks for comparable bulk conditions showed that the networks inferred from scRNA-seq datasets are often better than or on par with the networks inferred from bulk datasets. Our analysis should be beneficial in selecting methods for network inference. At the same time, this highlights the need for improved methods and better gold standards for regulatory network inference from scRNA-seq datasets.
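The abstract names Area Under the Precision Recall curve as a global metric but does not describe how it was computed. A minimal sketch, assuming average precision as the estimator and a ranked list of predicted regulator-target edges, is given below. The function name and the toy edges are hypothetical.

```python
def average_precision(ranked_edges, gold_edges):
    """Average precision of a ranked edge list against a gold standard.

    `ranked_edges` is ordered from most to least confident prediction.
    Gold edges that never appear in the ranking count as missed, because
    the sum of precisions is divided by the full gold-standard size.
    """
    gold = set(gold_edges)
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            # Precision measured at the rank of each recovered gold edge.
            precision_sum += hits / rank
    return precision_sum / len(gold)


# Hypothetical predictions as (regulator, target) pairs, best first.
predicted = [("TF1", "A"), ("TF1", "B"), ("TF2", "A"), ("TF2", "C")]
gold_standard = {("TF1", "A"), ("TF2", "A")}
ap = average_precision(predicted, gold_standard)
print(round(ap, 4))
```

Because the score is averaged over the whole gold standard, a method can recover biologically relevant targets near the top of its ranking and still score modestly overall, which is consistent with the contrast the abstract draws between global metrics and regulator-level results.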
Subject(s)
Algorithms , Neurofibromin 2 , Humans , Animals , Mice , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Gene Expression Regulation , Gene Regulatory Networks , Saccharomyces cerevisiae , Sequence Analysis, RNA/methods , Gene Expression Profiling
ABSTRACT
Our inability to derive the neuronal diversity that comprises the posterior central nervous system (pCNS) using human pluripotent stem cells (hPSCs) poses an impediment to understanding human neurodevelopment and disease in the hindbrain and spinal cord. Here, we establish a modular, monolayer differentiation paradigm that recapitulates both rostrocaudal (R/C) and dorsoventral (D/V) patterning, enabling derivation of diverse pCNS neurons with discrete regional specificity. First, neuromesodermal progenitors (NMPs) with discrete HOX profiles are converted to pCNS progenitors (pCNSPs). Then, by tuning D/V signaling, pCNSPs are directed to locomotor or somatosensory neurons. Expansive single-cell RNA-sequencing (scRNA-seq) analysis coupled with a novel computational pipeline allowed us to detect hundreds of transcriptional markers within region-specific phenotypes, enabling discovery of gene expression patterns across R/C and D/V developmental axes. These findings highlight the potential of these resources to advance a mechanistic understanding of pCNS development, enhance in vitro models, and inform therapeutic strategies.
Subject(s)
Neurons , Transcriptome , Cell Differentiation/genetics , Central Nervous System , Humans , Neurons/physiology , RNA
ABSTRACT
The inhibitor of apoptosis proteins (IAPs), which include cIAP1, cIAP2 and XIAP, suppress apoptosis through the inhibition of caspases, and the activity of IAPs is regulated by a variety of IAP-binding proteins. Herein, we report the identification of Vestigial-like 4 (Vgl-4), which functions as a transcription cofactor in cardiac myocytes, as a new IAP-binding protein. Vgl-4 is expressed predominantly in the nucleus and its overexpression triggers a relocalization of IAPs from the cytoplasm to the nucleus. The cIAP1/2-interacting protein TRAF2 (TNF receptor-associated factor 2) prevented the Vgl-4-driven nuclear localization of cIAP2. Accordingly, the forced relocation of IAPs to the nucleus by Vgl-4 significantly reduced their ability to prevent Bax- and TNFα-induced apoptosis, which can be recovered by co-expression with TRAF2. Our results suggest that Vgl-4 may play a role in the apoptotic pathways by regulating translocation of IAPs between different cell compartments.
Subject(s)
Apoptosis , Cell Nucleus/metabolism , Inhibitor of Apoptosis Proteins/metabolism , Transcription Factors/metabolism , Active Transport, Cell Nucleus , HEK293 Cells , HeLa Cells , Humans , TNF Receptor-Associated Factor 2/metabolism , X-Linked Inhibitor of Apoptosis Protein
ABSTRACT
Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.
ABSTRACT
Functional constraints between genes display similar patterns of gain or loss during speciation. Similar phylogenetic profiles, therefore, can be an indication of a functional association between genes. The phylogenetic profiling method has been applied successfully to the reconstruction of gene pathways and the inference of unknown gene functions. This method requires only sequence data to generate phylogenetic profiles. This method therefore has the potential to take advantage of the recent explosion in available sequence data to reveal a significant number of functional associations between genes. Since the initial development of phylogenetic profiling, many modifications to improve this method have been proposed, including improvements in the measurement of profile similarity and the selection of reference species. Here, we describe the existing methods of phylogenetic profiling for the inference of functional associations and discuss their technical limitations and caveats.
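To make the idea of profile similarity concrete, the sketch below compares binary presence/absence profiles with two measures often used for this purpose, the Jaccard index and mutual information. The abstract does not single out either measure, and the gene names and eight-species profiles are invented for illustration.

```python
import math


def jaccard(profile_a, profile_b):
    """Species where both genes are present over species where either is."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary profiles."""
    n = len(profile_a)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            joint = sum(
                1 for a, b in zip(profile_a, profile_b) if a == x and b == y
            ) / n
            px = sum(1 for a in profile_a if a == x) / n
            py = sum(1 for b in profile_b if b == y) / n
            # Empty cells contribute nothing (0 * log 0 is taken as 0).
            if joint > 0:
                mi += joint * math.log2(joint / (px * py))
    return mi


# Hypothetical profiles: 1 = ortholog present in that reference species.
gene_x = [1, 1, 0, 0, 1, 0, 1, 0]
gene_y = [1, 1, 0, 0, 1, 0, 1, 0]  # identical inheritance pattern to gene_x
gene_z = [1, 0, 1, 0, 1, 0, 1, 0]  # partially overlapping pattern

print(jaccard(gene_x, gene_y), mutual_information(gene_x, gene_y))
print(jaccard(gene_x, gene_z), mutual_information(gene_x, gene_z))
```

Mutual information also rewards shared absences, whereas the Jaccard index ignores them, so the two measures can rank gene pairs differently depending on the choice of reference species.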
Subject(s)
Computational Biology/methods , Gene Regulatory Networks/genetics , Gene Expression Profiling , Phylogeny
ABSTRACT
Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes.
Subject(s)
Inheritance Patterns , Phylogeny , Amino Acid Sequence , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Principal Component Analysis , Species Specificity
ABSTRACT
During the past several decades, Escherichia coli has been a treasure chest for molecular biology. The molecular mechanisms of many fundamental cellular processes have been discovered through research on this bacterium. Although much basic research now focuses on more complex model organisms, E. coli still remains important in metabolic engineering and synthetic biology. Despite its long history as a subject of molecular investigation, more than one-third of the E. coli genome has no pathway annotation supported by either experimental evidence or manual curation. Recently, a network-assisted genetics approach to the efficient identification of novel gene functions has increased in popularity. To accelerate the speed of pathway annotation for the remaining uncharacterized part of the E. coli genome, we have constructed a database of a cofunctional gene network with near-complete genome coverage of the organism, dubbed EcoliNet. We find that EcoliNet is highly predictive for diverse bacterial phenotypes, including antibiotic response, indicating that it will be useful in prioritizing novel candidate genes for a wide spectrum of bacterial phenotypes. We have implemented a web server where biologists can easily run network algorithms over EcoliNet to predict novel genes involved in a pathway or novel functions for a gene. All integrated cofunctional associations can be downloaded, enabling orthology-based reconstruction of gene networks for other bacterial species as well. Database URL: http://www.inetbio.org/ecolinet.
Subject(s)
Data Curation , Databases, Nucleic Acid , Escherichia coli/genetics , Gene Ontology , Gene Regulatory Networks , Genome, Bacterial
ABSTRACT
Cryptococcus neoformans is an opportunistic human pathogenic fungus that causes meningoencephalitis. Due to the increasing global risk of cryptococcosis and the emergence of drug-resistant strains, the development of predictive genetics platforms for the rapid identification of novel genes governing pathogenicity and drug resistance of C. neoformans is imperative. The analysis of functional genomics data and genome-scale mutant libraries may facilitate the genetic dissection of such complex phenotypes but with limited efficiency. Here, we present a genome-scale co-functional network for C. neoformans, CryptoNet, which covers ~81% of the coding genome and provides an efficient intermediary between functional genomics data and reverse-genetics resources for the genetic dissection of C. neoformans phenotypes. CryptoNet is the first genome-scale co-functional network for any fungal pathogen. CryptoNet effectively identified novel genes for pathogenicity and drug resistance using guilt-by-association and context-associated hub algorithms. CryptoNet is also the first genome-scale co-functional network for fungi in the Basidiomycota phylum, as Saccharomyces cerevisiae belongs to the Ascomycota phylum. CryptoNet may therefore provide insights into pathway evolution between two distinct phyla of the fungal kingdom. The CryptoNet web server (www.inetbio.org/cryptonet) is a public resource that provides an interactive environment of network-assisted predictive genetics for C. neoformans.
Subject(s)
Antifungal Agents/pharmacology , Cryptococcosis/microbiology , Cryptococcus neoformans/drug effects , Cryptococcus neoformans/genetics , Drug Resistance, Fungal , Opportunistic Infections/microbiology , Computational Biology/methods , Cryptococcus neoformans/pathogenicity , Gene Regulatory Networks , Genes, Fungal , Genome, Fungal , Genomics/methods , Humans , Models, Theoretical , Phenotype , Virulence/genetics
ABSTRACT
The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcription factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritizing associated pathways and diseases with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.
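The abstract reports enrichment of curated pairs among top-scored inferred interactions without naming the statistic used. One standard way to quantify such an overlap is a one-sided hypergeometric test, sketched below under that assumption. The function name and the toy counts are hypothetical and unrelated to the TRRUST figures.

```python
from math import comb


def hypergeom_enrichment_pvalue(universe, gold, selected, overlap):
    """P(X >= overlap) under sampling without replacement.

    universe: number of candidate TF-target pairs considered
    gold:     number of those pairs present in the curated benchmark
    selected: number of top-scored inferred pairs
    overlap:  number of top-scored pairs that are also curated
    """
    upper = min(gold, selected)
    total = comb(universe, selected)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(gold, k) * comb(universe - gold, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts chosen only to keep the arithmetic easy to follow.
p_value = hypergeom_enrichment_pvalue(universe=20, gold=5, selected=4, overlap=2)
print(round(p_value, 4))
```

Exact integer binomials via `math.comb` avoid floating-point underflow for small examples; for genome-scale counts a log-space or library implementation would be the more practical choice.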