Results 1 - 20 of 22
1.
PLoS Comput Biol ; 19(7): e1011286, 2023 07.
Article in English | MEDLINE | ID: mdl-37428809

ABSTRACT

Understanding the impact of regulatory variants on complex phenotypes is a significant challenge because the genes and pathways that are targeted by such variants and the cell type context in which regulatory variants operate are typically unknown. Cell-type-specific long-range regulatory interactions that occur between a distal regulatory sequence and a gene offer a powerful framework for examining the impact of regulatory variants on complex phenotypes. However, high-resolution maps of such long-range interactions are available only for a handful of cell types. Furthermore, identifying specific gene subnetworks or pathways that are targeted by a set of variants is a significant challenge. We have developed L-HiC-Reg, a Random Forests regression method to predict high-resolution contact counts in new cell types, and a network-based framework to identify candidate cell-type-specific gene networks targeted by a set of variants from a genome-wide association study (GWAS). We applied our approach to predict interactions in 55 Roadmap Epigenomics Mapping Consortium cell types, which we used to interpret regulatory single nucleotide polymorphisms (SNPs) in the NHGRI-EBI GWAS catalogue. Using our approach, we performed an in-depth characterization of fifteen different phenotypes including schizophrenia, coronary artery disease (CAD) and Crohn's disease. We found differentially wired subnetworks consisting of known as well as novel gene targets of regulatory SNPs. Taken together, our compendium of interactions and the associated network-based analysis pipeline leverage long-range regulatory interactions to examine the context-specific impact of regulatory variation in complex phenotypes.


Subjects
Epigenome; Genome-Wide Association Study; Humans; Genome-Wide Association Study/methods; Gene Regulatory Networks/genetics; Genome; Epigenomics; Polymorphism, Single Nucleotide/genetics; Genetic Predisposition to Disease
2.
PLoS Genet ; 17(2): e1009309, 2021 02.
Article in English | MEDLINE | ID: mdl-33539344

ABSTRACT

Recent advances in consortium-scale genome-wide association studies (GWAS) have highlighted the involvement of common genetic variants in autism spectrum disorder (ASD), but our understanding of their etiologic roles, especially the interplay with rare variants, is incomplete. In this work, we introduce an analytical framework to quantify the transmission disequilibrium of genetically regulated gene expression from parents to offspring. We applied this framework to conduct a transcriptome-wide association study (TWAS) on 7,805 ASD proband-parent trios, and replicated our findings using 35,740 independent samples. We identified 31 associations at the transcriptome-wide significance level. In particular, we identified POU3F2 (p = 2.1E-7), a transcription factor mainly expressed in developmental brain. Gene targets regulated by POU3F2 showed a 2.7-fold enrichment for known ASD genes (p = 2.0E-5) and a 2.7-fold enrichment for loss-of-function de novo mutations in ASD probands (p = 7.1E-5). These results provide a novel connection between rare and common variants, whereby ASD genes affected by very rare mutations are regulated by an unlinked transcription factor affected by common genetic variations.


Subjects
Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Hippocampus/metabolism; Homeodomain Proteins/genetics; POU Domain Factors/genetics; Transcriptome/genetics; Alleles; Databases, Genetic; Gene Expression Profiling; Humans; Mutation; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Risk Factors; Spatio-Temporal Analysis
3.
Genome Res ; 30(3): 361-374, 2020 03.
Article in English | MEDLINE | ID: mdl-32179589

ABSTRACT

RNA-binding proteins (RNA-BPs) play critical roles in development and disease to regulate gene expression. However, genome-wide identification of their targets in primary human cells has been challenging. Here, we applied a modified CLIP-seq strategy to identify genome-wide targets of the FMRP translational regulator 1 (FMR1), a brain-enriched RNA-BP, whose deficiency leads to Fragile X Syndrome (FXS), the most prevalent inherited intellectual disability. We identified FMR1 targets in human dorsal and ventral forebrain neural progenitors and excitatory and inhibitory neurons differentiated from human pluripotent stem cells. In parallel, we measured the transcriptomes of the same four cell types upon FMR1 gene deletion. We discovered that FMR1 preferentially binds long transcripts in human neural cells. FMR1 targets include genes unique to human neural cells and associated with clinical phenotypes of FXS and autism. Integrative network analysis using graph diffusion and multitask clustering of FMR1 CLIP-seq and transcriptional targets reveals critical pathways regulated by FMR1 in human neural development. Our results demonstrate that FMR1 regulates a common set of targets among different neural cell types but also operates in a cell type-specific manner targeting distinct sets of genes in human excitatory and inhibitory neural progenitors and neurons. By defining molecular subnetworks and validating specific high-priority genes, we identify novel components of the FMR1 regulation program. Our results provide new insights into gene regulation by a critical neuronal RNA-BP in human neurodevelopment.


Subjects
Fragile X Mental Retardation Protein/metabolism; Neural Stem Cells/metabolism; Neurons/metabolism; Autistic Disorder/genetics; Cell Line; Chromatin Immunoprecipitation Sequencing; Fragile X Mental Retardation Protein/genetics; Fragile X Syndrome/genetics; Gene Deletion; Gene Regulatory Networks; Humans; Male; Neural Stem Cells/cytology; Neurogenesis; Pluripotent Stem Cells/cytology; Prosencephalon/cytology; Prosencephalon/metabolism; Transcriptome
4.
Nucleic Acids Res ; 49(1): e3, 2021 01 11.
Article in English | MEDLINE | ID: mdl-33219668

ABSTRACT

Comparative functional genomics offers a powerful approach to study species evolution. To date, the majority of these studies have focused on the transcriptome in mammalian and yeast phylogenies. Here, we present a novel multi-species proteomic dataset and a computational pipeline to systematically compare the protein levels across multiple plant species. Globally, we find that protein levels diverge according to phylogenetic distance but are more constrained than mRNA levels. Module-level comparative analysis of groups of proteins shows that proteins that are more highly expressed tend to be more conserved. To interpret the evolutionary patterns of conservation and divergence, we develop a novel network-based integrative analysis pipeline that combines publicly available transcriptomic datasets to define co-expression modules. Our analysis pipeline can be used to relate the changes in protein levels to different species-specific phenotypic traits. We present a case study with the rhizobia-legume symbiosis process that supports the role of autophagy in this symbiotic association.


Subjects
Computational Biology/methods; Gene Regulatory Networks; Plant Proteins/metabolism; Plants/metabolism; Proteome/metabolism; Proteomics/methods; Chromatography, Liquid/methods; Evolution, Molecular; Gene Expression Regulation, Plant; Gene Ontology; Genomics/methods; Phylogeny; Plant Proteins/genetics; Plants/classification; Plants/genetics; Proteome/genetics; Species Specificity; Tandem Mass Spectrometry/methods; Transcriptome/genetics
5.
PLoS Comput Biol ; 15(5): e1007052, 2019 05.
Article in English | MEDLINE | ID: mdl-31075101

ABSTRACT

Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.


Subjects
Disease/etiology; Protein Domains; Protein Interaction Maps; Animals; Animals, Genetically Modified; Computational Biology; Coronary Artery Disease/etiology; Coronary Artery Disease/genetics; Coronary Artery Disease/metabolism; Disease/genetics; Genetic Predisposition to Disease; Genetic Variation; Genome-Wide Association Study; Humans; Models, Animal; Models, Biological; Mutation; Polymorphism, Single Nucleotide; Protein Domains/genetics; Protein Interaction Mapping; Protein Interaction Maps/genetics; Zebrafish/genetics
6.
Nucleic Acids Res ; 43(Database issue): D996-1002, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25355510

ABSTRACT

Arabidopsis thaliana is a reference plant that has been studied intensively for several decades. Recent advances in high-throughput experimental technology have enabled the generation of an unprecedented amount of data from A. thaliana, which has facilitated data-driven approaches to unravel the genetic organization of plant phenotypes. We previously published a description of a genome-scale functional gene network for A. thaliana, AraNet, which was constructed by integrating multiple co-functional gene networks inferred from diverse data types, and we demonstrated the predictive power of this network for complex phenotypes. More recently, we have observed significant growth in the availability of omics data for A. thaliana as well as improvements in data analysis methods that we anticipate will further enhance the integrated database of co-functional networks. Here, we present an updated co-functional gene network for A. thaliana, AraNet v2 (available at http://www.inetbio.org/aranet), which covers approximately 84% of the coding genome. We demonstrate significant improvements in both genome coverage and accuracy. To enhance the usability of the network, we implemented an AraNet v2 web server, which generates functional predictions for A. thaliana and 27 nonmodel plant species using an orthology-based projection of nonmodel plant genes on the A. thaliana gene network.


Subjects
Arabidopsis/genetics; Databases, Genetic; Gene Expression Regulation, Plant; Gene Regulatory Networks; Arabidopsis/metabolism; Genome, Plant; Internet; Phenotype
7.
Nucleic Acids Res ; 43(W1): W91-7, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25943544

ABSTRACT

Drosophila melanogaster (fruit fly) has been a popular model organism in animal genetics due to the high accessibility of reverse-genetics tools. In addition, the close relationship between the Drosophila and human genomes rationalizes the use of Drosophila as an invertebrate model for human neurobiology and disease research. A platform technology for predicting candidate genes or functions would further enhance the usefulness of this long-established model organism for gene-to-phenotype mapping. Recently, the power of network prioritization for gene-to-phenotype mapping has been demonstrated in many organisms. Here we present a network prioritization server dedicated to Drosophila that covers ∼95% of the coding genome. This server, dubbed FlyNet, has several distinctive features, including (i) prioritization for both genes and functions; (ii) two complementary network algorithms: direct neighborhood and network diffusion; (iii) spatiotemporal-specific networks as an additional prioritization strategy for traits associated with a specific developmental stage or tissue and (iv) prioritization for human disease genes. FlyNet is expected to serve as a versatile hypothesis-generation platform for genes and functions in the study of basic animal genetics, developmental biology and human disease. FlyNet is available for free at http://www.inetbio.org/flynet.


Subjects
Drosophila melanogaster/genetics; Gene Regulatory Networks; Software; Algorithms; Animals; Disease/genetics; Disease Models, Animal; Genes, Insect; Humans; Internet
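
Note on record 7: the abstract names two complementary network algorithms, direct neighborhood and network diffusion, without giving their details. The sketch below is only an illustration of the general ideas on a toy network and is not FlyNet's implementation. It assumes direct neighborhood means summing link weights to seed genes and network diffusion means a random walk with restart; the gene names, edge weights, and the `restart` and `iterations` parameters are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (gene_a, gene_b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, w in edges:
        graph[a][b] = w
        graph[b][a] = w
    return graph


def direct_neighborhood(graph, seeds):
    """Score each non-seed gene by the summed weight of its links to seed genes."""
    scores = {}
    for gene, neighbors in graph.items():
        if gene in seeds:
            continue
        score = sum(w for n, w in neighbors.items() if n in seeds)
        if score > 0:
            scores[gene] = score
    return scores


def network_diffusion(graph, seeds, restart=0.5, iterations=50):
    """Random walk with restart (an assumed form of diffusion) started from the seed genes."""
    genes = list(graph)
    start = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in genes}
    strength = {g: sum(graph[g].values()) for g in genes}
    prob = dict(start)
    for _ in range(iterations):
        # Each step returns to the seeds with probability `restart`,
        # otherwise moves along an edge in proportion to its weight.
        nxt = {g: restart * start[g] for g in genes}
        for g in genes:
            for n, w in graph[g].items():
                nxt[n] += (1.0 - restart) * prob[g] * w / strength[g]
        prob = nxt
    return {g: p for g, p in prob.items() if g not in seeds}


# Hypothetical toy network: a chain A-B-C-D plus a weaker link A-E.
toy_edges = [("A", "B", 1.0), ("B", "C", 1.0), ("C", "D", 1.0), ("A", "E", 0.5)]
toy_graph = build_graph(toy_edges)
neighbor_scores = direct_neighborhood(toy_graph, {"A"})
diffusion_scores = network_diffusion(toy_graph, {"A"})
```

On this toy network, direct neighborhood scores only B and E, the genes linked to the seed A, whereas diffusion also assigns smaller scores to the more distant genes C and D. That difference is one plausible reading of why the abstract calls the two algorithms complementary.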
8.
Nucleic Acids Res ; 43(W1): W122-7, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25813048

ABSTRACT

Rice is the most important staple food crop and a model grass for studies of bioenergy crops. We previously published a genome-scale functional network server called RiceNet, constructed by integrating diverse genomics data, and demonstrated the use of the network in genetic dissection of rice biotic stress responses and its usefulness for other grass species. Since the initial construction of the network, there has been a significant increase in the amount of publicly available rice genomics data. Here, we present an updated network prioritization server for Oryza sativa ssp. japonica, RiceNet v2 (http://www.inetbio.org/ricenet), which provides a network of 25 765 genes (70.1% of the coding genome) and 1 775 000 co-functional links. RiceNet v2 also provides two complementary methods for network prioritization based on: (i) network direct neighborhood and (ii) context-associated hubs. RiceNet v2 can use genes of the related subspecies O. sativa ssp. indica and the reference plant Arabidopsis for versatility in generating hypotheses. We show that RiceNet v2 effectively identifies candidate genes involved in rice root/shoot development and defense responses, demonstrating its usefulness for the grass research community.


Subjects
Genes, Plant; Oryza/genetics; Software; Arabidopsis/genetics; Gene Expression Regulation, Plant; Gene Regulatory Networks; Internet
9.
Nucleic Acids Res ; 42(Database issue): D731-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24165882

ABSTRACT

Saccharomyces cerevisiae, i.e. baker's yeast, is a widely studied model organism in eukaryote genetics because of its simple protocols for genetic manipulation and phenotype profiling. The high abundance of publicly available data that has been generated through diverse 'omics' approaches has led to the use of yeast for many systems biology studies, including large-scale gene network modeling to better understand the molecular basis of the cellular phenotype. We have previously developed a genome-scale gene network for yeast, YeastNet v2, which has been used for various genetics and systems biology studies. Here, we present an updated version, YeastNet v3 (available at http://www.inetbio.org/yeastnet/), that significantly improves the prediction of gene-phenotype associations. The extended genome in YeastNet v3 covers up to 5818 genes (∼99% of the coding genome) wired by 362 512 functional links. YeastNet v3 provides a new web interface to run the tools for network-guided hypothesis generation. YeastNet v3 also provides edge information for all data-specific networks (∼2 million functional links) as well as the integrated networks. Therefore, users can construct alternative versions of the integrated network by applying their own data integration algorithm to the same data-specific links.


Subjects
Databases, Genetic; Gene Expression Regulation, Fungal; Gene Regulatory Networks; Saccharomyces cerevisiae/genetics; Internet; Phenotype; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism
10.
Nucleic Acids Res ; 42(Web Server issue): W76-82, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24813450

ABSTRACT

High-throughput experimental technologies gradually shift the paradigm of biological research from hypothesis-validation toward hypothesis-generation science. Translating diverse types of large-scale experimental data into testable hypotheses, however, remains a daunting task. We previously demonstrated that heterogeneous genomics data can be integrated into a single genome-scale gene network with high prediction power for ribonucleic acid interference (RNAi) phenotypes in Caenorhabditis elegans, a popular metazoan model in the study of developmental biology, neurobiology and genetics. Here, we present WormNet version 3 (v3), which is a new network-assisted hypothesis-generating server for C. elegans. WormNet v3 includes major updates to the base gene network, which substantially improved predictions of RNAi phenotypes. The server generates various gene network-based hypotheses using three complementary network methods: (i) a phenotype-centric approach to 'find new members for a pathway'; (ii) a gene-centric approach to 'infer functions from network neighbors' and (iii) a context-centric approach to 'find context-associated hub genes', which is a new method to identify key genes that mediate physiology within a specific context. For example, we demonstrated that the context-centric approach can be used to identify potential molecular targets of toxic chemicals. WormNet v3 is freely accessible at http://www.inetbio.org/wormnet.


Subjects
Caenorhabditis elegans/genetics; Software; Animals; Caenorhabditis elegans/drug effects; Dichlorvos/toxicity; Gene Regulatory Networks; Genes, Helminth; Insecticides/toxicity; Internet; Phenotype; RNA Interference
11.
Genome Res ; 20(8): 1143-53, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20538624

ABSTRACT

Most phenotypes are genetically complex, with contributions from mutations in many different genes. Mutations in more than one gene can combine synergistically to cause phenotypic change, and systematic studies in model organisms show that these genetic interactions are pervasive. However, in human association studies such nonadditive genetic interactions are very difficult to identify because of a lack of statistical power: simply put, the number of potential interactions is too vast. One approach to resolve this is to predict candidate modifier interactions between loci, and then to specifically test these for associations with the phenotype. Here, we describe a general method for predicting genetic interactions based on the use of integrated functional gene networks. We show that in both Saccharomyces cerevisiae and Caenorhabditis elegans a single high-coverage, high-quality functional network can successfully predict genetic modifiers for the majority of genes. For C. elegans we also describe the construction of a new, improved, and expanded functional network, WormNet 2. Using this network we demonstrate how it is possible to rapidly expand the number of modifier loci known for a gene, predicting and validating new genetic interactions for each of three signal transduction genes. We propose that this approach, termed network-guided modifier screening, provides a general strategy for predicting genetic interactions. This work thus suggests that a high-quality integrated human gene network will provide a powerful resource for modifier locus discovery in many different diseases.


Subjects
Gene Regulatory Networks; Genetic Loci; Models, Genetic; Sequence Analysis, DNA/methods; Animals; Caenorhabditis elegans/genetics; Mutation; Saccharomyces cerevisiae/genetics; Signal Transduction/genetics
12.
G3 (Bethesda) ; 13(3)2023 03 09.
Article in English | MEDLINE | ID: mdl-36626328

ABSTRACT

Single-cell RNA-sequencing (scRNA-seq) offers unparalleled insight into the transcriptional programs of different cellular states by measuring the transcriptome of thousands of individual cells. An emerging problem in the analysis of scRNA-seq is the inference of transcriptional gene regulatory networks, and a number of methods with different learning frameworks have been developed to address this problem. Here, we present an expanded benchmarking study of eleven recent network inference methods on seven published scRNA-seq datasets in human, mouse, and yeast considering different types of gold standard networks and evaluation metrics. We evaluate methods based on their computing requirements as well as on their ability to recover the network structure. We find that, while most methods have a modest recovery of experimentally derived interactions based on global metrics such as Area Under the Precision Recall curve, methods are able to capture targets of regulators that are relevant to the system under study. The top-performing methods that use only expression were SCENIC, PIDC, MERLIN, and Correlation. Addition of prior biological knowledge and the estimation of transcription factor activities resulted in the best overall performance, with the Inferelator and MERLIN methods that use prior knowledge outperforming methods that use expression alone. We found that imputation for network inference did not improve network inference accuracy and could be detrimental. Comparisons of inferred networks for comparable bulk conditions showed that the networks inferred from scRNA-seq datasets are often better than or on par with the networks inferred from bulk datasets. Our analysis should be beneficial in selecting methods for network inference. At the same time, this highlights the need for improved methods and better gold standards for regulatory network inference from scRNA-seq datasets.


Subjects
Algorithms; Neurofibromin 2; Humans; Animals; Mice; Single-Cell Gene Expression Analysis; Single-Cell Analysis/methods; Gene Expression Regulation; Gene Regulatory Networks; Saccharomyces cerevisiae; Sequence Analysis, RNA/methods; Gene Expression Profiling
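
Note on record 12: the abstract evaluates inferred networks against gold standards with global metrics such as the area under the precision-recall curve. The sketch below illustrates how such a score can be computed for a ranked edge list. It assumes average precision as the estimator of that area, which is a common choice but may differ from what the study used; the regulator and target names and the scores are hypothetical.

```python
def rank_edges(edge_scores):
    """Return edges sorted from highest to lowest confidence score."""
    return [edge for edge, _ in sorted(edge_scores.items(), key=lambda kv: kv[1], reverse=True)]


def average_precision(ranked_edges, gold):
    """Average precision of a ranked edge list against a gold-standard edge set.

    Gold-standard edges that never appear in the ranking count as misses,
    because the sum is divided by the size of the gold standard.
    """
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / len(gold)


# Hypothetical predictions: (regulator, target) -> confidence score.
predicted = {
    ("TF1", "geneA"): 0.9,
    ("TF1", "geneB"): 0.7,
    ("TF2", "geneC"): 0.6,
    ("TF2", "geneA"): 0.2,
}
gold_standard = {("TF1", "geneA"), ("TF2", "geneC")}
score = average_precision(rank_edges(predicted), gold_standard)
```

In this toy case the two gold-standard edges are ranked first and third, so the precision values at the two hits are 1/1 and 2/3 and the average precision is 5/6.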
13.
Sci Adv ; 8(39): eabn7430, 2022 09 30.
Article in English | MEDLINE | ID: mdl-36179024

ABSTRACT

Our inability to derive the neuronal diversity that comprises the posterior central nervous system (pCNS) using human pluripotent stem cells (hPSCs) poses an impediment to understanding human neurodevelopment and disease in the hindbrain and spinal cord. Here, we establish a modular, monolayer differentiation paradigm that recapitulates both rostrocaudal (R/C) and dorsoventral (D/V) patterning, enabling derivation of diverse pCNS neurons with discrete regional specificity. First, neuromesodermal progenitors (NMPs) with discrete HOX profiles are converted to pCNS progenitors (pCNSPs). Then, by tuning D/V signaling, pCNSPs are directed to locomotor or somatosensory neurons. Expansive single-cell RNA-sequencing (scRNA-seq) analysis coupled with a novel computational pipeline allowed us to detect hundreds of transcriptional markers within region-specific phenotypes, enabling discovery of gene expression patterns across R/C and D/V developmental axes. These findings highlight the potential of these resources to advance a mechanistic understanding of pCNS development, enhance in vitro models, and inform therapeutic strategies.


Subjects
Neurons; Transcriptome; Cell Differentiation/genetics; Central Nervous System; Humans; Neurons/physiology; RNA
14.
Biochem Biophys Res Commun ; 412(3): 454-9, 2011 Sep 02.
Article in English | MEDLINE | ID: mdl-21839727

ABSTRACT

The inhibitors of apoptosis proteins (IAP), which include cIAP1, cIAP2 and XIAP, suppress apoptosis through the inhibition of caspases, and the activity of IAPs is regulated by a variety of IAP-binding proteins. Herein, we report the identification of Vestigial-like 4 (Vgl-4), which functions as a transcription cofactor in cardiac myocytes, as a new IAP-binding protein. Vgl-4 is expressed predominantly in the nucleus and its overexpression triggers a relocalization of IAPs from the cytoplasm to the nucleus. cIAP1/2-interacting protein TRAF2 (TNF receptor-associated factor 2) prevented the Vgl-4-driven nuclear localization of cIAP2. Accordingly, the forced relocation of IAPs to the nucleus by Vgl-4 significantly reduced their ability to prevent Bax- and TNFα-induced apoptosis, which can be recovered by co-expression with TRAF2. Our results suggest that Vgl-4 may play a role in the apoptotic pathways by regulating translocation of IAPs between different cell compartments.


Subjects
Apoptosis; Cell Nucleus/metabolism; Inhibitor of Apoptosis Proteins/metabolism; Transcription Factors/metabolism; Active Transport, Cell Nucleus; HEK293 Cells; HeLa Cells; Humans; TNF Receptor-Associated Factor 2/metabolism; X-Linked Inhibitor of Apoptosis Protein
15.
Curr Opin Syst Biol ; 23: 38-46, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33225112

ABSTRACT

Transcriptional regulatory networks control context-specific gene expression patterns and play important roles in normal and disease processes. Advances in genomics are rapidly increasing our ability to measure different components of the regulation machinery at the single-cell and bulk population level. An important challenge is to combine different types of regulatory genomic measurements to construct a more complete picture of gene regulatory networks across different disease, environmental, and developmental contexts. In this review, we focus on recent computational methods that integrate regulatory genomic data sets to infer context specificity and dynamics in regulatory networks.

16.
Methods Mol Biol ; 1526: 87-98, 2017.
Article in English | MEDLINE | ID: mdl-27896737

ABSTRACT

Functional constraints between genes display similar patterns of gain or loss during speciation. Similar phylogenetic profiles, therefore, can be an indication of a functional association between genes. The phylogenetic profiling method has been applied successfully to the reconstruction of gene pathways and the inference of unknown gene functions. This method requires only sequence data to generate phylogenetic profiles. This method therefore has the potential to take advantage of the recent explosion in available sequence data to reveal a significant number of functional associations between genes. Since the initial development of phylogenetic profiling, many modifications to improve this method have been proposed, including improvements in the measurement of profile similarity and the selection of reference species. Here, we describe the existing methods of phylogenetic profiling for the inference of functional associations and discuss their technical limitations and caveats.


Subjects
Computational Biology/methods; Gene Regulatory Networks/genetics; Gene Expression Profiling; Phylogeny
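
Note on record 16: the abstract describes phylogenetic profiling, in which genes with similar patterns of presence and absence across reference species are inferred to be functionally associated, and mentions that the measurement of profile similarity has been refined over time. The sketch below illustrates the idea with two similarity measures, the Jaccard index and mutual information. It assumes binary presence/absence profiles, and the gene names and profiles are hypothetical; it is not taken from the chapter itself.

```python
import math


def jaccard(profile_a, profile_b):
    """Shared presences divided by species in which either gene is present."""
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def mutual_information(profile_a, profile_b):
    """Mutual information (in bits) between two binary presence/absence profiles."""
    n = len(profile_a)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            joint = sum(1 for x, y in zip(profile_a, profile_b) if x == a and y == b) / n
            p_a = sum(1 for x in profile_a if x == a) / n
            p_b = sum(1 for y in profile_b if y == b) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_a * p_b))
    return mi


# Hypothetical profiles: 1 = homolog present in the reference species, 0 = absent.
profiles = {
    "geneA": [1, 1, 0, 0, 1, 0, 1, 0],
    "geneB": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to geneA
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],  # partially overlapping with geneA
}
similarity_ab = jaccard(profiles["geneA"], profiles["geneB"])
similarity_ac = jaccard(profiles["geneA"], profiles["geneC"])
```

Here geneA and geneB share an identical profile and so receive the maximum score under both measures, which would make them a candidate functional pair, while geneA and geneC overlap only partially and score lower.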
17.
PLoS One ; 10(9): e0139006, 2015.
Article in English | MEDLINE | ID: mdl-26394049

ABSTRACT

Phylogenetic profiling, a network inference method based on gene inheritance profiles, has been widely used to construct functional gene networks in microbes. However, its utility for network inference in higher eukaryotes has been limited. An improved algorithm with an in-depth understanding of pathway evolution may overcome this limitation. In this study, we investigated the effects of taxonomic structures on co-inheritance analysis using 2,144 reference species in four query species: Escherichia coli, Saccharomyces cerevisiae, Arabidopsis thaliana, and Homo sapiens. We observed three clusters of reference species based on a principal component analysis of the phylogenetic profiles, which correspond to the three domains of life (Archaea, Bacteria, and Eukaryota), suggesting that pathways are inherited primarily within specific domains or lower-ranked taxonomic groups during speciation. Hence, the co-inheritance pattern within a taxonomic group may be eroded by confounding inheritance patterns from irrelevant taxonomic groups. We demonstrated that co-inheritance analysis within domains substantially improved network inference not only in microbe species but also in the higher eukaryotes, including humans. Although we observed two sub-domain clusters of reference species within Eukaryota, co-inheritance analysis within these sub-domain taxonomic groups only marginally improved network inference. Therefore, we conclude that co-inheritance analysis within domains is the optimal approach to network inference with the given reference species. The construction of a series of human gene networks with increasing sample sizes of the reference species for each domain revealed that the size of the high-accuracy networks increased as additional reference species genomes were included, suggesting that within-domain co-inheritance analysis will continue to expand human gene networks as genomes of additional species are sequenced. Taken together, we propose that co-inheritance analysis within the domains of life will greatly potentiate the use of the expected onslaught of sequenced genomes in the study of molecular pathways in higher eukaryotes.


Subjects
Inheritance Patterns; Phylogeny; Amino Acid Sequence; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Principal Component Analysis; Species Specificity
18.
Article in English | MEDLINE | ID: mdl-25650278

ABSTRACT

During the past several decades, Escherichia coli has been a treasure chest for molecular biology. The molecular mechanisms of many fundamental cellular processes have been discovered through research on this bacterium. Although much basic research now focuses on more complex model organisms, E. coli still remains important in metabolic engineering and synthetic biology. Despite its long history as a subject of molecular investigation, more than one-third of the E. coli genome has no pathway annotation supported by either experimental evidence or manual curation. Recently, a network-assisted genetics approach to the efficient identification of novel gene functions has increased in popularity. To accelerate the speed of pathway annotation for the remaining uncharacterized part of the E. coli genome, we have constructed a database of a cofunctional gene network with near-complete genome coverage of the organism, dubbed EcoliNet. We find that EcoliNet is highly predictive for diverse bacterial phenotypes, including antibiotic response, indicating that it will be useful in prioritizing novel candidate genes for a wide spectrum of bacterial phenotypes. We have implemented a web server where biologists can easily run network algorithms over EcoliNet to predict novel genes involved in a pathway or novel functions for a gene. All integrated cofunctional associations can be downloaded, enabling orthology-based reconstruction of gene networks for other bacterial species as well. Database URL: http://www.inetbio.org/ecolinet.


Subjects
Data Curation; Databases, Nucleic Acid; Escherichia coli/genetics; Gene Ontology; Gene Regulatory Networks; Genome, Bacterial
19.
Sci Rep ; 5: 8767, 2015 Mar 05.
Article in English | MEDLINE | ID: mdl-25739925

ABSTRACT

Cryptococcus neoformans is an opportunistic human pathogenic fungus that causes meningoencephalitis. Due to the increasing global risk of cryptococcosis and the emergence of drug-resistant strains, the development of predictive genetics platforms for the rapid identification of novel genes governing pathogenicity and drug resistance of C. neoformans is imperative. The analysis of functional genomics data and genome-scale mutant libraries may facilitate the genetic dissection of such complex phenotypes but with limited efficiency. Here, we present a genome-scale co-functional network for C. neoformans, CryptoNet, which covers ~81% of the coding genome and provides an efficient intermediary between functional genomics data and reverse-genetics resources for the genetic dissection of C. neoformans phenotypes. CryptoNet is the first genome-scale co-functional network for any fungal pathogen. CryptoNet effectively identified novel genes for pathogenicity and drug resistance using guilt-by-association and context-associated hub algorithms. CryptoNet is also the first genome-scale co-functional network for fungi in the basidiomycota phylum, as Saccharomyces cerevisiae belongs to the ascomycota phylum. CryptoNet may therefore provide insights into pathway evolution between two distinct phyla of the fungal kingdom. The CryptoNet web server (www.inetbio.org/cryptonet) is a public resource that provides an interactive environment of network-assisted predictive genetics for C. neoformans.


Subjects
Antifungal Agents/pharmacology; Cryptococcosis/microbiology; Cryptococcus neoformans/drug effects; Cryptococcus neoformans/genetics; Drug Resistance, Fungal; Opportunistic Infections/microbiology; Computational Biology/methods; Cryptococcus neoformans/pathogenicity; Gene Regulatory Networks; Genes, Fungal; Genome, Fungal; Genomics/methods; Humans; Models, Theoretical; Phenotype; Virulence/genetics
20.
Sci Rep ; 5: 11432, 2015 Jun 12.
Article in English | MEDLINE | ID: mdl-26066708

ABSTRACT

The reconstruction of transcriptional regulatory networks (TRNs) is a long-standing challenge in human genetics. Numerous computational methods have been developed to infer regulatory interactions between human transcription factors (TFs) and target genes from high-throughput data, and their performance evaluation requires gold-standard interactions. Here we present a database of literature-curated human TF-target interactions, TRRUST (transcriptional regulatory relationships unravelled by sentence-based text-mining, http://www.grnpedia.org/trrust), which currently contains 8,015 interactions between 748 TF genes and 1,975 non-TF genes. A sentence-based text-mining approach was employed for efficient manual curation of regulatory interactions from approximately 20 million Medline abstracts. To the best of our knowledge, TRRUST is the largest publicly available database of literature-curated human TF-target interactions to date. TRRUST also has several useful features: i) information about the mode-of-regulation; ii) tests for target modularity of a query TF; iii) tests for TF cooperativity of a query target; iv) inferences about cooperating TFs of a query TF; and v) prioritization of pathways and diseases associated with a query TF. We observed high enrichment of TF-target pairs in TRRUST for top-scored interactions inferred from high-throughput data, which suggests that TRRUST provides a reliable benchmark for the computational reconstruction of human TRNs.


Subjects
Data Mining; Databases, Genetic; Transcription, Genetic; Transcriptome; Data Curation; Humans