ABSTRACT
Global insights into cellular organization and genome function require comprehensive understanding of the interactome networks that mediate genotype-phenotype relationships [1,2]. Here we present 'HuRI', an 'all-by-all' reference interactome map of human binary protein interactions. With approximately 53,000 protein-protein interactions, HuRI has approximately four times as many such interactions as there are high-quality curated interactions from small-scale studies. The integration of HuRI with genome [3], transcriptome [4] and proteome [5] data enables cellular function to be studied within most physiological or pathological cellular contexts. We demonstrate the utility of HuRI in identifying the specific subcellular roles of protein-protein interactions. Inferred tissue-specific networks reveal general principles for the formation of cellular context-specific functions and elucidate potential molecular mechanisms that might underlie tissue-specific phenotypes of Mendelian diseases. HuRI is a systematic proteome-wide reference that links genomic variation to phenotypic outcomes.
Subjects
Proteome/metabolism; Extracellular Space/metabolism; Humans; Organ Specificity; Protein Interaction Mapping
ABSTRACT
Condition-dependent genetic interactions can reveal functional relationships between genes that are not evident under standard culture conditions. State-of-the-art yeast genetic interaction mapping, which relies on robotic manipulation of arrays of double-mutant strains, does not scale readily to multi-condition studies. Here, we describe barcode fusion genetics to map genetic interactions (BFG-GI), by which double-mutant strains generated via en masse "party" mating can also be monitored en masse for growth to detect genetic interactions. By using site-specific recombination to fuse two DNA barcodes, each representing a specific gene deletion, BFG-GI enables multiplexed quantitative tracking of double mutants via next-generation sequencing. We applied BFG-GI to a matrix of DNA repair genes under nine different conditions, including methyl methanesulfonate (MMS), 4-nitroquinoline 1-oxide (4NQO), bleomycin, zeocin, and three other DNA-damaging environments. BFG-GI recapitulated known genetic interactions and yielded new condition-dependent genetic interactions. We validated and further explored a subnetwork of condition-dependent genetic interactions involving MAG1, SLX4, and genes encoding the Shu complex, and inferred that loss of the Shu complex leads to an increase in the activation of the checkpoint protein kinase Rad53.
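The fused-barcode readout described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each sequencing read has already been parsed into a pair of deletion barcodes, and it scores each double mutant by the log2 change in its relative abundance between two pooled samples. The function names, the pseudocount, the placeholder gene `GENE_X` and all read counts are hypothetical.

```python
from collections import Counter
from math import log2


def count_pairs(reads):
    """Count fused-barcode reads, ignoring the order of the two barcodes."""
    return Counter(tuple(sorted(read)) for read in reads)


def log2_fold_change(before, after, pseudocount=0.5):
    """Log2 change in relative abundance of each double mutant.

    `before` and `after` are Counters of fused-barcode pairs; the
    pseudocount (an assumption) avoids division by zero for dropouts.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    pairs = set(before) | set(after)
    return {
        pair: log2(
            ((after[pair] + pseudocount) / total_after)
            / ((before[pair] + pseudocount) / total_before)
        )
        for pair in pairs
    }


# Invented read counts for two double mutants in a pooled culture.
reads_before = [("MAG1", "SLX4")] * 10 + [("MAG1", "GENE_X")] * 10
reads_after = [("SLX4", "MAG1")] * 2 + [("MAG1", "GENE_X")] * 18

scores = log2_fold_change(count_pairs(reads_before), count_pairs(reads_after))
```

A negative score marks a double mutant that was depleted from the pool and a positive score one that was enriched. Turning such scores into genetic interaction calls would additionally require single-mutant fitness estimates, which this sketch does not model.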
Subjects
Chromosome Mapping; DNA Barcoding, Taxonomic; DNA Damage; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; DNA Repair; Epistasis, Genetic; Gene Deletion; Genetic Loci; High-Throughput Nucleotide Sequencing; Methyl Methanesulfonate; Models, Theoretical; Promoter Regions, Genetic; Reproducibility of Results
ABSTRACT
The emergence and prevalence of drug resistance demands streamlined strategies to identify drug-resistant variants in a fast, systematic and cost-effective way. Methods commonly used to understand and predict drug resistance rely on limited clinical studies from patients who are refractory to drugs or on laborious evolution experiments with poor coverage of the gene variants. Here, we report an integrative functional variomics methodology combining deep sequencing and a Bayesian statistical model to provide a comprehensive list of drug resistance alleles from complex variant populations. Dihydrofolate reductase, the target of the chemotherapy drug methotrexate, was used as a model to identify functional mutant alleles correlated with methotrexate resistance. This systematic approach identified previously reported resistance mutations, as well as novel point mutations that were validated in vivo. Use of this systematic strategy as a routine diagnostics tool widens the scope of successful drug research and development.
Subjects
Drug Resistance, Neoplasm/genetics; Neoplasms/drug therapy; Tetrahydrofolate Dehydrogenase/metabolism; Alleles; Bayes Theorem; Folic Acid Antagonists/therapeutic use; Humans; Methotrexate/therapeutic use; Mutation; Neoplasms/genetics; Tetrahydrofolate Dehydrogenase/genetics
ABSTRACT
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning (DMS) framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to DMS are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.
Subjects
DNA Mutational Analysis/methods; Mutation, Missense/genetics; Calmodulin/genetics; Disease/genetics; Humans; Machine Learning; Phenotype; Phylogeny; Reproducibility of Results; SUMO-1 Protein/genetics; Ubiquitin-Conjugating Enzymes/genetics; Ubiquitin-Conjugating Enzymes/metabolism
ABSTRACT
High-throughput binary protein interaction mapping is continuing to extend our understanding of cellular function and disease mechanisms. However, we remain one or two orders of magnitude away from a complete interaction map for humans and other major model organisms. Completion will require screening at substantially larger scales with many complementary assays, requiring further efficiency gains in proteome-scale interaction mapping. Here, we report Barcode Fusion Genetics-Yeast Two-Hybrid (BFG-Y2H), by which a full matrix of protein pairs can be screened in a single multiplexed strain pool. BFG-Y2H uses Cre recombination to fuse DNA barcodes from distinct plasmids, generating chimeric protein-pair barcodes that can be quantified via next-generation sequencing. We applied BFG-Y2H to four different matrices ranging in scale from ~25 K to 2.5 M protein pairs. The results show that BFG-Y2H increases the efficiency of protein matrix screening, with quality that is on par with state-of-the-art Y2H methods.
Subjects
Centrosome/metabolism; Protein Interaction Mapping/methods; Proteome/metabolism; Saccharomyces cerevisiae/genetics; Chromosomes, Human/metabolism; Gene Library; High-Throughput Nucleotide Sequencing; Humans; Protein Binding; Two-Hybrid System Techniques
ABSTRACT
Although RNA-mediated interference (RNAi) is a widely conserved process among eukaryotes, including many fungi, it is absent from the budding yeast Saccharomyces cerevisiae. Three human proteins, Ago2, Dicer and TRBP, are sufficient for reconstituting the RISC complex in vitro. To examine whether the introduction of human RNAi genes can reconstitute RNAi in S. cerevisiae, genes encoding these three human proteins were introduced into S. cerevisiae. We detected siRNA as well as siRNA- and RISC-dependent silencing of the target gene GFP. Thus, human Ago2, Dicer and TRBP can functionally reconstitute human RNAi in S. cerevisiae in vivo, enabling the study and use of the human RNAi pathway in a facile genetic model organism.
Subjects
RNA Interference; Saccharomyces cerevisiae/genetics; Argonaute Proteins; Eukaryotic Initiation Factor-2/genetics; Eukaryotic Initiation Factor-2/metabolism; Humans; Models, Genetic; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Ribonuclease III/genetics; Ribonuclease III/metabolism; Saccharomyces cerevisiae/metabolism
ABSTRACT
Certain viruses use microRNAs (miRNAs) to regulate the expression of their own genes, host genes, or both. Previous studies have identified a limited number of miRNAs expressed by herpes simplex viruses 1 and 2 (HSV-1 and -2), some of which are conserved between these two viruses. To more comprehensively analyze the miRNAs expressed by HSV-1 or HSV-2 during productive and latent infection, we applied a massively parallel sequencing approach. We were able to identify 16 and 17 miRNAs expressed by HSV-1 and HSV-2, respectively, including all previously known species, and a number of previously unidentified virus-encoded miRNAs. The genomic positions of most miRNAs encoded by these two viruses are within or proximal to the latency-associated transcript region. Nine miRNAs are conserved in position and/or sequence, particularly in the seed region, between these two viruses. Interestingly, we did not detect an HSV-2 miRNA homolog of HSV-1 miR-H1, which is highly expressed during productive infection, but we did detect abundant expression of miR-H6, whose seed region is conserved with HSV-1 miR-H1 and might represent a functional analog. We also identified a highly conserved miRNA family arising from the viral origins of replication. In addition, we detected several pairs of complementary miRNAs and we found miRNA-offset RNAs (moRs) arising from the precursors of HSV-1 and HSV-2 miR-H6 and HSV-2 miR-H4. Our results reveal elements of miRNA conservation and divergence that should aid in identifying miRNA functions.
Subjects
Conserved Sequence; Herpesvirus 1, Human/physiology; Herpesvirus 2, Human/physiology; MicroRNAs/biosynthesis; Polymorphism, Genetic; RNA, Viral/biosynthesis; Animals; Cell Line; Chlorocebus aethiops; Humans; MicroRNAs/genetics; RNA, Viral/genetics; Sequence Analysis, DNA
ABSTRACT
Many traits are complex, depending non-additively on variant combinations. Even in model systems, such as the yeast S. cerevisiae, carrying out the high-order variant-combination testing needed to dissect complex traits remains a daunting challenge. Here, we describe "X-gene" genetic analysis (XGA), a strategy for engineering and profiling highly combinatorial gene perturbations. We demonstrate XGA on yeast ABC transporters by engineering 5,353 strains, each deleted for a random subset of 16 transporters, and profiling each strain's resistance to 16 compounds. XGA yielded 85,648 genotype-to-resistance observations, revealing high-order genetic interactions for 13 of the 16 transporters studied. Neural networks yielded intuitive functional models and guided exploration of fluconazole resistance, which was influenced non-additively by five genes. Together, our results showed that highly combinatorial genetic perturbation can functionally dissect complex traits, supporting pursuit of analogous strategies in human cells and other model systems.
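The scale of the experiment follows directly from the design: 5,353 strains profiled against 16 compounds gives 5,353 × 16 = 85,648 observations. A minimal sketch of how such combinatorial genotypes might be represented is shown below, using a 16-bit mask per strain. The bitmask encoding, the 50% per-gene deletion probability and the fixed random seed are illustrative assumptions, not details taken from the study.

```python
import random

N_GENES = 16       # transporters that may be deleted (from the abstract)
N_STRAINS = 5353   # engineered strains (from the abstract)
N_COMPOUNDS = 16   # compounds profiled (from the abstract)


def random_genotype(rng, n_genes=N_GENES):
    """Return a bitmask in which bit i set means gene i is deleted.

    Each gene is deleted independently with probability 0.5 (assumption).
    """
    return rng.getrandbits(n_genes)


def deleted_genes(genotype, n_genes=N_GENES):
    """List the indices of the genes deleted in a genotype bitmask."""
    return [i for i in range(n_genes) if (genotype >> i) & 1]


rng = random.Random(0)
strains = [random_genotype(rng) for _ in range(N_STRAINS)]

# One resistance measurement per (strain, compound) combination.
n_observations = len(strains) * N_COMPOUNDS
```

With 16 genes there are 2^16 = 65,536 possible deletion subsets, so 5,353 strains sample only a fraction of the genotype space. That gap may be one reason a fitted model, such as the neural networks the abstract mentions, is useful for generalizing beyond the strains actually measured.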
Subjects
Biological Transport/genetics; Membrane Transport Proteins/genetics; Humans
ABSTRACT
The ubiquity and cheapness of miniature low-power sensors, digital processing, and large amounts of storage contained in small packages have heralded the ability to acquire large amounts of data about systems during their course of operation. The size and complexity of the data sets so generated have colloquially been labeled "big data." The computer science field of "data mining" has arisen with the purpose of extracting meaning from such data, expressly looking for patterns that not only link historic observations but also predict future behavior. This overview article considers the process, techniques, and interpretation of data mining, with specific focus on its application in audiology. Modern hearing instruments contain data-logging technology to record data separate from the audio stream, such as the acoustic environments in which the device was being used and how the signal processing was consequently operating. Combined with details about the patient, such as the audiogram, the variety of data generated lends itself to a data mining approach. To date, reports of the use and interpretation of these data have been mostly constrained to questions such as looking for changes in patterns of daily use, or the degree and direction of volume control manipulation as the patient's experience with a hearing aid changes. In this paper and an accompanying results paper, the practical applications of some data mining techniques are described as applied to a large data set of examples of real-world device usage, as supplied by a hearing aid manufacturer.
Subjects
Audiology; Big Data; Data Mining/methods; Hearing Aids; Hearing Tests; Humans; Signal Processing, Computer-Assisted
ABSTRACT
Species identification of yeasts and other fungi is currently carried out with Sanger sequences of selected molecular markers, mainly from the ribosomal DNA operon, characterized by hundreds of tandem repeats of the 18S, ITS1, 5.8S, ITS2 and LSU loci. The ITS region has recently been proposed as a primary barcode marker, making it the most widely used region in taxonomy, phylogeny and diagnostics. The introduction of NGS is providing highly effective and relatively low-cost tools to amplify two or more markers simultaneously with great sequencing depth. However, the presence of intra-genomic variability between the repeats requires specific analytical procedures and pipelines. In this study, 286 strains belonging to 11 pathogenic yeast species were analysed with NGS of the region spanning from ITS1 to the D1/D2 domain of the LSU-encoding ribosomal DNA. Results showed that relatively high heterogeneity can hamper the use of these sequences for the identification of single strains and, even more so, of complex microbial mixtures. These observations indicate that metagenomics studies could be affected by species inflation at levels higher than currently expected.
ABSTRACT
Conservation of proximity of a pair of genes across multiple genomes generally indicates that their functions could be linked. Here, we present a systematic evaluation using 42 complete microbial genomes from 25 phylogenetic groups to test the reliability of this observation in predicting function for genes. We find a relationship between the number of phylogenetic groups in which a gene pair is proximate and the probability that the pair belongs to a common pathway. Our method produces 1586 links between ortholog families substantiated by observed proximity in genomes representing at least three phylogenetic groups. Of the pairs annotated in the KEGG database, 80% are in the same biological pathway in KEGG.
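The filtering step described above, keeping only gene pairs whose proximity is observed in at least three phylogenetic groups, can be sketched as follows. This is an illustrative reconstruction under stated assumptions: proximity calls per genome are taken as given, and the function name, input layout and toy data are all hypothetical.

```python
from collections import defaultdict


def conserved_links(proximate_pairs, genome_group, min_groups=3):
    """Keep ortholog-family pairs that are proximate in enough groups.

    proximate_pairs: dict mapping genome -> iterable of (family_a, family_b)
        pairs found close together on that genome's chromosome.
    genome_group: dict mapping genome -> phylogenetic group label.
    Returns a dict mapping each retained pair to its number of groups.
    """
    groups_per_pair = defaultdict(set)
    for genome, pairs in proximate_pairs.items():
        for fam_a, fam_b in pairs:
            # frozenset makes the pair order-independent.
            groups_per_pair[frozenset((fam_a, fam_b))].add(genome_group[genome])
    return {
        pair: len(groups)
        for pair, groups in groups_per_pair.items()
        if len(groups) >= min_groups
    }


# Toy data: four genomes drawn from three phylogenetic groups.
genome_group = {"g1": "A", "g2": "A", "g3": "B", "g4": "C"}
proximate_pairs = {
    "g1": [("famX", "famY"), ("famX", "famZ")],
    "g2": [("famY", "famX"), ("famX", "famZ")],
    "g3": [("famX", "famY")],
    "g4": [("famX", "famY")],
}

links = conserved_links(proximate_pairs, genome_group)
```

Counting phylogenetic groups rather than genomes keeps closely related genomes from inflating the evidence: in the toy data, `famX`/`famZ` is proximate in two genomes but only one group, so it is dropped.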
Subjects
Chromosome Mapping; Genetics, Microbial; Synteny/genetics; Synteny/physiology
ABSTRACT
The current deluge of genomic sequences has spawned the creation of tools capable of making sense of the data. Computational and high-throughput experimental methods for generating links between proteins have recently been emerging. These methods effectively act as hypothesis machines, allowing researchers to screen large sets of data to detect interesting patterns that can then be studied in greater detail. Although the potential use of these putative links in predicting gene function has been demonstrated, a central repository for all such links for many genomes would maximize their usefulness. Here we present Predictome, a database of predicted links between the proteins of 44 genomes based on the implementation of three computational methods--chromosomal proximity, phylogenetic profiling and domain fusion--and large-scale experimental screenings of protein-protein interaction data. The combination of data from various predictive methods in one database allows for their comparison with each other, as well as visualization of their correlation with known pathway information. As a repository for such data, Predictome is an ongoing resource for the community, providing functional relationships among proteins as new genomic data emerges. Predictome is available at http://predictome.bu.edu.
Subjects
Databases, Protein; Genome; Proteins/genetics; Proteins/physiology; Animals; Artificial Gene Fusion; Chromosome Mapping; Forecasting; Information Storage and Retrieval; Internet; Macromolecular Substances; Phylogeny; Systems Integration
ABSTRACT
Genetic suppression occurs when the phenotypic defects caused by a mutation in a particular gene are rescued by a mutation in a second gene. To explore the principles of genetic suppression, we examined both literature-curated and unbiased experimental data, involving systematic genetic mapping and whole-genome sequencing, to generate a large-scale suppression network among yeast genes. Most suppression pairs identified novel relationships among functionally related genes, providing new insights into the functional wiring diagram of the cell. In addition to suppressor mutations, we identified frequent secondary mutations, in a subset of genes, that likely cause a delay in the onset of stationary phase, which appears to promote their enrichment within a propagating population. These findings allow us to formulate and quantify general mechanisms of genetic suppression.
Subjects
Gene Regulatory Networks; Genes, Fungal; Genes, Suppressor; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Suppression, Genetic; Cell Physiological Phenomena/genetics; Chromosome Mapping
ABSTRACT
Phylogenetic profiling is now an effective computational method to detect functional associations between proteins. The method links two proteins according to the similarity of their phyletic distributions across a set of genomes. While pairwise linkage is useful, it misses correlations in higher-order groups: triplets, quadruplets, and so on. Here we assess the probability of observing co-occurrence patterns of three binary profiles by chance and show that this probability is asymptotically the same as the mutual information in three profiles. We demonstrate the utility of the probability and the mutual information metrics in detecting overrepresented triplets of orthologous proteins which could not be detected using pairwise profiles. These triplets serve as small building blocks, i.e. motifs in protein networks; they allow us to infer the function of uncharacterized members, and facilitate analysis of the local structure and global organization of the protein network. Our method is extendable to N-component clusters, and therefore serves as a general tool for high-order protein function annotation.
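A small worked example may help show why a triplet can carry information that no pair does. The abstract does not say which multivariate generalization of mutual information is used, so the sketch below assumes total correlation, H(X) + H(Y) + H(Z) − H(X,Y,Z). The toy profiles are invented: the third protein is present exactly when one, but not both, of the other two is present.

```python
from collections import Counter
from math import log2


def entropy(*profiles):
    """Joint Shannon entropy, in bits, of one or more equal-length profiles."""
    n = len(profiles[0])
    counts = Counter(zip(*profiles))
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """Pairwise mutual information I(X;Y) in bits."""
    return entropy(x) + entropy(y) - entropy(x, y)


def total_correlation(x, y, z):
    """Three-way dependence measured as total correlation (assumption)."""
    return entropy(x) + entropy(y) + entropy(z) - entropy(x, y, z)


# Presence (1) or absence (0) of three proteins across 16 genomes.
x = [0, 0, 1, 1] * 4
y = [0, 1, 0, 1] * 4
z = [a ^ b for a, b in zip(x, y)]  # present when exactly one of x, y is

pairwise = [mutual_information(x, y),
            mutual_information(x, z),
            mutual_information(y, z)]
triplet = total_correlation(x, y, z)
```

In this constructed case every pairwise mutual information is zero while the triplet measure is one bit, so a pairwise screen alone would not link these three profiles.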
Subjects
Computational Biology/statistics & numerical data; Gene Expression Profiling; Phylogeny; Proteins/chemistry; Amino Acid Motifs; Amino Acid Sequence; Entropy; Genetic Linkage; Genome; Logic; Models, Statistical; Stochastic Processes; Two-Hybrid System Techniques
ABSTRACT
Problems of inference in systems biology are ideally reduced to formulations which can efficiently represent the features of interest. In the case of predicting gene regulation and pathway networks, an important feature which describes connected genes and proteins is the relationship between active and inactive forms, i.e. between the "on" and "off" states of the components. While not optimal at the limits of resolution, these logical relationships between discrete states can often yield good approximations of the behavior in larger complex systems, where exact representation of measurement relationships may be intractable. We explore techniques for extracting binary state variables from measurement of gene expression, and go on to describe robust measures for statistical significance and information that can be applied to many such types of data. We show how statistical strength and information are equivalent criteria in limiting cases, and demonstrate the application of these measures to simple systems of gene regulation.
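The reduction from continuous measurements to "on" and "off" states can be illustrated with a simple threshold rule. The abstract does not specify the extraction technique, so this sketch assumes a per-gene median threshold and then tabulates how often each combination of states occurs for two genes. The function names and expression values are hypothetical.

```python
from collections import Counter
from statistics import median


def binarize(expression, threshold=None):
    """Map expression values to 1 ("on") or 0 ("off").

    If no threshold is given, the gene's own median is used (assumption).
    """
    if threshold is None:
        threshold = median(expression)
    return [1 if value > threshold else 0 for value in expression]


def state_table(states_a, states_b):
    """Count joint (a, b) states across samples, as a 2x2 contingency table."""
    return Counter(zip(states_a, states_b))


# Invented expression values for two genes across six samples.
gene_a = [1.0, 5.0, 2.0, 8.0, 7.5, 0.5]
gene_b = [0.2, 3.1, 0.4, 2.9, 3.3, 0.1]

a_states = binarize(gene_a)
b_states = binarize(gene_b)
table = state_table(a_states, b_states)
```

In the toy data the two genes are always on or off together, so only the (0, 0) and (1, 1) cells of the table are populated. A table like this is the natural input for the significance and information measures the abstract refers to.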
Subjects
Models, Biological; Neural Networks, Computer; Systems Biology/methods; Animals; Gene Expression; Genome, Fungal; Kinetics; Mathematics; Models, Genetic; Phylogeny; Probability; Reproducibility of Results; Saccharomyces cerevisiae/genetics
ABSTRACT
The fission yeast Schizosaccharomyces pombe has more metazoan-like features than the budding yeast Saccharomyces cerevisiae, yet it has similarly facile genetics. We present a large-scale verified binary protein-protein interactome network, "StressNet," based on high-throughput yeast two-hybrid screens of interacting proteins classified as part of stress response and signal transduction pathways in S. pombe. We performed systematic, cross-species interactome mapping using StressNet and a protein interactome network of orthologous proteins in S. cerevisiae. With cross-species comparative network studies, we detected a previously unidentified component (Snr1) of the S. pombe mitogen-activated protein kinase Sty1 pathway. Coimmunoprecipitation experiments showed that Snr1 interacted with Sty1 and that deletion of snr1 increased the sensitivity of S. pombe cells to stress. Comparison of StressNet with the interactome network of orthologous proteins in S. cerevisiae showed that most of the interactions among these stress response and signaling proteins are not conserved between species but are "rewired"; orthologous proteins have different binding partners in the two species. In particular, transient interactions connecting proteins in different functional modules were more likely to be rewired than conserved. By directly testing interactions between proteins in one yeast species and their corresponding binding partners in the other yeast species with yeast two-hybrid assays, we found that about half of the interactions that are traditionally considered "conserved" form modified interaction interfaces that may accommodate novel functions.
Subjects
Proteome; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/metabolism; Schizosaccharomyces pombe Proteins/metabolism; Schizosaccharomyces/metabolism; Immunoprecipitation; Signal Transduction; Two-Hybrid System Techniques
ABSTRACT
BACKGROUND: Approximately 35% of human genes contain introns within the 5' untranslated region (UTR). Introns in 5'UTRs differ from those in coding regions and 3'UTRs with respect to nucleotide composition, length distribution and density. Despite their presumed impact on gene regulation, the evolution and possible functions of 5'UTR introns remain largely unexplored. RESULTS: We performed a genome-scale computational analysis of 5'UTR introns in humans. We discovered that the most highly expressed genes tended to have short 5'UTR introns rather than having long 5'UTR introns or lacking 5'UTR introns entirely. Although we found no correlation in 5'UTR intron presence or length with variance in expression across tissues, which might have indicated a broad role in expression-regulation, we observed an uneven distribution of 5'UTR introns amongst genes in specific functional categories. In particular, genes with regulatory roles were surprisingly enriched in having 5'UTR introns. Finally, we analyzed the evolution of 5'UTR introns in non-receptor protein tyrosine kinases (NRTK), and identified a conserved DNA motif enriched within the 5'UTR introns of human NRTKs. CONCLUSIONS: Our results suggest that human 5'UTR introns enhance the expression of some genes in a length-dependent manner. While many 5'UTR introns are likely to be evolving neutrally, their relationship with gene expression and overrepresentation among regulatory genes, taken together, suggest that complex evolutionary forces are acting on this distinct class of introns.