ABSTRACT
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning (DMS) framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin-like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to DMS are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.
Subject(s)
DNA Mutational Analysis/methods, Missense Mutation/genetics, Calmodulin/genetics, Disease/genetics, Humans, Machine Learning, Phenotype, Phylogeny, Reproducibility of Results, SUMO-1 Protein/genetics, Ubiquitin-Conjugating Enzymes/genetics, Ubiquitin-Conjugating Enzymes/metabolism
ABSTRACT
In cellular systems, biophysical interactions between macromolecules underlie a complex web of functional interactions. How biophysical and functional networks are coordinated, whether all biophysical interactions correspond to functional interactions, and how such biophysical-versus-functional network coordination is shaped by evolutionary forces are all largely unanswered questions. Here, we investigate these questions using an "inter-interactome" approach. We systematically probed the yeast and human proteomes for interactions between proteins from these two species and functionally characterized the resulting inter-interactome network. After a billion years of evolutionary divergence, the yeast and human proteomes are still capable of forming a biophysical network with properties that resemble those of intra-species networks. Although substantially reduced relative to intra-species networks, the levels of functional overlap in the yeast-human inter-interactome network uncover significant remnants of co-functionality widely preserved in the two proteomes beyond human-yeast homologs. Our data support evolutionary selection against biophysical interactions between proteins with little or no co-functionality. Such non-functional interactions, however, represent a reservoir from which nascent functional interactions may arise.
Subject(s)
Fungal Proteins/metabolism, Protein Interaction Mapping/methods, Proteome/metabolism, Computational Biology/methods, Protein Databases, Molecular Evolution, Humans
ABSTRACT
Genome-wide association (GWA) studies have linked thousands of loci to human diseases, but the causal genes and variants at these loci generally remain unknown. Although investigators typically focus on genes closest to the associated polymorphisms, the causal gene is often more distal. Reliance on published work to prioritize candidates is biased toward well-characterized genes. We describe a 'prix fixe' strategy and software that uses genome-scale shared-function networks to identify sets of mutually functionally related genes spanning multiple GWA loci. Using associations from ~100 GWA studies covering ten cancer types, our approach outperformed the common alternative strategy in ranking known cancer genes. As more GWA loci are discovered, the strategy will have increased power to elucidate the causes of human disease.
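As a rough illustration of selecting mutually related genes across loci, the sketch below picks one candidate gene per locus so that the number of network links among the chosen genes is as large as possible. The one-gene-per-locus rule, the exhaustive search, the link-count score, and the toy loci and network are all assumptions made for illustration; the abstract does not specify the scoring or search procedure.

```python
from itertools import combinations, product


def count_links(genes, network):
    """Count shared-function network edges among the chosen genes."""
    return sum(
        1 for a, b in combinations(genes, 2) if frozenset((a, b)) in network
    )


def best_selection(loci, network):
    """Try every one-gene-per-locus choice and return the best-connected one."""
    return max(product(*loci), key=lambda genes: count_links(genes, network))


# Hypothetical toy data: three loci with two candidate genes each.
loci = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"]]
# Hypothetical shared-function network, stored as a set of unordered gene pairs.
network = {
    frozenset(pair)
    for pair in [("A2", "B1"), ("B1", "C2"), ("A2", "C2"), ("A1", "B2")]
}

selection = best_selection(loci, network)
print(selection, count_links(selection, network))
```

Exhaustive search is only feasible for a handful of loci, since the number of combinations grows multiplicatively with each locus; a heuristic search would be needed at realistic scale.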
Subject(s)
Computational Biology/methods, Neoplasm Genes, Genome-Wide Association Study/methods, Neoplasms/genetics, Single Nucleotide Polymorphism, Quantitative Trait Loci, Animals, Gene Ontology, Genetic Predisposition to Disease, Humans, Software
ABSTRACT
Just as reference genome sequences revolutionized human genetics, reference maps of interactome networks will be critical to fully understand genotype-phenotype relationships. Here, we describe a systematic map of ~14,000 high-quality human binary protein-protein interactions. At equal quality, this map is ~30% larger than what is available from small-scale studies published in the literature in the last few decades. While currently available information is highly biased and only covers a relatively small portion of the proteome, our systematic map appears strikingly more homogeneous, revealing a "broader" human interactome network than currently appreciated. The map also uncovers significant interconnectivity between known and candidate cancer gene products, providing unbiased evidence for an expanded functional cancer landscape, while demonstrating how high-quality interactome models will help "connect the dots" of the genomic revolution.
Subject(s)
Protein Interaction Maps, Proteome/metabolism, Animals, Protein Databases, Genome-Wide Association Study, Humans, Mice, Neoplasms/metabolism
ABSTRACT
Increased risk for autism spectrum disorders (ASD) is attributed to hundreds of genetic loci. The convergence of ASD variants has been investigated using various approaches, including protein interactions extracted from the published literature. However, these datasets are frequently incomplete, carry biases, and are limited to interactions of a single splicing isoform, which may not be expressed in the disease-relevant tissue. Here we introduce a new interactome mapping approach by experimentally identifying interactions between brain-expressed alternatively spliced variants of ASD risk factors. The Autism Spliceform Interaction Network reveals that almost half of the detected interactions and about 30% of the newly identified interacting partners represent contributions from splicing variants, emphasizing the importance of isoform networks. Isoform interactions greatly contribute to establishing direct physical connections between proteins from the de novo autism CNVs. Our findings demonstrate the critical role of spliceform networks for translating genetic knowledge into a better understanding of human diseases.
Subject(s)
Autistic Disorder/metabolism, Alternative Splicing/genetics, Alternative Splicing/physiology, Autistic Disorder/genetics, Genetic Predisposition to Disease/genetics, Humans, Molecular Sequence Data, Protein Interaction Maps/genetics, Protein Interaction Maps/physiology, Protein Isoforms/genetics, Protein Isoforms/metabolism, Risk Factors
ABSTRACT
One drug may suppress the effects of another. Although knowledge of drug suppression is vital to avoid efficacy-reducing drug interactions or discover countermeasures for chemical toxins, drug-drug suppression relationships have not been systematically mapped. Here, we analyze the growth response of Saccharomyces cerevisiae to anti-fungal compound ("drug") pairs. Among 440 ordered drug pairs, we identified 94 suppressive drug interactions. Using only pairs not selected on the basis of their suppression behavior, we provide an estimate of the prevalence of suppressive interactions between anti-fungal compounds as 17%. Analysis of the drug suppression network suggested that Bromopyruvate is a frequently suppressive drug and Staurosporine is a frequently suppressed drug. We investigated potential explanations for suppressive drug interactions, including chemogenomic analysis, coaggregation, and pH effects, allowing us to explain the interaction tendencies of Bromopyruvate.
Subject(s)
Antifungal Agents/pharmacology, Pyruvates/pharmacology, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/growth & development, Biological Assay, Drug Interactions, Hydrogen-Ion Concentration, Microbial Sensitivity Tests, Saccharomyces cerevisiae/cytology, Staurosporine/pharmacology, Structure-Activity Relationship
ABSTRACT
Comprehensive functional annotation of vertebrate genomes is fundamental to biological discovery. Reverse genetic screening has been highly useful for determination of gene function, but is untenable as a systematic approach in vertebrate model organisms given the number of surveyable genes and observable phenotypes. Unbiased prediction of gene-phenotype relationships offers a strategy to direct finite experimental resources towards likely phenotypes, thus maximizing de novo discovery of gene functions. Here we prioritized genes for phenotypic assay in zebrafish through machine learning, predicting the effect of loss of function of each of 15,106 zebrafish genes on 338 distinct embryonic anatomical processes. Focusing on cardiovascular phenotypes, the learning procedure predicted known knockdown and mutant phenotypes with high precision. In proof-of-concept studies we validated 16 high-confidence cardiac predictions using targeted morpholino knockdown and initial blinded phenotyping in embryonic zebrafish, confirming a significant enrichment for cardiac phenotypes as compared with morpholino controls. Subsequent detailed analyses of cardiac function confirmed these results, identifying novel physiological defects for 11 tested genes. Among these we identified tmem88a, a recently described attenuator of Wnt signaling, as a discrete regulator of the patterning of intercellular coupling in the zebrafish cardiac epithelium. Thus, we show that systematic prioritization in zebrafish can accelerate the pace of developmental gene function discovery.
Subject(s)
Developmental Gene Expression Regulation, Heart/embryology, Membrane Proteins/metabolism, Myocardium/cytology, Zebrafish Proteins/metabolism, Zebrafish/embryology, Zebrafish/genetics, Animals, Nonmammalian Embryo/metabolism, Gene Knockdown Techniques, Membrane Proteins/genetics, Morpholinos/genetics, Phenotype, Wnt Signaling Pathway/genetics, Zebrafish Proteins/genetics
ABSTRACT
BACKGROUND: Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. RESULTS: To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and thus is not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, which is a defining feature of a prevalent cardiovascular disorder, dilated cardiomyopathy or DCM. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM. CONCLUSION: Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.
Subject(s)
Artificial Intelligence, Filamins/genetics, Genomics/methods, Left Ventricular Hypertrophy/genetics, Left Ventricular Hypertrophy/pathology, Algorithms, Animals, Cardiovascular Diseases/genetics, Cardiovascular Diseases/pathology, Animal Disease Models, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Mice, Molecular Sequence Data, Mutation, Zebrafish
ABSTRACT
Genotypic differences greatly influence susceptibility and resistance to disease. Understanding genotype-phenotype relationships requires that phenotypes be viewed as manifestations of network properties, rather than simply as the result of individual genomic variations. Genome sequencing efforts have identified numerous germline mutations, and large numbers of somatic genomic alterations, associated with a predisposition to cancer. However, it remains difficult to distinguish background, or 'passenger', cancer mutations from causal, or 'driver', mutations in these data sets. Human viruses intrinsically depend on their host cell during the course of infection and can elicit pathological phenotypes similar to those arising from mutations. Here we test the hypothesis that genomic variations and tumour viruses may cause cancer through related mechanisms, by systematically examining host interactome and transcriptome network perturbations caused by DNA tumour virus proteins. The resulting integrated viral perturbation data reflects rewiring of the host cell networks, and highlights pathways, such as Notch signalling and apoptosis, that go awry in cancer. We show that systematic analyses of host targets of viral proteins can identify cancer genes with a success rate on a par with their identification through functional genomics and large-scale cataloguing of tumour mutations. Together, these complementary approaches increase the specificity of cancer gene identification. Combining systems-level studies of pathogen-encoded gene products with genomic approaches will facilitate the prioritization of cancer-causing driver genes to advance the understanding of the genetic basis of human cancer.
Subject(s)
Neoplasm Genes/genetics, Human Genome/genetics, Host-Pathogen Interactions, Neoplasms/genetics, Neoplasms/metabolism, Oncogenic Viruses/pathogenicity, Viral Proteins/metabolism, Adenoviridae/genetics, Adenoviridae/metabolism, Adenoviridae/pathogenicity, Gene Expression Profiling, Neoplastic Gene Expression Regulation, Human Herpesvirus 4/genetics, Human Herpesvirus 4/metabolism, Human Herpesvirus 4/pathogenicity, Host-Pathogen Interactions/genetics, Humans, Neoplasms/pathology, Oncogenic Viruses/genetics, Oncogenic Viruses/metabolism, Open Reading Frames/genetics, Papillomaviridae/genetics, Papillomaviridae/metabolism, Papillomaviridae/pathogenicity, Polyomavirus/genetics, Polyomavirus/metabolism, Polyomavirus/pathogenicity, Notch Receptors/metabolism, Signal Transduction, Two-Hybrid System Techniques, Viral Proteins/genetics
ABSTRACT
The body of human genomic and proteomic evidence continues to grow at ever-increasing rates, while annotation efforts struggle to keep pace. A surprisingly small fraction of human genes have clear, documented associations with specific functions, and new functions continue to be found for characterized genes. Here we assembled an integrated collection of diverse genomic and proteomic data for 21,341 human genes and made quantitative associations of each to 4333 Gene Ontology terms. We combined guilt-by-profiling and guilt-by-association approaches to exploit features unique to the data types. Performance was evaluated by cross-validation, prospective validation, and manual evaluation against the biological literature. Functional-linkage networks (FLNs) were also constructed, and their utility was demonstrated by identifying candidate genes related to a glioma FLN using a seed network from genome-wide association studies. Our annotations are presented, alongside existing validated annotations, in a publicly accessible and searchable web interface.
ABSTRACT
Drug synergy allows a therapeutic effect to be achieved with lower doses of component drugs. Drug synergy can result when drugs target the products of genes that act in parallel pathways ('specific synergy'). Such cases of drug synergy should tend to correspond to synergistic genetic interaction between the corresponding target genes. Alternatively, 'promiscuous synergy' can arise when one drug non-specifically increases the effects of many other drugs, for example, by increased bioavailability. To assess the relative abundance of these drug synergy types, we examined 200 pairs of antifungal drugs in S. cerevisiae. We found 38 antifungal synergies, 37 of which were novel. While 14 cases of drug synergy corresponded to genetic interaction, 92% of the synergies we discovered involved only six frequently synergistic drugs. Although promiscuity of four drugs can be explained under the bioavailability model, the promiscuity of Tacrolimus and Pentamidine was completely unexpected. While many drug synergies correspond to genetic interactions, the majority of drug synergies appear to result from non-specific promiscuous synergy.
Subject(s)
Antifungal Agents/pharmacology, Drug Synergism, Saccharomyces cerevisiae/drug effects, Antifungal Agents/pharmacokinetics, Biological Availability, Drug Interactions, Pentamidine/pharmacokinetics, Pentamidine/pharmacology, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Tacrolimus/pharmacokinetics, Tacrolimus/pharmacology
ABSTRACT
Plants generate effective responses to infection by recognizing both conserved and variable pathogen-encoded molecules. Pathogens deploy virulence effector proteins into host cells, where they interact physically with host proteins to modulate defense. We generated an interaction network of plant-pathogen effectors from two pathogens spanning the eukaryote-eubacteria divergence, three classes of Arabidopsis immune system proteins, and ~8000 other Arabidopsis proteins. We noted convergence of effectors onto highly interconnected host proteins and indirect, rather than direct, connections between effectors and plant immune receptors. We demonstrated plant immune system functions for 15 of 17 tested host proteins that interact with effectors from both pathogens. Thus, pathogens from different kingdoms deploy independently evolved virulence proteins that interact with a limited set of highly connected cellular hubs to facilitate their diverse life-cycle strategies.
Subject(s)
Arabidopsis/immunology, Arabidopsis/metabolism, Host-Pathogen Interactions, Plant Diseases/immunology, Plant Immunity, Immunologic Receptors/metabolism, Virulence Factors/metabolism, Arabidopsis/genetics, Arabidopsis/microbiology, Bacterial Proteins/metabolism, Molecular Evolution, Plant Genes, Innate Immunity, Oomycetes/pathogenicity, Protein Interaction Mapping, Pseudomonas syringae/pathogenicity
ABSTRACT
In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5' end of the transcript during splicing. The signal sequence coding region (SSCR) can support an alternative mRNA export (ALREX) pathway that does not require splicing. However, most SSCR-containing genes also have introns, so the interplay between these export mechanisms remains unclear. Here we support a model in which the furthest upstream element in a given transcript, be it an intron or an ALREX-promoting SSCR, dictates the mRNA export pathway used. We also experimentally demonstrate that nuclear-encoded mitochondrial genes can use the ALREX pathway. Thus, ALREX can also be supported by nucleotide signals within mitochondrial-targeting sequence coding regions (MSCRs). Finally, we identified and experimentally verified novel motifs associated with the ALREX pathway that are shared by both SSCRs and MSCRs. Our results show strong correlation between 5' untranslated region (5'UTR) intron presence/absence and sequence features at the beginning of the coding region. They also suggest that genes encoding secretory and mitochondrial proteins share a common regulatory mechanism at the level of mRNA export.
Subject(s)
5' Untranslated Regions/genetics, Alternative Splicing, Cell Nucleus/metabolism, RNA Transport, Messenger RNA/metabolism, Cell Nucleus Active Transport, Adenine/metabolism, Cytoplasm, Endoplasmic Reticulum/genetics, Gene Expression Regulation, Mitochondrial Genes, Humans, Introns, Genetic Models, Open Reading Frames, Protein Sorting Signals, RNA Splicing
ABSTRACT
SUMMARY: Computational gene function prediction can serve to focus experimental resources on high-priority experimental tasks. FuncBase is a web resource for viewing quantitative machine learning-based gene function annotations. Quantitative annotations of genes, including fungal and mammalian genes, with Gene Ontology terms are accompanied by a community feedback system. Evidence underlying function annotations is shown. For example, a custom Cytoscape viewer shows functional linkage graphs relevant to the gene or function of interest. FuncBase provides links to external resources, and may be accessed directly or via links from species-specific databases. AVAILABILITY: FuncBase as well as all underlying data and annotations are freely available via http://func.med.harvard.edu/
Subject(s)
Computational Biology/methods, Genes/physiology, Software, Factual Databases, Internet, Controlled Vocabulary
ABSTRACT
UNLABELLED: FuncAssociate is a web application that discovers properties enriched in lists of genes or proteins that emerge from large-scale experimentation. Here we describe an updated application with a new interface and several new features. For example, enrichment analysis can now be performed within multiple gene- and protein-naming systems. This feature avoids potentially serious translation artifacts to which other enrichment analysis strategies are subject. AVAILABILITY: The FuncAssociate web application is freely available to all users at http://llama.med.harvard.edu/funcassociate.
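To make the notion of a property being "enriched" in a gene list concrete, the sketch below computes a one-sided hypergeometric tail probability, a common way to score over-representation. It is an illustrative assumption rather than a description of FuncAssociate's exact statistic or its multiple-testing correction, neither of which is given in the abstract; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population that carry the property
    sample:     size of the query gene list
    hits:       genes in the query list that carry the property
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


# Hypothetical counts: 20 genes, 5 annotated, a query list of 4 with 2 hits.
p = enrichment_p_value(population=20, annotated=5, sample=4, hits=2)
print(round(p, 4))
```

In practice every gene in the query list and in the background must be expressed in the same naming system before counting, which is the translation step the abstract identifies as a source of artifacts.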
Subject(s)
Computational Biology/methods, Software, Factual Databases, Proteins/chemistry, User-Computer Interface
ABSTRACT
To provide accurate biological hypotheses and elucidate global properties of cellular networks, systematic identification of protein-protein interactions must meet high quality standards. We present an expanded C. elegans protein-protein interaction network, or 'interactome' map, derived from testing a matrix of approximately 10,000 × approximately 10,000 proteins using a highly specific, high-throughput yeast two-hybrid system. Through a new empirical quality control framework, we show that the resulting data set (Worm Interactome 2007, or WI-2007) was similar in quality to low-throughput data curated from the literature. We filtered previous interaction data sets and integrated them with WI-2007 to generate a high-confidence consolidated map (Worm Interactome version 8, or WI8). This work allowed us to estimate the size of the worm interactome at approximately 116,000 interactions. Comparison with other types of functional genomic data shows the complementarity of distinct experimental approaches in predicting different functional relationships between genes or proteins.
Subject(s)
Caenorhabditis elegans Proteins/analysis, Caenorhabditis elegans Proteins/metabolism, Caenorhabditis elegans/metabolism, Protein Interaction Mapping/methods, Animals, Caenorhabditis elegans/genetics, Caenorhabditis elegans Proteins/genetics, Cell Line, Humans, Protein Binding, Software
ABSTRACT
Information on protein-protein interactions is of central importance for many areas of biomedical research. At present no method exists to systematically and experimentally assess the quality of individual interactions reported in interaction mapping experiments. To provide a standardized confidence-scoring method that can be applied to tens of thousands of protein interactions, we have developed an interaction tool kit consisting of four complementary, high-throughput protein interaction assays. We benchmarked these assays against positive and random reference sets consisting of well documented pairs of interacting human proteins and randomly chosen protein pairs, respectively. A logistic regression model was trained using the data from these reference sets to combine the assay outputs and calculate the probability that any newly identified interaction pair is a true biophysical interaction once it has been tested in the tool kit. This general approach will allow a systematic and empirical assignment of confidence scores to all individual protein-protein interactions in interactome networks.
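The confidence-scoring idea described above can be sketched as a small logistic regression that turns four assay outcomes into a probability. Everything specific here is an assumption for illustration: the binary encoding of assay outputs, the toy reference sets, the plain gradient-descent fit, and the function names. The abstract does not give the actual features, training procedure, or data.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, steps=2000, rate=0.5):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += err
            for j, v in enumerate(x):
                grad_w[j] += err * v
        bias -= rate * grad_b / len(rows)
        weights = [w - rate * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def interaction_probability(x, weights, bias):
    """Probability that a pair with assay outcomes x is a true interaction."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical reference data: one row per protein pair, one 0/1 outcome per assay.
positive_reference = [(1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 1, 1), (0, 0, 1, 0)]
random_reference = [(0, 0, 0, 0), (0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0), (0, 1, 0, 0)]
rows = positive_reference + random_reference
labels = [1] * len(positive_reference) + [0] * len(random_reference)

weights, bias = train_logistic(rows, labels)
p_all = interaction_probability((1, 1, 1, 1), weights, bias)
p_none = interaction_probability((0, 0, 0, 0), weights, bias)
print(p_all, p_none)
```

A pair that scores positive in all four assays receives a higher probability than one that scores in none. With real reference sets, the fitted probabilities would also depend on the assumed ratio of true to random pairs among the candidates being scored.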
Subject(s)
Protein Interaction Mapping/methods, Proteins/analysis, Proteins/metabolism, Animals, Humans, Protein Binding, Sensitivity and Specificity
ABSTRACT
Emerging metabolomic tools have created the opportunity to establish metabolic signatures of myocardial injury. We applied a mass spectrometry-based metabolite profiling platform to 36 patients undergoing alcohol septal ablation treatment for hypertrophic obstructive cardiomyopathy, a human model of planned myocardial infarction (PMI). Serial blood samples were obtained before and at various intervals after PMI, with patients undergoing elective diagnostic coronary angiography and patients with spontaneous myocardial infarction (SMI) serving as negative and positive controls, respectively. We identified changes in circulating levels of metabolites participating in pyrimidine metabolism, the tricarboxylic acid cycle and its upstream contributors, and the pentose phosphate pathway. Alterations in levels of multiple metabolites were detected as early as 10 minutes after PMI in an initial derivation group and were validated in a second, independent group of PMI patients. A PMI-derived metabolic signature consisting of aconitic acid, hypoxanthine, trimethylamine N-oxide, and threonine differentiated patients with SMI from those undergoing diagnostic coronary angiography with high accuracy, and coronary sinus sampling distinguished cardiac-derived from peripheral metabolic changes. Our results identify a role for metabolic profiling in the early detection of myocardial injury and suggest that similar approaches may be used for detection or prediction of other disease states.
Subject(s)
Biomarkers/blood, Heart Injuries/blood, Heart Injuries/diagnosis, Myocardial Infarction/blood, Myocardial Infarction/metabolism, Aged, Animals, Cultured Cells, Coronary Sinus/metabolism, Female, Heart Injuries/metabolism, Humans, Isotopes, Kinetics, Male, Middle Aged, Myocardial Infarction/diagnosis, Cardiac Myocytes/metabolism, Rats, Reference Standards, Reproducibility of Results, Time Factors
ABSTRACT
BACKGROUND: Several years after sequencing the human genome and the mouse genome, much remains to be discovered about the functions of most human and mouse genes. Computational prediction of gene function promises to help focus limited experimental resources on the most likely hypotheses. Several algorithms using diverse genomic data have been applied to this task in model organisms; however, the performance of such approaches in mammals has not yet been evaluated. RESULTS: In this study, a standardized collection of mouse functional genomic data was assembled; nine bioinformatics teams used this data set to independently train classifiers and generate predictions of function, as defined by Gene Ontology (GO) terms, for 21,603 mouse genes; and the best performing submissions were combined in a single set of predictions. We identified strengths and weaknesses of current functional genomic data sets and compared the performance of function prediction algorithms. This analysis inferred functions for 76% of mouse genes, including 5,000 currently uncharacterized genes. At a recall rate of 20%, a unified set of predictions averaged 41% precision, with 26% of GO terms achieving a precision better than 90%. CONCLUSION: We performed a systematic evaluation of diverse, independently developed computational approaches for predicting gene function from heterogeneous data sources in mammals. The results show that currently available data for mammals allows predictions with both breadth and accuracy. Importantly, many highly novel predictions emerge for the 38% of mouse genes that remain uncharacterized.
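The "precision at a recall rate of 20%" figure can be read as follows: rank genes by predicted score for a GO term, walk down the ranking until 20% of the truly annotated genes have been recovered, and report the fraction of calls so far that are correct. The sketch below shows that calculation on a hypothetical ranking; the function names and the toy labels are assumptions, and the study's exact evaluation protocol is not given in the abstract.

```python
def precision_recall_at_k(ranked_labels, k):
    """Precision and recall when the top-k ranked genes are called positive."""
    true_positives = sum(ranked_labels[:k])
    precision = true_positives / k
    recall = true_positives / sum(ranked_labels)
    return precision, recall


def precision_at_recall(ranked_labels, target_recall):
    """Precision at the smallest cutoff whose recall reaches target_recall."""
    for k in range(1, len(ranked_labels) + 1):
        precision, recall = precision_recall_at_k(ranked_labels, k)
        if recall >= target_recall:
            return precision
    return None  # target recall is never reached


# Hypothetical ranking for one GO term, ordered by decreasing predicted score:
# 1 = gene truly carries the annotation, 0 = it does not.
ranked = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]

print(precision_at_recall(ranked, 0.2))
print(precision_at_recall(ranked, 0.5))
```

Averaging this per-term value over many GO terms gives a summary of the kind quoted above, although how terms are weighted in such an average is a further choice.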