ABSTRACT
Widespread sequencing has yielded thousands of missense variants predicted or confirmed as disease causing. This creates a new bottleneck: determining the functional impact of each variant, typically a painstaking, customized process undertaken for one or a few genes and variants at a time. Here, we established a high-throughput imaging platform to assay the impact of coding variation on protein localization, evaluating 3,448 missense variants across more than 1,000 genes and phenotypes. We discovered that mislocalization is a common consequence of coding variation, affecting about one-sixth of all pathogenic missense variants, all cellular compartments, and recessive and dominant disorders alike. Mislocalization is primarily driven by effects on protein stability and membrane insertion rather than by disruptions of trafficking signals or specific interactions. Furthermore, mislocalization patterns help explain pleiotropy and disease severity and provide insights into variants of uncertain significance. Our publicly available resource extends our understanding of coding variation in human diseases.
ABSTRACT
While alternative splicing is known to diversify the functional characteristics of some genes, the extent to which protein isoforms globally contribute to functional complexity on a proteomic scale remains unknown. To address this systematically, we cloned full-length open reading frames of alternatively spliced transcripts for a large number of human genes and used protein-protein interaction profiling to functionally compare hundreds of protein isoform pairs. The majority of isoform pairs share less than 50% of their interactions. In the global context of interactome network maps, alternative isoforms tend to behave like distinct proteins rather than minor variants of each other. Interaction partners specific to alternative isoforms tend to be expressed in a highly tissue-specific manner and belong to distinct functional modules. Our strategy, applicable to other functional characteristics, reveals a widespread expansion of protein interaction capabilities through alternative splicing and suggests that many alternative "isoforms" are functionally divergent (i.e., "functional alloforms").
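To make the "share less than 50% of their interactions" comparison concrete, the overlap between two isoforms' interaction profiles can be expressed as a single fraction. The sketch below is hypothetical. It uses the Jaccard index (shared partners divided by all partners of either isoform) as an assumed overlap metric that may differ from the study's exact definition, and the partner names are invented.

```python
def shared_interaction_fraction(partners_a: set[str], partners_b: set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| (Jaccard index) for two isoforms' partner sets."""
    union = partners_a | partners_b
    if not union:
        # Neither isoform has a detected partner, so the overlap is undefined.
        raise ValueError("both isoforms have no detected interaction partners")
    return len(partners_a & partners_b) / len(union)


# Invented partner sets for a reference isoform and an alternative isoform.
reference = {"P1", "P2", "P3", "P4"}
alternative = {"P3", "P4", "P5"}

fraction = shared_interaction_fraction(reference, alternative)
print(f"shared fraction = {fraction:.2f}")  # 2 shared partners out of 5 in total
# Pairs below the 50% threshold would count as sharing a minority of interactions.
print("divergent pair" if fraction < 0.5 else "similar pair")
```

With the Jaccard index, partners unique to either isoform lower the score symmetrically. An asymmetric alternative would divide by the reference isoform's partner count only.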
Subjects
Alternative Splicing , Protein Isoforms/metabolism , Proteome/metabolism , Animals , Cloning, Molecular , Evolution, Molecular , Humans , Models, Molecular , Open Reading Frames , Protein Interaction Domains and Motifs , Protein Interaction Maps , Proteome/analysis
ABSTRACT
Just as reference genome sequences revolutionized human genetics, reference maps of interactome networks will be critical to fully understand genotype-phenotype relationships. Here, we describe a systematic map of ~14,000 high-quality human binary protein-protein interactions. At equal quality, this map is ~30% larger than what is available from small-scale studies published in the literature in the last few decades. While currently available information is highly biased and only covers a relatively small portion of the proteome, our systematic map appears strikingly more homogeneous, revealing a "broader" human interactome network than currently appreciated. The map also uncovers significant interconnectivity between known and candidate cancer gene products, providing unbiased evidence for an expanded functional cancer landscape, while demonstrating how high-quality interactome models will help "connect the dots" of the genomic revolution.
Subjects
Protein Interaction Maps , Proteome/metabolism , Animals , Databases, Protein , Genome-Wide Association Study , Humans , Mice , Neoplasms/metabolism
ABSTRACT
In cellular systems, biophysical interactions between macromolecules underlie a complex web of functional interactions. How biophysical and functional networks are coordinated, whether all biophysical interactions correspond to functional interactions, and how such biophysical-versus-functional network coordination is shaped by evolutionary forces are all largely unanswered questions. Here, we investigate these questions using an "inter-interactome" approach. We systematically probed the yeast and human proteomes for interactions between proteins from these two species and functionally characterized the resulting inter-interactome network. After a billion years of evolutionary divergence, the yeast and human proteomes are still capable of forming a biophysical network with properties that resemble those of intra-species networks. Although substantially reduced relative to intra-species networks, the levels of functional overlap in the yeast-human inter-interactome network uncover significant remnants of co-functionality widely preserved in the two proteomes beyond human-yeast homologs. Our data support evolutionary selection against biophysical interactions between proteins with little or no co-functionality. Such non-functional interactions, however, represent a reservoir from which nascent functional interactions may arise.
Subjects
Fungal Proteins/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Computational Biology/methods , Databases, Protein , Evolution, Molecular , Humans
ABSTRACT
Oncogenic mutations in the serine/threonine kinase B-RAF (also known as BRAF) are found in 50-70% of malignant melanomas. Pre-clinical studies have demonstrated that the B-RAF(V600E) mutation predicts a dependency on the mitogen-activated protein kinase (MAPK) signalling cascade in melanoma, an observation that has been validated by the success of RAF and MEK inhibitors in clinical trials. However, clinical responses to targeted anticancer therapeutics are frequently confounded by de novo or acquired resistance. Identification of resistance mechanisms in a manner that elucidates alternative 'druggable' targets may inform effective long-term treatment strategies. Here we expressed ~600 kinase and kinase-related open reading frames (ORFs) in parallel to interrogate resistance to a selective RAF kinase inhibitor. We identified MAP3K8 (the gene encoding COT/Tpl2) as a MAPK pathway agonist that drives resistance to RAF inhibition in B-RAF(V600E) cell lines. COT activates ERK primarily through MEK-dependent mechanisms that do not require RAF signalling. Moreover, COT expression is associated with de novo resistance in B-RAF(V600E) cultured cell lines and acquired resistance in melanoma cells and tissue obtained from relapsing patients following treatment with MEK or RAF inhibitors. We further identify combinatorial MAPK pathway inhibition or targeting of COT kinase activity as possible therapeutic strategies for reducing MAPK pathway activation in this setting. Together, these results provide new insights into resistance mechanisms involving the MAPK pathway and articulate an integrative approach through which high-throughput functional screens may inform the development of novel therapeutic strategies.
Subjects
Drug Resistance, Neoplasm , MAP Kinase Kinase Kinases/metabolism , MAP Kinase Signaling System , Mitogen-Activated Protein Kinases/metabolism , Proto-Oncogene Proteins B-raf/antagonists & inhibitors , Proto-Oncogene Proteins/metabolism , Allosteric Regulation , Cell Line, Tumor , Clinical Trials as Topic , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Enzyme Activation/drug effects , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Library , Humans , Indoles/pharmacology , Indoles/therapeutic use , MAP Kinase Kinase Kinases/genetics , Melanoma/drug therapy , Melanoma/enzymology , Melanoma/genetics , Melanoma/metabolism , Mitogen-Activated Protein Kinase Kinases/antagonists & inhibitors , Mitogen-Activated Protein Kinase Kinases/metabolism , Open Reading Frames/genetics , Protein Kinase Inhibitors/pharmacology , Protein Kinase Inhibitors/therapeutic use , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins B-raf/chemistry , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/metabolism , Proto-Oncogene Proteins c-raf/genetics , Proto-Oncogene Proteins c-raf/metabolism , Sulfonamides/pharmacology , Sulfonamides/therapeutic use , Vemurafenib
ABSTRACT
Functional characterization of the human genome requires tools for systematically modulating gene expression in both loss-of-function and gain-of-function experiments. We describe the production of a sequence-confirmed, clonal collection of over 16,100 human open reading frames (ORFs) encoded in a versatile Gateway vector system. Using this ORFeome resource, we created a genome-scale expression collection in a lentiviral vector, thereby enabling both targeted experiments and high-throughput screens in diverse cell types.
Subjects
Cloning, Molecular/methods , Genetic Vectors/genetics , Genomic Library , Lentivirus/genetics , Humans , Open Reading Frames
ABSTRACT
Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.
Subjects
Caenorhabditis elegans Proteins/metabolism , Caenorhabditis elegans/genetics , Computational Biology/methods , DNA, Complementary/genetics , Gene Expression Profiling , Open Reading Frames/genetics , Animals , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Cloning, Molecular , DNA Primers , DNA, Helminth/analysis , DNA, Helminth/genetics , Exons , Genes, Helminth , Sequence Analysis, DNA , Transcription, Genetic
ABSTRACT
With sequencing of thousands of organisms completed or in progress, there is a growing need to integrate gene prediction with metabolic network analysis. Using Chlamydomonas reinhardtii as a model, we describe a systems-level methodology bridging metabolic network reconstruction with experimental verification of enzyme-encoding open reading frames. Our quantitative and predictive metabolic model and its associated cloned open reading frames provide useful resources for metabolic engineering.
Subjects
Chlamydomonas reinhardtii/metabolism , Computational Biology/methods , Genome, Protozoan , Models, Genetic , Protozoan Proteins/metabolism , Transcription, Genetic , Animals , Chlamydomonas reinhardtii/enzymology , Chlamydomonas reinhardtii/genetics , Computer Simulation , Enzymes/genetics , Enzymes/metabolism , Genetic Engineering , Protozoan Proteins/genetics
ABSTRACT
Information on protein-protein interactions is of central importance for many areas of biomedical research. At present no method exists to systematically and experimentally assess the quality of individual interactions reported in interaction mapping experiments. To provide a standardized confidence-scoring method that can be applied to tens of thousands of protein interactions, we have developed an interaction tool kit consisting of four complementary, high-throughput protein interaction assays. We benchmarked these assays against positive and random reference sets consisting of well documented pairs of interacting human proteins and randomly chosen protein pairs, respectively. A logistic regression model was trained using the data from these reference sets to combine the assay outputs and calculate the probability that any newly identified interaction pair is a true biophysical interaction once it has been tested in the tool kit. This general approach will allow a systematic and empirical assignment of confidence scores to all individual protein-protein interactions in interactome networks.
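The scoring step described above, in which a logistic regression trained on positive and random reference sets turns tool-kit results into a probability, can be sketched as follows. This is a minimal, hypothetical illustration. The four assays are reduced to binary outputs, both reference sets are invented, and the model is fit by plain batch gradient descent. It is not a reproduction of the authors' fitting procedure or features.

```python
import math

# Invented reference sets: each row holds the outputs of four hypothetical assays
# (1 = pair scored positive, 0 = scored negative) for one protein pair.
positive_reference = [
    [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0],
    [0, 1, 1, 1], [1, 1, 1, 1], [1, 0, 0, 1],
]
random_reference = [
    [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1],
    [0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0],
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit one weight per assay plus a bias by batch gradient descent on log-loss."""
    n_samples, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            # Prediction error for this pair: predicted probability minus label.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def interaction_probability(weights, bias, assay_outputs):
    """Probability that a pair with these assay outputs is a true interaction."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, assay_outputs)))


features = positive_reference + random_reference
labels = [1] * len(positive_reference) + [0] * len(random_reference)
weights, bias = train_logistic_regression(features, labels)

new_pair = [1, 1, 0, 1]  # hypothetical tool-kit results for a newly identified pair
print(f"P(true interaction) = {interaction_probability(weights, bias, new_pair):.2f}")
```

Because each assay gets its own weight, an assay that rarely fires on random pairs ends up contributing more evidence than one that often does. That per-assay weighting is what makes combining complementary assays useful.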
Subjects
Protein Interaction Mapping/methods , Proteins/analysis , Proteins/metabolism , Animals , Humans , Protein Binding , Sensitivity and Specificity
ABSTRACT
Several attempts have been made to systematically map protein-protein interaction, or 'interactome', networks. However, it remains difficult to assess the quality and coverage of existing data sets. Here we describe a framework that uses an empirically based approach to rigorously dissect quality parameters of currently available human interactome maps. Our results indicate that high-throughput yeast two-hybrid (HT-Y2H) interactions for human proteins are more precise than literature-curated interactions supported by a single publication, suggesting that HT-Y2H is suitable to map a significant portion of the human interactome. We estimate that the human interactome contains approximately 130,000 binary interactions, most of which remain to be mapped. Similar to estimates of DNA sequence data quality and genome size early in the Human Genome Project, estimates of protein interaction data quality and interactome size are crucial to establish the magnitude of the task of comprehensive human interactome mapping and to elucidate a path toward this goal.
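One common way to frame an interactome size estimate like the one above is to correct the number of observed interactions for three factors. The first is false positives (precision). The second is the assay's limited sensitivity. The third is the fraction of all protein pairs actually screened (completeness). The sketch below encodes that reasoning as an assumption; it is not the authors' exact procedure, and every input value is invented.

```python
def estimate_interactome_size(observed, precision, sensitivity, completeness):
    """Scale an observed interaction count up to an estimate of all binary interactions.

    observed     -- number of interactions reported by a screen
    precision    -- fraction of reported interactions that are true
    sensitivity  -- fraction of true interactions among screened pairs that the assay detects
    completeness -- fraction of all possible protein pairs that were screened
    """
    for name, value in (("precision", precision),
                        ("sensitivity", sensitivity),
                        ("completeness", completeness)):
        if not 0 < value <= 1:
            raise ValueError(f"{name} must be in (0, 1]")
    true_positives = observed * precision
    # Divide out the two reasons true interactions go unobserved.
    return true_positives / (sensitivity * completeness)


# Invented inputs, chosen only to make the arithmetic easy to follow.
size = estimate_interactome_size(observed=2_000, precision=0.8,
                                 sensitivity=0.2, completeness=0.1)
print(f"estimated size: {size:,.0f}")  # about 80,000 with these invented inputs
```

The estimate is most sensitive to the smallest factors in the denominator. This is why empirically measured precision and sensitivity, rather than assumed values, matter for the final number.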
Subjects
Protein Interaction Mapping/methods , Proteins/analysis , Proteins/metabolism , Databases, Protein , Humans , Protein Binding , Proteins/genetics , Sensitivity and Specificity
ABSTRACT
Describing the 'ORFeome' of an organism, including all major isoforms, is essential for a system-level understanding of any species; however, conventional cloning and sequencing approaches are prohibitively costly and labor-intensive. We describe a potentially genome-wide methodology for efficiently capturing new coding isoforms using reverse transcriptase (RT)-PCR recombinational cloning, 'deep-well' pooling and a next-generation sequencing platform. This ORFeome discovery pipeline will be applicable to any eukaryotic species with a sequenced genome.
Subjects
Cloning, Molecular/methods , Protein Isoforms/genetics , Sequence Analysis/methods , Alternative Splicing , Animals , DNA, Complementary/genetics , Expressed Sequence Tags , Female , Genomics/methods , Humans , Male , Open Reading Frames , Pregnancy , RNA/genetics , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Rapid amplification of cDNA ends (RACE) is a widely used approach for transcript identification. Random clone selection from the RACE mixture, however, is an ineffective sampling strategy if the dynamic range of transcript abundances is large. To improve sampling efficiency of human transcripts, we hybridized the products of the RACE reaction onto tiling arrays and used the detected exons to delineate a series of reverse-transcriptase (RT)-PCRs, through which the original RACE transcript population was segregated into simpler transcript populations. We independently cloned the products and sequenced randomly selected clones. This approach, RACEarray, is superior to direct cloning and sequencing of RACE products because it specifically targets new transcripts and often results in overall normalization of transcript abundance. We show theoretically and experimentally that this strategy indeed leads to efficient sampling of new transcripts, and we investigated multiplexing the strategy by pooling RACE reactions from multiple interrogated loci before hybridization.
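The argument that random clone picking is inefficient when abundances span a wide range, and that normalization helps, follows from a short probability calculation. A transcript with relative abundance p is missed by all n randomly picked clones with probability (1 - p)^n. Summing 1 - (1 - p)^n over transcripts therefore gives the expected number of distinct transcripts recovered. The sketch below applies this to two invented pools containing the same ten transcripts. It illustrates the general sampling principle and is not the authors' own model.

```python
def expected_distinct_transcripts(abundances, n_clones):
    """Expected number of distinct transcripts seen among n randomly picked clones.

    Transcript i with relative abundance p_i is observed at least once with
    probability 1 - (1 - p_i) ** n_clones; the expectation is the sum over i.
    """
    total = sum(abundances)
    return sum(1 - (1 - a / total) ** n_clones for a in abundances)


# Invented pools: one dominant transcript plus nine rare ones, versus the same
# ten transcripts after (idealized) normalization to equal abundance.
skewed = [910] + [10] * 9
normalized = [100] * 10

for label, pool in (("skewed", skewed), ("normalized", normalized)):
    expected = expected_distinct_transcripts(pool, n_clones=20)
    print(f"{label}: {expected:.2f} distinct transcripts expected from 20 clones")
```

With the skewed pool, most of the 20 clones are spent re-sampling the dominant transcript. After normalization, the same effort recovers most of the rare transcripts.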
Subjects
DNA, Complementary/genetics , Gene Expression Profiling/methods , Gene Library , Nucleic Acid Amplification Techniques/methods , RNA/genetics , Alternative Splicing , Chromosomes, Human, Pair 21/genetics , Chromosomes, Human, Pair 22/genetics , Cloning, Molecular , Exons , Genome, Human , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis/methods , Protein Isoforms/genetics , Reverse Transcriptase Polymerase Chain Reaction , Transcription, Genetic
ABSTRACT
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found from yeast to human have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5' and 3' transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of the genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggests that chimeric transcripts should not be studied in isolation, but together, as an RNA network.
Subjects
Cells/metabolism , Gene Regulatory Networks/physiology , RNA/physiology , Transcriptome/physiology , Algorithms , Chimerin Proteins/chemistry , Chimerin Proteins/genetics , Chromosomes, Human, Pair 1/genetics , Female , Gene Expression Profiling , Gene Regulatory Networks/genetics , Humans , Male , Microarray Analysis/methods , Models, Biological , Nucleic Acid Amplification Techniques/methods , RNA/genetics , RNA Isoforms/chemistry , RNA Isoforms/genetics , RNA Isoforms/metabolism , Transcription, Genetic/genetics , Validation Studies as Topic
ABSTRACT
Current yeast interactome network maps contain several hundred molecular complexes with limited and somewhat controversial representation of direct binary interactions. We carried out a comparative quality assessment of current yeast interactome data sets, demonstrating that high-throughput yeast two-hybrid (Y2H) screening provides high-quality binary interaction information. Because a large fraction of the yeast binary interactome remains to be mapped, we developed an empirically controlled mapping framework to produce a "second-generation" high-quality, high-throughput Y2H data set covering approximately 20% of all yeast binary interactions. Both Y2H and affinity purification followed by mass spectrometry (AP/MS) data are of equally high quality but of a fundamentally different and complementary nature, resulting in networks with different topological and biological properties. Compared to co-complex interactome models, this binary map is enriched for transient signaling interactions and intercomplex connections with a highly significant clustering between essential proteins. Rather than correlating with essentiality, protein connectivity correlates with genetic pleiotropy.
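A claim that protein connectivity correlates with genetic pleiotropy is typically tested with a rank correlation, because both degree and phenotype counts are skewed. The sketch below computes Spearman's rho as the Pearson correlation of average ranks, so tied values are handled correctly. The choice of Spearman's rho is an assumption here, and the degree and phenotype-count values are invented.

```python
def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Invented data: degree in a binary interaction map vs. number of phenotypes per gene.
degree = [1, 3, 2, 8, 5, 12]
pleiotropy = [1, 2, 2, 6, 4, 7]
print(f"Spearman rho = {spearman(degree, pleiotropy):.2f}")
```

A rank-based statistic keeps a few highly connected hubs from dominating the result. A Pearson correlation on raw degrees would be more exposed to that.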