ABSTRACT
Widespread sequencing has yielded thousands of missense variants predicted or confirmed as disease causing. This creates a new bottleneck: determining the functional impact of each variant, typically a painstaking, customized process undertaken one or a few genes and variants at a time. Here, we established a high-throughput imaging platform to assay the impact of coding variation on protein localization, evaluating 3,448 missense variants of over 1,000 genes and phenotypes. We discovered that mislocalization is a common consequence of coding variation, affecting about one-sixth of all pathogenic missense variants, all cellular compartments, and recessive and dominant disorders alike. Mislocalization is primarily driven by effects on protein stability and membrane insertion rather than disruptions of trafficking signals or specific interactions. Furthermore, mislocalization patterns help explain pleiotropy and disease severity and provide insights on variants of uncertain significance. Our publicly available resource extends our understanding of coding variation in human diseases.
ABSTRACT
While alternative splicing is known to diversify the functional characteristics of some genes, the extent to which protein isoforms globally contribute to functional complexity on a proteomic scale remains unknown. To address this systematically, we cloned full-length open reading frames of alternatively spliced transcripts for a large number of human genes and used protein-protein interaction profiling to functionally compare hundreds of protein isoform pairs. The majority of isoform pairs share less than 50% of their interactions. In the global context of interactome network maps, alternative isoforms tend to behave like distinct proteins rather than minor variants of each other. Interaction partners specific to alternative isoforms tend to be expressed in a highly tissue-specific manner and belong to distinct functional modules. Our strategy, applicable to other functional characteristics, reveals a widespread expansion of protein interaction capabilities through alternative splicing and suggests that many alternative "isoforms" are functionally divergent (i.e., "functional alloforms").
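The statement that most isoform pairs share less than 50% of their interactions can be illustrated with a small calculation. The sketch below assumes the shared fraction is measured as a Jaccard index over interaction partners (the abstract does not name the metric), and the partner names are hypothetical placeholders rather than data from the study.

```python
def shared_interaction_fraction(partners_a, partners_b):
    """Fraction of interaction partners shared by two isoforms.

    Assumption: computed as a Jaccard index (intersection over union);
    the abstract does not specify which measure was used.
    """
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return 0.0  # neither isoform has any detected partner
    return len(a & b) / len(union)


# Hypothetical partner sets for two isoforms of the same gene.
isoform_1 = {"P1", "P2", "P3", "P4"}
isoform_2 = {"P3", "P4", "P5", "P6", "P7"}

fraction = shared_interaction_fraction(isoform_1, isoform_2)
print(f"Shared fraction: {fraction:.2f}")
```

In this toy case the two isoforms share two of seven distinct partners, so the pair would fall below the 50% threshold mentioned above.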
Subject(s)
Alternative Splicing , Protein Isoforms/metabolism , Proteome/metabolism , Animals , Cloning, Molecular , Evolution, Molecular , Humans , Models, Molecular , Open Reading Frames , Protein Interaction Domains and Motifs , Protein Interaction Maps , Proteome/analysis
ABSTRACT
How disease-associated mutations impair protein activities in the context of biological networks remains mostly undetermined. Although a few renowned alleles are well characterized, functional information is missing for over 100,000 disease-associated variants. Here we functionally profile several thousand missense mutations across a spectrum of Mendelian disorders using various interaction assays. The majority of disease-associated alleles exhibit wild-type chaperone binding profiles, suggesting they preserve protein folding or stability. While common variants from healthy individuals rarely affect interactions, two-thirds of disease-associated alleles perturb protein-protein interactions, with half corresponding to "edgetic" alleles affecting only a subset of interactions while leaving most other interactions unperturbed. With transcription factors, many alleles that leave protein-protein interactions intact affect DNA binding. Different mutations in the same gene leading to different interaction profiles often result in distinct disease phenotypes. Thus disease-associated alleles that perturb distinct protein activities rather than grossly affecting folding and stability are relatively widespread.
Subject(s)
Disease/genetics , Mutation, Missense , Protein Interaction Maps , Proteins/genetics , Proteins/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genome-Wide Association Study , Humans , Open Reading Frames , Protein Folding , Protein Stability
ABSTRACT
Protein-protein interactions (PPIs) offer great opportunities to expand the druggable proteome and therapeutically tackle various diseases, but remain challenging targets for drug discovery. Here, we provide a comprehensive pipeline that combines experimental and computational tools to identify and validate PPI targets and perform early-stage drug discovery. We have developed a machine learning approach that prioritizes interactions by analyzing quantitative data from binary PPI assays or AlphaFold-Multimer predictions. Using the quantitative assay LuTHy together with our machine learning algorithm, we identified high-confidence interactions among SARS-CoV-2 proteins for which we predicted three-dimensional structures using AlphaFold-Multimer. We employed VirtualFlow to target the contact interface of the NSP10-NSP16 SARS-CoV-2 methyltransferase complex by ultra-large virtual drug screening. Thereby, we identified a compound that binds to NSP10 and inhibits its interaction with NSP16, while also disrupting the methyltransferase activity of the complex, and SARS-CoV-2 replication. Overall, this pipeline will help to prioritize PPI targets to accelerate the discovery of early-stage drug candidates targeting protein complexes and pathways.
Subject(s)
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/metabolism , Methyltransferases/metabolism , Artificial Intelligence , Drug Discovery
ABSTRACT
Viral infections are known to hijack the transcription and translation of the host cell. However, the extent to which viral proteins coordinate these perturbations remains unclear. Here we used a model system, the human T-cell leukemia virus type 1 (HTLV-1), and systematically analyzed the transcriptome and interactome of its key effector oncoviral proteins, Tax and HBZ. We showed that Tax and HBZ target distinct but also common transcription factors. Unexpectedly, we also uncovered a large set of interactions with RNA-binding proteins, including the U2 auxiliary factor large subunit (U2AF2), a key cellular regulator of pre-mRNA splicing. We discovered that Tax and HBZ perturb the splicing landscape by altering cassette exons in opposing manners, with Tax inducing exon inclusion while HBZ induces exon exclusion. Among Tax- and HBZ-dependent splicing changes, we identify events that are also altered in adult T-cell leukemia/lymphoma (ATLL) samples from two independent patient cohorts, and in well-known cancer census genes. Our interactome mapping approach, applicable to other viral oncogenes, has identified spliceosome perturbation as a novel mechanism coordinated by Tax and HBZ to reprogram the transcriptome.
Subject(s)
Basic-Leucine Zipper Transcription Factors/metabolism , Gene Products, tax/metabolism , HTLV-I Infections/metabolism , Leukemia-Lymphoma, Adult T-Cell/virology , Retroviridae Proteins/metabolism , HEK293 Cells , HTLV-I Infections/etiology , Human T-lymphotropic virus 1 , Humans , Jurkat Cells , RNA Splicing , RNA, Messenger , Splicing Factor U2AF/metabolism
ABSTRACT
Novel protein-coding genes can arise either through re-organization of pre-existing genes or de novo. Processes involving re-organization of pre-existing genes, notably after gene duplication, have been extensively described. In contrast, de novo gene birth remains poorly understood, mainly because translation of sequences devoid of genes, or 'non-genic' sequences, is expected to produce insignificant polypeptides rather than proteins with specific biological functions. Here we formalize an evolutionary model according to which functional genes evolve de novo through transitory proto-genes generated by widespread translational activity in non-genic sequences. Testing this model at the genome scale in Saccharomyces cerevisiae, we detect translation of hundreds of short species-specific open reading frames (ORFs) located in non-genic sequences. These translation events seem to provide adaptive potential, as suggested by their differential regulation upon stress and by signatures of retention by natural selection. In line with our model, we establish that S. cerevisiae ORFs can be placed within an evolutionary continuum ranging from non-genic sequences to genes. We identify ~1,900 candidate proto-genes among S. cerevisiae ORFs and find that de novo gene birth from such a reservoir may be more prevalent than sporadic gene duplication. Our work illustrates that evolution exploits seemingly dispensable sequences to generate adaptive functional innovation.
Subject(s)
Evolution, Molecular , Genes, Fungal/genetics , Saccharomyces/genetics , Base Sequence , Conserved Sequence , Genetic Variation , Molecular Sequence Data , Open Reading Frames , Phylogeny , Protein Biosynthesis , Saccharomyces/classification , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/genetics , Sequence Alignment
ABSTRACT
Genotypic differences greatly influence susceptibility and resistance to disease. Understanding genotype-phenotype relationships requires that phenotypes be viewed as manifestations of network properties, rather than simply as the result of individual genomic variations. Genome sequencing efforts have identified numerous germline mutations, and large numbers of somatic genomic alterations, associated with a predisposition to cancer. However, it remains difficult to distinguish background, or 'passenger', cancer mutations from causal, or 'driver', mutations in these data sets. Human viruses intrinsically depend on their host cell during the course of infection and can elicit pathological phenotypes similar to those arising from mutations. Here we test the hypothesis that genomic variations and tumour viruses may cause cancer through related mechanisms, by systematically examining host interactome and transcriptome network perturbations caused by DNA tumour virus proteins. The resulting integrated viral perturbation data reflects rewiring of the host cell networks, and highlights pathways, such as Notch signalling and apoptosis, that go awry in cancer. We show that systematic analyses of host targets of viral proteins can identify cancer genes with a success rate on a par with their identification through functional genomics and large-scale cataloguing of tumour mutations. Together, these complementary approaches increase the specificity of cancer gene identification. Combining systems-level studies of pathogen-encoded gene products with genomic approaches will facilitate the prioritization of cancer-causing driver genes to advance the understanding of the genetic basis of human cancer.
Subject(s)
Genes, Neoplasm/genetics , Genome, Human/genetics , Host-Pathogen Interactions , Neoplasms/genetics , Neoplasms/metabolism , Oncogenic Viruses/pathogenicity , Viral Proteins/metabolism , Adenoviridae/genetics , Adenoviridae/metabolism , Adenoviridae/pathogenicity , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/metabolism , Herpesvirus 4, Human/pathogenicity , Host-Pathogen Interactions/genetics , Humans , Neoplasms/pathology , Oncogenic Viruses/genetics , Oncogenic Viruses/metabolism , Open Reading Frames/genetics , Papillomaviridae/genetics , Papillomaviridae/metabolism , Papillomaviridae/pathogenicity , Polyomavirus/genetics , Polyomavirus/metabolism , Polyomavirus/pathogenicity , Receptors, Notch/metabolism , Signal Transduction , Two-Hybrid System Techniques , Viral Proteins/genetics
ABSTRACT
The Epstein-Barr virus (EBV) nuclear proteins EBNA3A, EBNA3B, and EBNA3C interact with the cell DNA binding protein RBPJ and regulate cell and viral genes. Repression of the CDKN2A tumor suppressor gene products p16(INK4A) and p14(ARF) by EBNA3A and EBNA3C is critical for EBV-mediated transformation of resting B lymphocytes into immortalized lymphoblastoid cell lines (LCLs). To define the composition of endogenous EBNA3 protein complexes, we generated LCLs expressing flag-HA tagged EBNA3A, EBNA3B, or EBNA3C and used tandem affinity purification to isolate each EBNA3 complex. Our results demonstrated that each EBNA3 protein forms a distinct complex with RBPJ. Mass spectrometry revealed that the EBNA3A and EBNA3B complexes also contained the deubiquitylation (DUB) complex consisting of WDR48, WDR20, and USP46 (or its paralog USP12) and that EBNA3C complexes contained WDR48. Immunoprecipitation confirmed EBNA3A, EBNA3B, and EBNA3C association with the USP46 complex. Using chromatin immunoprecipitation (ChIP), we demonstrate that WDR48 and USP46 are recruited to the p14(ARF) promoter in an EBNA3C-dependent manner. Mapping studies were consistent with WDR48 being the primary mediator of EBNA3 association with the DUB complex. Importantly, WDR48 associated with EBNA3A and EBNA3C domains that are critical for LCL growth, suggesting a role for USP46/USP12 in EBV-induced growth transformation.
Subject(s)
Cell Transformation, Viral/genetics , Endopeptidases/metabolism , Epstein-Barr Virus Nuclear Antigens/metabolism , Gene Expression Regulation, Viral/genetics , Ubiquitin Thiolesterase/metabolism , Blotting, Western , Cell Line , Cell Proliferation , Chromatin Immunoprecipitation , Endopeptidases/genetics , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/metabolism , Humans , Immunoprecipitation , Mass Spectrometry , Ubiquitin Thiolesterase/genetics
ABSTRACT
High-throughput binary protein interaction mapping is continuing to extend our understanding of cellular function and disease mechanisms. However, we remain one or two orders of magnitude away from a complete interaction map for humans and other major model organisms. Completion will require screening at substantially larger scales with many complementary assays, requiring further efficiency gains in proteome-scale interaction mapping. Here, we report Barcode Fusion Genetics-Yeast Two-Hybrid (BFG-Y2H), by which a full matrix of protein pairs can be screened in a single multiplexed strain pool. BFG-Y2H uses Cre recombination to fuse DNA barcodes from distinct plasmids, generating chimeric protein-pair barcodes that can be quantified via next-generation sequencing. We applied BFG-Y2H to four different matrices ranging in scale from ~25 K to 2.5 M protein pairs. The results show that BFG-Y2H increases the efficiency of protein matrix screening, with quality that is on par with state-of-the-art Y2H methods.
Subject(s)
Centrosome/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Saccharomyces cerevisiae/genetics , Chromosomes, Human/metabolism , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Protein Binding , Two-Hybrid System Techniques
ABSTRACT
In cellular systems, biophysical interactions between macromolecules underlie a complex web of functional interactions. How biophysical and functional networks are coordinated, whether all biophysical interactions correspond to functional interactions, and how such biophysical-versus-functional network coordination is shaped by evolutionary forces are all largely unanswered questions. Here, we investigate these questions using an "inter-interactome" approach. We systematically probed the yeast and human proteomes for interactions between proteins from these two species and functionally characterized the resulting inter-interactome network. After a billion years of evolutionary divergence, the yeast and human proteomes are still capable of forming a biophysical network with properties that resemble those of intra-species networks. Although substantially reduced relative to intra-species networks, the levels of functional overlap in the yeast-human inter-interactome network uncover significant remnants of co-functionality widely preserved in the two proteomes beyond human-yeast homologs. Our data support evolutionary selection against biophysical interactions between proteins with little or no co-functionality. Such non-functional interactions, however, represent a reservoir from which nascent functional interactions may arise.
Subject(s)
Fungal Proteins/metabolism , Protein Interaction Mapping/methods , Proteome/metabolism , Computational Biology/methods , Databases, Protein , Evolution, Molecular , Humans
ABSTRACT
The small genome of polyomaviruses encodes a limited number of proteins that are highly dependent on interactions with host cell proteins for efficient viral replication. The SV40 large T antigen (LT) contains several discrete functional domains that contribute to the viral life cycle, including the LXCXE (RB-binding) motif and the DNA-binding and helicase domains. In addition, the LT C-terminal region contains the host range and adenovirus helper functions required for lytic infection in certain restrictive cell types. To understand how LT affects the host cell to facilitate viral replication, we expressed full-length or functional domains of LT in cells, identified interacting host proteins and carried out expression profiling. LT perturbed the expression of p53 target genes and subsets of cell-cycle dependent genes regulated by the DREAM and the B-Myb-MuvB complexes. Affinity purification of LT followed by mass spectrometry revealed a specific interaction between the LT C-terminal region and FAM111A, a previously uncharacterized protein. Depletion of FAM111A recapitulated the effects of heterologous expression of the LT C-terminal region, including increased viral gene expression and lytic infection of SV40 host range mutants and adenovirus replication in restrictive cells. FAM111A functions as a host range restriction factor that is specifically targeted by SV40 LT.
Subject(s)
Antigens, Polyomavirus Transforming/metabolism , Host Specificity/genetics , Receptors, Virus/metabolism , Simian virus 40/pathogenicity , Adenoviridae , Animals , Antigens, Polyomavirus Transforming/genetics , Cell Cycle Proteins/biosynthesis , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Cell Line , Chlorocebus aethiops , Gene Expression Profiling , Humans , Kv Channel-Interacting Proteins/metabolism , Protein Structure, Tertiary , RNA Interference , RNA, Small Interfering , Receptors, Virus/genetics , Repressor Proteins/metabolism , Trans-Activators/metabolism , Tumor Suppressor Protein p53/biosynthesis , Tumor Suppressor Protein p53/genetics , Virus Replication
ABSTRACT
EBV nuclear antigen 2 (EBNA2) and EBV nuclear antigen LP (EBNALP) are critical for B-lymphocyte transformation to lymphoblastoid cell lines (LCLs). EBNA2 activates transcription through recombination signal-binding immunoglobulin κJ region (RBPJ), a transcription factor associated with NCoR repressive complexes, and EBNALP is implicated in repressor relocalization. EBNALP coactivation with EBNA2 was found to dominate over NCoR repression. EBNALP associated with NCoR and dismissed NCoR, NCoR and RBPJ, or NCoR, RBPJ, and EBNA2 from matrix-associated deacetylase (MAD) bodies. In non-EBV-infected BJAB B lymphoma cells that stably express EBNA2, EBNALP, or EBNA2 and EBNALP, EBNALP was associated with hairy and enhancer of split 1 (hes1), cd21, cd23, and arginine and glutamate-rich 1 (arglu1) enhancer or promoter DNA and was associated minimally with coding DNA. With the exception of RBPJ at the arglu1 enhancer, NCoR and RBPJ were significantly decreased at enhancer and promoter sites in EBNALP or EBNA2 and EBNALP BJAB cells. EBNA2 DNA association was unaffected by EBNALP, and EBNALP was unaffected by EBNA2. EBNA2 markedly increased RBPJ at enhancer sites without increasing NCoR. EBNALP further increased hes1 and arglu1 RNA levels with EBNA2 but did not further increase cd21 or cd23 RNA levels. EBNALP in which the 45 C-terminal residues critical for transformation and transcriptional activation were deleted associated with NCoR but was deficient in dismissing NCoR from MAD bodies and from enhancer and promoter sites. These data strongly support a model in which EBNA2 association with NCoR-deficient RBPJ enhances transcription and EBNALP dismisses NCoR and RBPJ repressive complexes from enhancers to coactivate hes1 and arglu1 but not cd21 or cd23.
Subject(s)
Epstein-Barr Virus Nuclear Antigens/metabolism , Herpesvirus 4, Human/metabolism , Immunoglobulin J Recombination Signal Sequence-Binding Protein/metabolism , Nuclear Receptor Co-Repressor 1/metabolism , Nuclear Receptor Co-Repressor 2/metabolism , Viral Proteins/metabolism , Basic Helix-Loop-Helix Transcription Factors/genetics , Cell Line , Cell Transformation, Viral/genetics , DNA/genetics , DNA/metabolism , Enhancer Elements, Genetic , Epstein-Barr Virus Nuclear Antigens/genetics , Gene Expression Regulation, Viral , Herpesvirus 4, Human/genetics , Herpesvirus 4, Human/pathogenicity , Homeodomain Proteins/genetics , Humans , Immunoglobulin J Recombination Signal Sequence-Binding Protein/genetics , Models, Biological , Multiprotein Complexes/metabolism , Nuclear Receptor Co-Repressor 1/genetics , Nuclear Receptor Co-Repressor 2/genetics , Promoter Regions, Genetic , Receptors, Complement 3d/genetics , Receptors, IgE/genetics , Transcription Factor HES-1 , Viral Proteins/genetics
ABSTRACT
Most human transcription factor (TF) genes encode multiple protein isoforms differing in DNA binding domains, effector domains, or other protein regions. The global extent to which this results in functional differences between isoforms remains unknown. Here, we systematically compared 693 isoforms of 246 TF genes, assessing DNA binding, protein binding, transcriptional activation, subcellular localization, and condensate formation. Relative to reference isoforms, two-thirds of alternative TF isoforms exhibit differences in one or more molecular activities, which often could not be predicted from sequence. We observed two primary categories of alternative TF isoforms: "rewirers" and "negative regulators", both of which were associated with differentiation and cancer. Our results support a model wherein the relative expression levels of, and interactions involving, TF isoforms add an understudied layer of complexity to gene regulatory networks, demonstrating the importance of isoform-aware characterization of TF functions and providing a rich resource for further studies.
ABSTRACT
The Epstein-Barr virus (EBV) lytic transactivator Rta activates promoters through direct binding to cognate DNA sites termed Rta response elements (RREs). Rta also activates promoters that apparently lack Rta binding sites, notably Zp and Rp. Chromatin immunoprecipitation (ChIP) of endogenous Rta expressed during early replication in B95-8 cells was performed to identify Rta binding sites in the EBV genome. Quantitative PCR (qPCR) analysis showed strong enrichment for known RREs but little or no enrichment for Rp or Zp, suggesting that the Rta ChIP approach enriches for direct Rta binding sites. Rta ChIP combined with deep sequencing (ChIP-seq) identified most known RREs and several novel Rta binding sites. Rta ChIP-seq peaks were frequently upstream of Rta-responsive genes, indicating that these Rta binding sites are likely functioning as RREs. Unexpectedly, the BALF5 promoter contained an Rta binding peak. To assess whether BALF5 might be activated by an RRE-dependent mechanism, an Rta mutant (Rta K156A), deficient for DNA binding and RRE activation but competent for Zp/Rp activation, was used. Rta K156A failed to activate BALF5p, suggesting this promoter can be activated by an RRE-dependent mechanism. Rta binding to late gene promoters was not seen at early time points but was specifically detected at later times within the Rta-responsive BLRF2 and BFRF3 promoters, even when DNA replication was inhibited. Our results represent the first characterization of Rta binding to the EBV genome during replication, identify previously unknown RREs, such as one in BALF5p, and highlight the complexity of EBV late gene promoter activation by Rta.
Subject(s)
DNA, Viral/metabolism , Herpesvirus 4, Human/genetics , Immediate-Early Proteins/metabolism , Trans-Activators/metabolism , Binding Sites , Cell Line , DNA Replication , DNA, Viral/chemistry , DNA-Binding Proteins/genetics , DNA-Directed DNA Polymerase/genetics , Genome, Viral , Herpesvirus 4, Human/metabolism , Humans , Immediate-Early Proteins/chemistry , Immediate-Early Proteins/genetics , Mutation , Promoter Regions, Genetic , Protein Interaction Domains and Motifs/genetics , Response Elements , Trans-Activators/chemistry , Viral Proteins/genetics , Virus Replication/genetics
ABSTRACT
Many human diseases, arising from mutations of disease susceptibility genes (genetic diseases), are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of host interactome may underlie such virally implicated disease relationships. Using as models two different human viruses, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia.
Subject(s)
Disease/etiology , Models, Biological , Virus Diseases/complications , Computational Biology , Disease/genetics , Fanconi Anemia/etiology , Fanconi Anemia/genetics , Fanconi Anemia/virology , Genetic Predisposition to Disease , Herpesvirus 4, Human/metabolism , Herpesvirus 4, Human/pathogenicity , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/physiology , Human papillomavirus 16/metabolism , Human papillomavirus 16/pathogenicity , Humans , Protein Interaction Maps , Viral Proteins/metabolism
ABSTRACT
Alternative translation initiation and alternative splicing may give rise to N-terminal proteoforms, proteins that differ at their N-terminus compared with their canonical counterparts. Such proteoforms can have altered localizations, stabilities, and functions. Although proteoforms generated from splice variants can be engaged in different protein complexes, it remained to be studied to what extent this applies to N-terminal proteoforms. To address this, we mapped the interactomes of several pairs of N-terminal proteoforms and their canonical counterparts. First, we generated a catalogue of N-terminal proteoforms found in the HEK293T cellular cytosol from which 22 pairs were selected for interactome profiling. In addition, we provide evidence for the expression of several N-terminal proteoforms, identified in our catalogue, across different human tissues, as well as tissue-specific expression, highlighting their biological relevance. Protein-protein interaction profiling revealed that the overlap of the interactomes for both proteoforms is generally high, showing their functional relation. We also showed that N-terminal proteoforms can be engaged in new interactions and/or lose several interactions compared with their canonical counterparts, thus further expanding the functional diversity of proteomes.
Subject(s)
Alternative Splicing , Proteome , Humans , HEK293 Cells , Alternative Splicing/genetics , Cytosol
ABSTRACT
Protein-protein interactions (PPIs) offer great opportunities to expand the druggable proteome and therapeutically tackle various diseases, but remain challenging targets for drug discovery. Here, we provide a comprehensive pipeline that combines experimental and computational tools to identify and validate PPI targets and perform early-stage drug discovery. We have developed a machine learning approach that prioritizes interactions by analyzing quantitative data from binary PPI assays and AlphaFold-Multimer predictions. Using the quantitative assay LuTHy together with our machine learning algorithm, we identified high-confidence interactions among SARS-CoV-2 proteins for which we predicted three-dimensional structures using AlphaFold-Multimer. We employed VirtualFlow to target the contact interface of the NSP10-NSP16 SARS-CoV-2 methyltransferase complex by ultra-large virtual drug screening. Thereby, we identified a compound that binds to NSP10 and inhibits its interaction with NSP16, while also disrupting the methyltransferase activity of the complex, and SARS-CoV-2 replication. Overall, this pipeline will help to prioritize PPI targets to accelerate the discovery of early-stage drug candidates targeting protein complexes and pathways.
ABSTRACT
Comprehensive understanding of the human protein-protein interaction (PPI) network, aka the human interactome, can provide important insights into the molecular mechanisms of complex biological processes and diseases. Despite the remarkable experimental efforts undertaken to date to determine the structure of the human interactome, many PPIs remain unmapped. Computational approaches, especially network-based methods, can facilitate the identification of previously uncharacterized PPIs. Many such methods have been proposed. Yet, a systematic evaluation of existing network-based methods in predicting PPIs is still lacking. Here, we report community efforts initiated by the International Network Medicine Consortium to benchmark the ability of 26 representative network-based methods to predict PPIs across six different interactomes of four different organisms: A. thaliana, C. elegans, S. cerevisiae, and H. sapiens. Through extensive computational and experimental validations, we found that advanced similarity-based methods, which leverage the underlying network characteristics of PPIs, show superior performance over other general link prediction methods in the interactomes we considered.
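To make the idea of network-based PPI prediction concrete, the following minimal sketch ranks unconnected protein pairs by their number of shared interaction partners. This common-neighbors score is a generic textbook baseline chosen here only for illustration; it is not claimed to be one of the 26 benchmarked methods, and the toy edge list and the function name `common_neighbor_scores` are hypothetical.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Rank non-adjacent node pairs by their number of shared neighbors.

    Illustrative baseline only (an assumption, not a method from the study).
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already a known interaction, nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy interactome with five proteins.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
ranked = common_neighbor_scores(toy_edges)
for pair, score in ranked:
    print(pair, score)
```

In a benchmark setting, a ranking like this would typically be compared against held-out or newly tested interactions; that evaluation step is omitted from the sketch.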
Subject(s)
Protein Interaction Mapping , Saccharomyces cerevisiae , Animals , Humans , Protein Interaction Mapping/methods , Caenorhabditis elegans , Protein Interaction Maps , Computational Biology/methods
ABSTRACT
Generating reference maps of interactome networks illuminates genetic studies by providing a protein-centric approach to finding new components of existing pathways, complexes, and processes. We apply state-of-the-art methods to identify binary protein-protein interactions (PPIs) for Drosophila melanogaster. Four all-by-all yeast two-hybrid (Y2H) screens of > 10,000 Drosophila proteins result in the 'FlyBi' dataset of 8723 PPIs among 2939 proteins. Testing subsets of data from FlyBi and previous PPI studies using an orthogonal assay allows for normalization of data quality; subsequent integration of FlyBi and previous data results in an expanded binary Drosophila reference interaction network, DroRI, comprising 17,232 interactions among 6511 proteins. We use FlyBi data to generate an autophagy network, then validate in vivo using autophagy-related assays. The deformed wings (dwg) gene encodes a protein that is both a regulator and a target of autophagy. Altogether, these resources provide a foundation for building new hypotheses regarding protein networks and function.