ABSTRACT
Since 2010, the Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify the protein parts list and (2) to make proteomics an integral part of multiomics studies of human health and disease. The HPP relies on international collaboration, data sharing, standardized reanalysis of MS data sets by PeptideAtlas and MassIVE-KB using HPP Guidelines for quality assurance, integration and curation of MS and non-MS protein data by neXtProt, plus extensive use of antibody profiling carried out by the Human Protein Atlas. According to the neXtProt release 2023-04-18, protein expression has now been credibly detected (PE1) for 18,397 of the 19,778 neXtProt predicted proteins coded in the human genome (93%). Of these PE1 proteins, 17,453 were detected with mass spectrometry (MS) in accordance with HPP Guidelines and 944 by a variety of non-MS methods. The number of neXtProt PE2, PE3, and PE4 missing proteins now stands at 1381. Achieving the unambiguous identification of 93% of predicted proteins encoded from across all chromosomes represents remarkable experimental progress on the Human Proteome parts list. Meanwhile, there are several categories of predicted proteins that have proved resistant to detection regardless of protein-based methods used. Additionally, there are some PE1-4 proteins that probably should be reclassified to PE5, specifically 21 LINC entries and ~30 HERV entries; these are being addressed in the present year. Applying proteomics in a wide array of biological and clinical studies ensures integration with other omics platforms as reported by the Biology and Disease-driven HPP teams and the antibody and pathology resource pillars. Current progress has positioned the HPP to transition to its Grand Challenge Project focused on determining the primary function(s) of every protein itself and in networks and pathways within the context of human health and disease.
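The headline counts in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch that recomputes the PE1 total, the missing-protein count, and the PE1 percentage from the figures quoted above; the variable names are illustrative and are not neXtProt field names.

```python
# Counts quoted in the abstract for neXtProt release 2023-04-18.
predicted = 19_778   # predicted proteins (PE1-4)
pe1_ms = 17_453      # PE1 proteins detected by MS under HPP Guidelines
pe1_non_ms = 944     # PE1 proteins detected by non-MS methods

# PE1 total is the sum of the MS and non-MS subgroups.
pe1_total = pe1_ms + pe1_non_ms

# Missing proteins (PE2, PE3, PE4) are the predicted proteins not yet at PE1.
missing = predicted - pe1_total

# Share of predicted proteins with credible protein-level evidence.
pe1_percent = 100 * pe1_total / predicted

print(pe1_total, missing, round(pe1_percent, 1))
```

The recomputed values agree with the 18,397 PE1 proteins, 1381 missing proteins, and 93% reported in the abstract.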
Subjects
Antibodies , Proteome , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Proteomics/methods
ABSTRACT
The 2022 Metrics of the Human Proteome from the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18,407 (93.2%) of the 19,750 predicted proteins coded in the human genome, a net gain of 50 since 2021 from data sets generated around the world and reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 78 from 1421 to 1343. This represents continuing experimental progress on the human proteome parts list across all the chromosomes, as well as significant reclassifications. Meanwhile, applying proteomics in a vast array of biological and clinical studies continues to yield significant findings and growing integration with other omics platforms. We present highlights from the Chromosome-Centric HPP, Biology and Disease-driven HPP, and HPP Resource Pillars, compare features of mass spectrometry and Olink and Somalogic platforms, note the emergence of translation products from ribosome profiling of small open reading frames, and discuss the launch of the initial HPP Grand Challenge Project, "A Function for Each Protein".
Subjects
Proteome , Proteomics , Humans , Proteome/genetics , Proteome/analysis , Databases, Protein , Mass Spectrometry/methods , Open Reading Frames , Proteomics/methods
ABSTRACT
The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18,357 (92.8%) of the 19,778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.
Subjects
Benchmarking , Proteome , Databases, Protein , Humans , Mass Spectrometry/methods , Proteome/analysis , Proteome/genetics , Proteomics/methods
ABSTRACT
One of the main goals of the Chromosome-Centric Human Proteome Project (C-HPP) is detection of "missing proteins" (PE2-PE4). Using the UPS2 (Universal proteomics standard 2) set as a model to simulate the range of protein concentrations in the cell, we have previously shown that 2D fractionation enables the detection of more than 95% of UPS2 proteins in a complex biological mixture. In this study, we propose a novel experimental workflow for protein detection during the analysis of biological samples. This approach is extremely important in the context of the C-HPP and the neXt-MP50 Challenge, which can be solved by increasing the sensitivity and the coverage of the proteome encoded by a particular human chromosome. In this study, we used 2D fractionation for in-depth analysis of the proteins encoded by human chromosome 18 (Chr 18) in the HepG2 cell line. Use of 2D fractionation increased the sensitivity of the SRM SIS method by 1.3-fold (68 and 88 proteins were identified by 1D fractionation and 2D fractionation, respectively) and the shotgun MS/MS method by 2.5-fold (21 and 53 proteins encoded by Chr 18 were detected by 1D fractionation and 2D fractionation, respectively). The results of all experiments indicate that 111 proteins encoded by human Chr 18 have been identified; this list includes 42% of the Chr 18 protein-coding genes and 67% of the Chr 18 transcriptome species (Illumina RNaseq) in the HepG2 cell line obtained using a single sample. Corresponding mRNAs were not registered for 13 of the detected proteins. The combination of 2D fractionation technology with SRM SIS and shotgun mass spectrometric analysis did not achieve full coverage, i.e., identification of at least one protein product for each of the 265 protein-coding genes of the selected chromosome. 
To further increase the sensitivity of the method, we plan to use 5-10 crude synthetic peptides for each protein to identify the proteins and select one of the peptides based on the obtained mass spectra for the synthesis of an isotopically labeled standard for subsequent quantitative analysis. Data are available via ProteomeXchange with the identifier PXD019263.
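The sensitivity gains and coverage quoted in this abstract follow directly from the reported protein counts. The sketch below recomputes them; the helper name `fold_change` is an illustrative assumption, not something defined by the authors.

```python
def fold_change(before: int, after: int) -> float:
    """Ratio of proteins identified after vs. before adding 2D fractionation."""
    return after / before

# Proteins identified with 1D vs. 2D fractionation, as quoted in the abstract.
srm_gain = fold_change(68, 88)       # SRM SIS method
shotgun_gain = fold_change(21, 53)   # shotgun MS/MS method

# Coverage of the 265 Chr 18 protein-coding genes by the 111 identified proteins.
gene_coverage = 100 * 111 / 265

print(round(srm_gain, 1), round(shotgun_gain, 1), round(gene_coverage))
```

The rounded results match the 1.3-fold and 2.5-fold gains and the 42% gene coverage stated in the text.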
Subjects
Proteomics , Tandem Mass Spectrometry , Chromosomes, Human , Humans , Proteome/genetics , Transcriptome
ABSTRACT
The Chromosome-Centric Human Proteome Project (C-HPP) aims at the identification of missing proteins (MPs) and the functional characterization of functionally unannotated PE1 (uPE1) proteins. A major challenge in addressing this goal is that many human proteins and MPs are silent in adult cells. A promising approach to overcome this challenge is to exploit novel tools such as pluripotent stem cells (PSCs), which are capable of differentiation into the three embryonic germ layers, namely, the endoderm, mesoderm, and ectoderm. Here we present several examples of how the Human Y Chromosome Proteome Project (Y-HPP) benefited from this approach to meet C-HPP goals. Furthermore, we discuss how integrating CRISPR engineering, human-induced pluripotent stem cell (hiPSC)-derived disease modeling systems, and organoid technologies provides a unique platform for Y-HPP and C-HPP for MP identification and the functional characterization of human proteins, especially uPE1s.
Subjects
Pluripotent Stem Cells , Proteome , Cell Differentiation , Chromosomes, Human, Y , Humans , Proteome/genetics
ABSTRACT
According to the 2020 Metrics of the HUPO Human Proteome Project (HPP), expression has now been detected at the protein level for >90% of the 19,773 predicted proteins coded in the human genome. The HPP annually reports on progress made throughout the world toward credibly identifying and characterizing the complete human protein parts list and promoting proteomics as an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2020-01 classified 17,874 proteins as PE1, having strong protein-level evidence, up 180 from 17,694 one year earlier. These represent 90.4% of the 19,773 predicted coding genes (all PE1,2,3,4 proteins in neXtProt). Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), was reduced by 230 from 2129 to 1899 since the neXtProt 2019-01 release. PeptideAtlas is the primary source of uniform reanalysis of raw mass spectrometry data for neXtProt, supplemented this year with extensive data from MassIVE. PeptideAtlas 2020-01 added 362 canonical proteins between 2019 and 2020 and MassIVE contributed 84 more, many of which converted PE1 entries based on non-MS evidence to the MS-based subgroup. The 19 Biology and Disease-driven B/D-HPP teams continue to pursue the identification of driver proteins that underlie disease states, the characterization of regulatory mechanisms controlling the functions of these proteins, their proteoforms, and their interactions, and the progression of transitions from correlation to coexpression to causal networks after system perturbations. In addition, the Human Protein Atlas published Blood, Brain, and Metabolic Atlases.
Subjects
Proteome , Proteomics , Databases, Protein , Genome, Human , Humans , Mass Spectrometry , Proteome/genetics
ABSTRACT
Using neXtProt release 2019-01-11, we manually curated a list of 1837 functionally uncharacterized human proteins. Using OrthoList 2, we found that 270 of them have homologues in Caenorhabditis elegans, including 60 with a one-to-one orthology relationship. According to annotations extracted from WormBase, the vast majority of these 60 worm genes have RNAi experimental data or mutant alleles, but manual inspection shows that only 15% have phenotypes that could be interpreted in terms of a specific function. One third of the worm orthologs have protein-protein interaction data, and two of these interactions are conserved in humans. The combination of phenotypic, protein-protein interaction, and gene expression data provides functional hypotheses for 8 uncharacterized human proteins. Experimental validation in human or orthologs is necessary before they can be considered for annotation.
Subjects
Caenorhabditis elegans Proteins , Databases, Protein , Proteins/metabolism , Animals , Gene Expression , Humans , Membrane Proteins/genetics , Membrane Proteins/metabolism , Mice , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Phenotype , Protein Interaction Maps , Proteins/genetics , RNA Interference , Sequence Homology, Amino Acid
ABSTRACT
When conducting proteomics experiments to detect missing proteins and protein isoforms in the human proteome, it is desirable to use a protease that can yield more unique peptides with properties amenable to mass spectrometry analysis. Though trypsin is currently the most widely used protease, some proteins yield only a limited number of unique peptides by trypsin digestion. Other proteases and multiple proteases have been applied in reported studies to increase the number of identified proteins and protein sequence coverage. To facilitate the selection of proteases, we developed a web-based resource, called in silico Human Proteome Digestion Map (iHPDM), which contains a comprehensive proteolytic peptide database constructed from human proteins, including isoforms, in neXtProt digested by 15 protease combinations of one or two proteases. iHPDM provides convenient functions and graphical visualizations for users to examine and compare the digestion results of different proteases. Notably, it also lets users apply filtering criteria to digested peptides, e.g., peptide length and uniqueness, to select suitable proteases. iHPDM can facilitate protease selection for shotgun proteomics experiments to identify missing proteins, protein isoforms, and single amino acid variant peptides.
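The idea of comparing proteases by the unique peptides they yield can be illustrated with a small in silico digestion. This is a minimal sketch under stated assumptions, not the iHPDM implementation: the cleavage rules are simplified textbook rules (no missed cleavages), the function names are hypothetical, and the two toy sequences are invented for illustration.

```python
import re
from collections import Counter

# Simplified cleavage rules, written as zero-width regex sites (assumptions).
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after K or R, but not before P
    "lys-c": r"(?<=K)",            # after K
    "glu-c": r"(?<=E)",            # after E
}

def digest(sequence, proteases):
    """Cleave a sequence at every site recognized by any listed protease."""
    pattern = "|".join(RULES[name] for name in proteases)
    return [pep for pep in re.split(pattern, sequence) if pep]

def unique_peptides(proteome, proteases, min_len=7, max_len=30):
    """Per protein, list length-filtered peptides found in no other protein."""
    peptides = {
        name: {p for p in digest(seq, proteases) if min_len <= len(p) <= max_len}
        for name, seq in proteome.items()
    }
    # Count in how many proteins each peptide occurs.
    counts = Counter(p for peps in peptides.values() for p in peps)
    return {
        name: sorted(p for p in peps if counts[p] == 1)
        for name, peps in peptides.items()
    }

# Invented toy sequences; the short minimum length suits only this toy case.
toy = {"protA": "MKTAYIAKQRPEEVLK", "protB": "MKTAYIAKGGESLR"}
print(unique_peptides(toy, ["trypsin"], min_len=3))
print(unique_peptides(toy, ["trypsin", "glu-c"], min_len=3))
```

Running both calls shows how adding a second protease changes which peptides distinguish the two toy proteins, which is the kind of comparison the resource is described as supporting.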
Subjects
Peptide Hydrolases/metabolism , Peptide Mapping/methods , Proteome/metabolism , Computer Graphics , Computer Simulation , Data Visualization , Databases, Factual , ErbB Receptors/metabolism , Humans , Internet , MAP Kinase Kinase 1/metabolism , N-Acetylhexosaminyltransferases/metabolism , Protein Isoforms/metabolism , Proteomics/methods , Receptors, Odorant/metabolism , User-Computer Interface , gamma-Glutamyltransferase/metabolism
ABSTRACT
The Human Proteome Organization's (HUPO) Human Proteome Project (HPP) developed Mass Spectrometry (MS) Data Interpretation Guidelines that have been applied since 2016. These guidelines have helped ensure that the emerging draft of the complete human proteome is highly accurate, with low numbers of false-positive protein identifications. Here, we describe an update to these guidelines based on consensus-reaching discussions with the wider HPP community over the past year. The revised 3.0 guidelines address several major and minor identified gaps. We have added guidelines for emerging data independent acquisition (DIA) MS workflows and for use of the new Universal Spectrum Identifier (USI) system being developed by the HUPO Proteomics Standards Initiative (PSI). In addition, we discuss updates to the standard HPP pipeline for collecting MS evidence for all proteins in the HPP, including refinements to minimum evidence. We present a new plan for incorporating MassIVE-KB into the HPP pipeline for the next (HPP 2020) cycle in order to obtain more comprehensive coverage of public MS data sets. The main checklist has been reorganized under headings and subitems, and related guidelines have been grouped. In sum, Version 2.1 of the HPP MS Data Interpretation Guidelines has served well, and this timely update to version 3.0 will aid the HPP as it approaches its goal of collecting and curating MS evidence of translation and expression for all ~20,000 predicted human proteins encoded by the human genome.
Subjects
Guidelines as Topic , Mass Spectrometry/methods , Proteome , Signal Processing, Computer-Assisted , Humans , Proteomics , Societies, Scientific
ABSTRACT
This work continues the series of the quantitative measurements of the proteins encoded by different chromosomes in the blood plasma of a healthy person. Selected Reaction Monitoring with Stable Isotope-labeled peptide Standards (SRM SIS) and a gene-centric approach, which is the basis for the implementation of the international Chromosome-centric Human Proteome Project (C-HPP), were applied for the quantitative measurement of proteins in human blood plasma. Analyses were carried out in the frame of C-HPP for each protein-coding gene of the four human chromosomes: 18, 13, Y, and mitochondrial. Concentrations of proteins encoded by 667 genes were measured in 54 blood plasma samples of the volunteers, whose health conditions were consistent with requirements for astronauts. The gene list included 276, 329, 47, and 15 genes of chromosomes 18, 13, Y, and the mitochondrial chromosome, respectively. This paper does not make claims about the detection of missing proteins. Only 205 proteins (30.7%) were detected in the samples. Of them, 84, 106, 10, and 5 belonged to chromosomes 18, 13, and Y and the mitochondrial chromosome, respectively. Each detected protein was found in at least one of the samples analyzed. The SRM SIS raw data are available in the ProteomeXchange repository (PXD004374, PASS01192).
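The gene and detection counts reported per chromosome can be tallied to confirm the totals and to compare detection rates. The sketch below uses only the numbers quoted in the abstract; the dictionary keys (including "MT" for the mitochondrial chromosome) are labels chosen here for illustration.

```python
# Protein-coding genes targeted and proteins detected, per chromosome.
genes = {"18": 276, "13": 329, "Y": 47, "MT": 15}
detected = {"18": 84, "13": 106, "Y": 10, "MT": 5}

total_genes = sum(genes.values())
total_detected = sum(detected.values())

# Overall share of targeted genes whose protein was detected in plasma.
overall_percent = round(100 * total_detected / total_genes, 1)

# Detection rate for each chromosome, in percent.
rates = {chrom: round(100 * detected[chrom] / genes[chrom], 1) for chrom in genes}

print(total_genes, total_detected, overall_percent)
print(rates)
```

The totals reproduce the 667 genes, 205 detected proteins, and 30.7% overall detection rate given in the text; the per-chromosome rates are derived values, not figures reported by the authors.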
Subjects
Chromosomes, Human/chemistry , Plasma/chemistry , Proteome , Chromosomes, Human/genetics , Chromosomes, Human, Pair 13/chemistry , Chromosomes, Human, Pair 18/chemistry , Chromosomes, Human, Y/chemistry , Databases, Protein , Healthy Volunteers , Humans , Mitochondria/ultrastructure , Proteome/genetics
ABSTRACT
This manuscript collects all the efforts of the Russian Consortium, the bottlenecks revealed in the course of the C-HPP realization, and ways to overcome them. One of the main bottlenecks in the C-HPP is the insufficient sensitivity of proteomic technologies, hampering the detection of low- and ultralow-copy number proteins forming the "dark part" of the human proteome. In the frame of MP-Challenge, to increase proteome coverage we suggest an experimental workflow based on a combination of shotgun technology and selected reaction monitoring with two-dimensional alkaline fractionation. Further, to detect proteins that cannot be identified by such technologies, nanotechnologies such as combined atomic force microscopy with molecular fishing and/or nanowire detection may be useful. These technologies provide a powerful tool for single molecule analysis, by analogy with nanopore sequencing during genome analysis. To systematically analyze the functional features of some proteins (CP50 Challenge), we created a mathematical model that predicts the number of proteins differing in amino acid sequence: proteoforms. According to our data, we should expect about 100,000 different proteoforms in the liver tissue and a little more in the HepG2 cell line. The variety of proteins forming the whole human proteome significantly exceeds these results due to post-translational modifications (PTMs). As PTMs determine the functional specificity of the protein, we propose using a combination of gene-centric transcriptome-proteomic analysis with preliminary fractionation by two-dimensional electrophoresis to identify chemically modified proteoforms. Despite the complexity of the proposed solutions, such integrative approaches could be fruitful for MP50 and CP50 Challenges in the framework of the C-HPP.
Subjects
Proteins/analysis , Proteome , Proteomics/methods , Biosensing Techniques , Electrophoresis, Gel, Two-Dimensional , Genome, Human , Humans , Microscopy, Atomic Force/methods , Nanotechnology/methods , Protein Processing, Post-Translational , Proteins/isolation & purification , Russia , Sensitivity and Specificity , Spectrometry, Mass, Electrospray Ionization , Tandem Mass Spectrometry , Workflow
ABSTRACT
The Human Proteome Project (HPP) annually reports on progress made throughout the field in credibly identifying and characterizing the complete human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2019-01-11 contains 17,694 proteins with strong protein-level evidence (PE1), compliant with HPP Guidelines for Interpretation of MS Data v2.1; these represent 89% of all 19,823 neXtProt predicted coding genes (all PE1,2,3,4 proteins), up from 17,470 one year earlier. Conversely, the number of neXtProt PE2,3,4 proteins, termed the "missing proteins" (MPs), has been reduced from 2949 to 2129 since 2016 through efforts throughout the community, including the chromosome-centric HPP. PeptideAtlas is the source of uniformly reanalyzed raw mass spectrometry data for neXtProt; PeptideAtlas added 495 canonical proteins between 2018 and 2019, especially from studies designed to detect hard-to-identify proteins. Meanwhile, the Human Protein Atlas has released version 18.1 with immunohistochemical evidence of expression of 17,000 proteins and survival plots as part of the Pathology Atlas. Many investigators apply multiplexed SRM-targeted proteomics for quantitation of organ-specific popular proteins in studies of various human diseases. The 19 teams of the Biology and Disease-driven B/D-HPP published a total of 160 publications in 2018, bringing proteomics to a broad array of biomedical research.
Subjects
Databases, Protein , Proteins/metabolism , Proteome , Chromosomes, Human , Guidelines as Topic , Humans , Mass Spectrometry , Proteins/chemistry , Proteins/genetics , Proteome/genetics
ABSTRACT
INTRODUCTION: The technological and scientific progress achieved in the Human Proteome Project (HPP) has provided the scientific community with a new set of experimental and bioinformatic methods in the challenging field of shotgun and SRM/MRM-based proteomics. The requirements for a protein to be considered experimentally validated are now well established, and the information about the human proteome is available in the neXtProt database, while targeted proteomic assays are stored in SRMAtlas. However, the study of the missing proteins remains an outstanding issue. Areas covered: This review focuses on the implementation of proteogenomic methods designed to improve the detection and validation of the missing proteins. The evolution of methodological strategies based on the combination of different omic technologies and the use of large publicly available datasets is shown, taking the Chromosome 16 Consortium as a reference. Expert commentary: Proteogenomics and other data analysis strategies implemented within the C-HPP initiative could be used as guidance to complete the catalog of human proteins in the near future. In the coming years, we will also probably witness their use in the B/D-HPP initiative to advance understanding of the roles of proteins in human biology and disease.
Subjects
Chromosomes, Human, Pair 16/genetics , Proteogenomics/trends , Proteome/genetics , Proteomics , Databases, Protein , Human Genome Project , Humans , Reference Standards
ABSTRACT
This report describes the 17th Chromosome-Centric Human Proteome Project symposium, which was held in Tehran, Iran, on April 27 and 28, 2017. A brief summary of the symposium's talks is presented, covering new technical and computational approaches for the identification of novel proteins from non-coding genomic regions, physicochemical and biological causes of missing proteins, and the close interactions between the Chromosome- and Biology/Disease-driven Human Proteome Projects. A synopsis of the decisions made is also given, covering prospective programs to maintain collaborative work and share resources and information, and the establishment of a newly organized working group, the task force for missing protein analysis.
Subjects
Chromosomes, Human , Proteomics , Humans , Sequence Analysis, Protein
ABSTRACT
An important goal of the Human Proteome Organization (HUPO) Chromosome-centric Human Proteome Project (C-HPP) is to correctly define the number of canonical proteins encoded by their cognate open reading frames on each chromosome in the human genome. When identified with high confidence of protein evidence (PE), such proteins are termed PE1 proteins in the online database resource, neXtProt. However, proteins that have not been identified unequivocally at the protein level but that have other evidence suggestive of their existence (PE2-4) are termed missing proteins (MPs). The number of MPs has been reduced from 5511 in 2012 to 2186 in 2018 (neXtProt 2018-01-17 release). Although the annotation of the human proteome has made significant progress, the "parts list" alone does not inform function. Indeed, 1937 proteins representing ~10% of the human proteome have no function either annotated from experimental characterization or predicted by homology to other proteins. Specifically, these 1937 "dark proteins" of the so-called dark proteome are composed of 1260 functionally uncharacterized but identified PE1 proteins, designated as uPE1, plus 677 MPs from categories PE2-PE4, which also have no known or predicted function and are termed uMPs. At the HUPO-2017 Annual Meeting, the C-HPP officially adopted the uPE1 pilot initiative, with 14 participating international teams later committing to demonstrate the feasibility of the functional characterization of large numbers of dark proteins (CP), starting first with 50 uPE1 proteins, in a stepwise chromosome-centric organizational manner. The second aim of the feasibility phase to characterize protein (CP) functions of 50 uPE1 proteins, termed the neXt-CP50 initiative, is to utilize a variety of approaches and workflows according to individual team expertise, interest, and resources so as to enable the C-HPP to recommend experimentally proven workflows to the proteome community within 3 years. 
The results from this pilot will not only be the cornerstone of a larger characterization initiative but also enhance understanding of the human proteome and integrated cellular networks for the discovery of new mechanisms of pathology, mechanistically informative biomarkers, and rational drug targets.
Subjects
Chromosomes, Human/genetics , Databases, Protein , Proteome/analysis , Genome, Human , Humans , Mass Spectrometry , Molecular Sequence Annotation , Open Reading Frames , Pilot Projects , Proteome/genetics
ABSTRACT
The Human Proteome Project (HPP) annually reports on progress throughout the field in credibly identifying and characterizing the human protein parts list and making proteomics an integral part of multiomics studies in medicine and the life sciences. NeXtProt release 2018-01-17, the baseline for this sixth annual HPP special issue of the Journal of Proteome Research, contains 17,470 PE1 proteins, 89% of all neXtProt predicted PE1-4 proteins, up from 17,008 in release 2017-01-23 and 13,975 in release 2012-02-24. Conversely, the number of neXtProt PE2,3,4 missing proteins has been reduced from 2949 to 2579 to 2186 over the past two years. Of the PE1 proteins, 16,092 are based on mass spectrometry results, and 1378 on other kinds of protein studies, notably protein-protein interaction findings. PeptideAtlas has 15,798 canonical proteins, up 625 over the past year, including 269 from SUMOylation studies. The largest reason for missing proteins is low abundance. Meanwhile, the Human Protein Atlas has released its Cell Atlas, Pathology Atlas, and updated Tissue Atlas, and is applying recommendations from the International Working Group on Antibody Validation. Finally, there is progress using the quantitative multiplex organ-specific popular proteins targeted proteomics approach in various disease categories.
Subjects
Databases, Protein/trends , Proteome/analysis , Proteomics/methods , Guidelines as Topic , Humans , Mass Spectrometry/methods , Protein Interaction Maps , Research Design , Software
ABSTRACT
The Chromosome-centric Human Proteome Project (C-HPP), announced in September 2016, is an initiative to accelerate progress on the detection and characterization of neXtProt PE2,3,4 "missing proteins" (MPs) with a mandate to each chromosome team to find about 50 MPs over 2 years. Here we report major progress toward the neXt-MP50 challenge with 43 newly validated Chr 17 PE1 proteins, of which 25 were based on mass spectrometry, 12 on protein-protein interactions, 3 on a combination of MS and PPI, and 3 with other types of data. Notable among these new PE1 proteins were five keratin-associated proteins, a single olfactory receptor, and five additional membrane-embedded proteins. We evaluate the prospects of finding the remaining 105 MPs coded for on Chr 17, focusing on mass spectrometry and protein-protein interaction approaches. We present a list of 35 prioritized MPs with specific approaches that may be used in further MS and PPI experimental studies. Additionally, we demonstrate how in silico studies can be used to capture individual peptides from major data repositories, documenting one MP that appears to be a strong candidate for PE1. We are close to our goal of finding 50 MPs for Chr 17.
Subjects
Chromosomes, Human, Pair 17/chemistry , Proteome/analysis , Computer Simulation , Humans , Mass Spectrometry , Methods , Protein Interaction Maps , Proteins/analysis
ABSTRACT
The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, significant efforts of the HPP teams have been dedicated to the experimental detection of the missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might represent a valuable resource for selecting a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow benefits from the complementarity of these search engines to increase the proteome coverage. Five missing proteins compliant with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as was also proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the cell lines common to the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
Subjects
Proteome/analysis , Proteomics/methods , Search Engine , Cell Line, Tumor , Humans , Knowledge Bases , Proteins/analysis , Software
ABSTRACT
One of the major goals of the Chromosome-Centric Human Proteome Project (C-HPP) is to fill the knowledge gaps between human genomic information and the corresponding proteomic information. These gaps are due to "missing" proteins (MPs), that is, predicted proteins with insufficient evidence from mass spectrometry (MS), biochemical, structural, or antibody analyses, which currently account for 2579 of the 19587 predicted human proteins (neXtProt, 2017-01). We address some of the lessons learned from the inconsistent annotations of missing proteins in databases (DB) and demonstrate a systematic proteogenomic approach designed to explore a potential new function of a known protein. To illustrate a cautious and strategic approach for characterization of novel function in vitro and in vivo, we present the case of Na(+)/H(+) exchange regulatory cofactor 1 (NHERF1/SLC9A3R1, located at chromosome 17q25.1; hereafter NHERF1), which was mistakenly labeled as an MP in one DB (Global Proteome Machine Database; GPMDB, 2011-09 release) but was well known in another public DB and in the literature. As a first step, the molecular identity of NHERF1 was confirmed by MS and immunoblotting. We next investigated the potential new function of NHERF1 by carrying out quantitative MS profiling of placental trophoblasts (PXD004723) and a functional study of cytotrophoblast JEG-3 cells. We found that NHERF1 was associated with trophoblast differentiation and motility. To validate this newly found cellular function of NHERF1, we used the Caenorhabditis elegans mutant of nrfl-1 (a nematode ortholog of NHERF1), which exhibits a protruding vulva (Pvl) and egg-laying-defective phenotype, and performed genetic complementation work. The nrfl-1 mutant was almost fully rescued by the transfection of the recombinant transgenic construct that contained human NHERF1. These results suggest that NHERF1 could have a previously unknown function in pregnancy and in the development of human embryos. 
Our study outlines a stepwise experimental platform to explore new functions of ambiguously denoted candidate proteins and scrutinizes the mandated DB search for the selection of MPs to study in the future.
Subjects
Phosphoproteins/physiology , Proteogenomics/methods , Sodium-Hydrogen Exchangers/physiology , Animals , Caenorhabditis elegans/genetics , Cell Differentiation , Cell Movement , Databases, Protein , Female , Humans , Immunoblotting , Mass Spectrometry , Reproduction , Transgenes , Trophoblasts/cytology
ABSTRACT
Preeclampsia (PE) is a placental disease characterized by hypertension, proteinuria, and other multiorgan dysfunctions, and its etiology is unclear. We and others have shown that intensive endoplasmic reticulum (ER) stress and unfolded protein response (UPR) occur in the PE placenta. In this study, we isolated detergent-insoluble proteins (DIPs), which are enriched in protein aggregates, from human placenta tissues to characterize the placental UPR in PE. With data-independent acquisition (DIA) mass spectrometry, we identified 2066 DIPs across all normal (n = 10) and PE (n = 10) placenta samples, among which 110 and 108 DIPs were significantly up- and down-regulated in PE, respectively. In clustering analysis, differential DIPs could generally distinguish PE from normal placentas. We verified the MS quantitation of endoglin and vimentin by immunoblotting. In addition, we observed that PE placenta tissues have remarkably more endoglin in the cytoplasm. Furthermore, we found that DIPs were evenly distributed across different chromosomes and were enriched in diverse gene ontology terms, while differential DIPs tended not to map to the X chromosome. Significantly up-regulated DIPs in PE were concentrated in lipid metabolism as the top function, while 23 of these DIPs could form the top network regulating cellular movement, development, growth, and proliferation. Our results indicate that human PE placentas have disease-relevant differential DIPs, which reflect aberrantly aggregated proteins of placental tissues. The mass spectrometry proteomics data have been deposited to the ProteomeXchange consortium with the data set identifier PXD006654, and to the iProX database (accession number: IPX0000948000).