ABSTRACT
Similar to the reversal of kinase-mediated protein phosphorylation by phosphatases, deubiquitinating enzymes (DUBs) oppose the action of E3 ubiquitin ligases and reverse the ubiquitination of proteins. A total of 99 human DUBs, classified into 7 families, thus allow precise control of cellular function and homeostasis. Ubiquitination regulates a myriad of cellular processes, and is altered in many pathological conditions. Thus, ubiquitination-regulating enzymes are increasingly regarded as potential candidates for therapeutic intervention. In this context, given the predicted easier pharmacological control of DUBs relative to E3 ligases, a significant effort is now being directed to better understand the processes and substrates regulated by each DUB. Classical studies have identified specific DUB substrate candidates by traditional molecular biology techniques in a case-by-case manner. Lately, single experiments can identify thousands of ubiquitinated proteins in a specific cellular context and narrow down which of those are regulated by a given DUB, thanks to the development of new strategies to isolate and enrich ubiquitinated material and to improvements in mass spectrometry detection capabilities. Here we present an overview of both types of studies, discussing the criteria that, in our view, need to be fulfilled for a protein to be considered as a high-confidence substrate of a given DUB. Applying these criteria, we have manually reviewed the relevant literature currently available in a systematic manner, and identified 650 high-confidence substrates of human DUBs. We make this information easily accessible to the research community through an updated version of the DUBase website (https://ehubio.ehu.eus/dubase/). Finally, in order to illustrate how this information can contribute to a better understanding of the physiopathological role of DUBs, we place a special emphasis on a subset of these enzymes that have been associated with neurodevelopmental disorders.
Subject(s)
Neurodevelopmental Disorders , Ubiquitin , Humans , Ubiquitination , Ubiquitin/metabolism , Ubiquitin-Protein Ligases/metabolism , Deubiquitinating Enzymes/metabolism
ABSTRACT
The human genome contains nearly 100 deubiquitinating enzymes (DUBs) responsible for removing ubiquitin moieties from a large variety of substrates. Which DUBs are responsible for targeting which substrates remains mostly unknown. Here we implement the bioUb approach to identify DUB substrates in a systematic manner, combining gene silencing and proteomics analyses. Silencing of individual DUB enzymes is used to reduce their ubiquitin deconjugating activity, leading to an increase in the ubiquitination of their substrates, which can then be isolated and identified. We report here quantitative proteomic data of the putative substrates of 5 human DUBs. Furthermore, we have built a novel interactive database of DUB substrates to provide easy access to our data and collect DUB proteome data from other groups as a reference resource in the DUB substrates research field.
Subject(s)
Deubiquitinating Enzymes/genetics , Proteome/genetics , Proteomics , Substrate Specificity/genetics , Databases, Genetic , Deubiquitinating Enzymes/isolation & purification , Humans , Ubiquitin/genetics , Ubiquitination/genetics
ABSTRACT
Shotgun proteomics is the method of choice for high-throughput protein identification; however, robust statistical methods are essential to automate this task while minimizing the number of false identifications. The standard method for estimating the false discovery rate (FDR) of individual identifications and keeping it below a threshold (typically 1%) is the target-decoy approach. However, numerous works have shown that FDR at the protein level may become much larger than FDR at the peptide level. An appropriate scoring model to identify proteins from their peptides using high-throughput shotgun proteomics is therefore highly needed. In this study, we present a novel protein-level scoring algorithm that uses the scores of the identified peptides and maintains all of the properties expected for a true protein probability. We also present a refinement of the picked method to calculate FDR at the protein level. These algorithms can be used together as a robust identification workflow suitable for large-scale proteomics, and we show that the identification performance of this workflow is superior to that of other widely used methods in several samples and using different search engines. Our protein probability model offers the scientific community an algorithm that is easy to integrate into protein identification workflows for the automated analysis of shotgun proteomics data.
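As an illustration of the target-decoy approach mentioned in this abstract, the following minimal Python sketch estimates FDR and q-values for a list of scored identifications. It is not the protein-level scoring algorithm or the refined picked method of the paper; the function names, the `(score, is_decoy)` input layout, and the simple decoys/targets estimator are assumptions made for the example.

```python
from typing import List, Tuple


def target_decoy_qvalues(
    matches: List[Tuple[float, bool]]
) -> List[Tuple[float, bool, float]]:
    """Return (score, is_decoy, q_value) sorted by descending score.

    `matches` holds (score, is_decoy) pairs; a higher score is assumed
    to mean a better match (hypothetical input layout).
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)

    # FDR at each score threshold, estimated as decoys / targets.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this threshold or any more permissive one.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_target_scores(
    matches: List[Tuple[float, bool]], threshold: float = 0.01
) -> List[float]:
    """Scores of target matches whose q-value is within the threshold."""
    return [
        s for s, d, q in target_decoy_qvalues(matches)
        if not d and q <= threshold
    ]
```

With the typical 1% threshold, `accepted_target_scores(matches)` keeps only the target identifications ranked above the point where the estimated FDR exceeds 0.01.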
Subject(s)
Proteomics , Tandem Mass Spectrometry , Algorithms , Databases, Protein , Probability , Proteins
ABSTRACT
The nuclear export receptor CRM1 (XPO1) recognizes and binds specific sequence motifs termed nuclear export signals (NESs) in cargo proteins. About 200 NES motifs have been identified, but over a thousand human proteins are potential CRM1 cargos, and most of their NESs remain to be identified. On the other hand, the interaction of NES peptides with the "NES-binding groove" of CRM1 has been studied in detail using structural and biochemical analyses, but a better understanding of CRM1 function requires further investigation of how the results from these in vitro studies translate into actual NES export in a cellular context. Here we show that a simple cellular assay, based on a recently described reporter (SRVB/A), can be applied to identify novel potential NES motifs, and to obtain relevant information on different aspects of CRM1-mediated NES export. Using cellular assays, we first map 19 new sequence motifs with nuclear export activity in 14 cancer-related proteins that are potential CRM1 cargos. Next, we investigate the effect of mutations in individual NES-binding groove residues, providing further insight into CRM1-mediated NES export. Finally, we extend the search for CRM1-dependent NESs to a recently uncovered, but potentially vast, set of small proteins called micropeptides. By doing so, we report the first NES-harboring human micropeptides.
Subject(s)
Genes, Reporter , Karyopherins/metabolism , Mutation , Neoplasm Proteins/metabolism , Nuclear Export Signals , Peptide Fragments/analysis , Receptors, Cytoplasmic and Nuclear/metabolism , Active Transport, Cell Nucleus , Amino Acid Motifs , HeLa Cells , Humans , Karyopherins/genetics , Neoplasm Proteins/genetics , Neoplasms , Receptors, Cytoplasmic and Nuclear/genetics , Exportin 1 Protein
ABSTRACT
Mass-spectrometry-based proteomics has evolved into a high-throughput technology in which numerous large-scale data sets are generated from diverse analytical platforms. Furthermore, several scientific journals and funding agencies have emphasized the storage of proteomics data in public repositories to facilitate its evaluation, inspection, and reanalysis. (1) As a consequence, public proteomics data repositories are growing rapidly. However, tools are needed to integrate multiple proteomics data sets to compare different experimental features or to perform quality control analysis. Here, we present a new Java stand-alone tool, Proteomics Assay COMparator (PACOM), that is able to import, combine, and simultaneously compare numerous proteomics experiments to check the integrity of the proteomic data as well as verify data quality. With PACOM, the user can detect sources of error that may have been introduced in any step of a proteomics workflow and that influence the final results. Data sets can be easily compared and integrated, and data quality and reproducibility can be visually assessed through a rich set of graphical representations of proteomics data features as well as a wide variety of data filters. Its flexibility and easy-to-use interface make PACOM a unique tool for daily use in a proteomics laboratory. PACOM is available at https://github.com/smdb21/pacom .
Subject(s)
Datasets as Topic , Mass Spectrometry , Proteomics/methods , Software , Data Accuracy , Databases, Protein , Internet , Reproducibility of Results , Workflow
ABSTRACT
The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, significant efforts of the HPP teams have been dedicated to the experimental detection of the missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might represent a valuable resource for selecting a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow benefits from the complementarity of these search engines to increase the proteome coverage. Five missing proteins compliant with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as was also proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the cell lines common to the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
Subject(s)
Proteome/analysis , Proteomics/methods , Search Engine , Cell Line, Tumor , Humans , Knowledge Bases , Proteins/analysis , Software
ABSTRACT
dasHPPboard is a novel proteomics-based dashboard that collects and reports the experiments produced by the Spanish Human Proteome Project consortium (SpHPP) and aims to help the HPP map the entire human proteome. We have followed the strategy of analogous genomics projects like the Encyclopedia of DNA Elements (ENCODE), which provides a vast amount of data on human cell line experiments. The dashboard includes results of shotgun and selected reaction monitoring proteomics experiments, post-translational modification information, as well as proteogenomics studies. We have also processed the transcriptomics data from the ENCODE and Human Body Map (HBM) projects for the identification of specific gene expression patterns in different cell lines and tissues, taking special interest in those genes having little proteomic evidence available (missing proteins). Peptide databases have been built using single nucleotide variants and novel junctions derived from RNA-Seq data that can be used in search engines for sample-specific protein identifications on the same cell lines or tissues. The dasHPPboard has been designed as a tool that can be used to share and visualize a combination of proteomic and transcriptomic data, providing at the same time easy access to resources for proteogenomics analyses. The dasHPPboard can be freely accessed at: http://sphppdashboard.cnb.csic.es.
Subject(s)
Genomics , Proteome , Humans , Protein Processing, Post-Translational , Transcriptome
ABSTRACT
MOTIVATION: Leucine-rich nuclear export signals (NESs) are short amino acid motifs that mediate binding of cargo proteins to the nuclear export receptor CRM1, and thus contribute to regulating the localization and function of many cellular proteins. Computational prediction of NES motifs is of great interest, but remains a significant challenge. RESULTS: We have developed a novel approach for amino acid motif searching that can be used for NES prediction. This approach, termed Wregex (weighted regular expression), combines regular expressions with a position-specific scoring matrix (PSSM), and has been implemented in a web-based, freely available software tool. By making use of a PSSM, Wregex provides a score to prioritize candidates for experimental testing. Key features of Wregex include its flexibility, which makes it useful for searching other types of protein motifs, and its fast execution time, which makes it suitable for large-scale analysis. In comparative tests with previously available prediction tools, Wregex is shown to offer a good rate of true-positive motifs, while keeping a smaller number of potential candidates.
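The idea of combining a regular expression with a position-specific scoring matrix can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Wregex implementation: the pattern is a simplified NES-like consensus, and the weights in `WEIGHTS` as well as the names `NES_PATTERN` and `score_matches` are hypothetical values chosen for the example.

```python
import re
from typing import List, Tuple

# Simplified NES-like pattern (illustrative only): four hydrophobic
# positions separated by short spacers. The lookahead lets overlapping
# candidates be reported; inner groups capture the scored residues.
NES_PATTERN = re.compile(
    r"(?=(([LIVMF]).{2,3}([LIVMF]).{2,3}([LIVMF]).([LIVMF])))"
)

# Hypothetical PSSM: one weight table per captured position.
WEIGHTS = [
    {"L": 1.0, "I": 0.5, "V": 0.5, "M": 0.25, "F": 0.25},
    {"L": 1.0, "I": 0.5, "V": 0.5, "M": 0.25, "F": 0.25},
    {"L": 1.0, "I": 0.5, "V": 0.5, "M": 0.25, "F": 0.25},
    {"L": 1.0, "I": 0.5, "V": 0.5, "M": 0.25, "F": 0.25},
]


def score_matches(sequence: str) -> List[Tuple[int, str, float]]:
    """Return (start, matched_text, score) tuples, best score first."""
    hits = []
    for match in NES_PATTERN.finditer(sequence):
        residues = match.groups()[1:]  # skip the whole-motif group
        score = sum(
            WEIGHTS[i].get(residue, 0.0)
            for i, residue in enumerate(residues)
        )
        hits.append((match.start(1), match.group(1), score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Calling `score_matches` on a protein sequence returns the candidate motifs ordered by score, so the top entries can be prioritized for experimental testing.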
Subject(s)
Nuclear Export Signals , Proteins/chemistry , Algorithms , Amino Acid Motifs , Amino Acid Sequence , Position-Specific Scoring Matrices , Proteins/metabolism , Software
ABSTRACT
Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories such as the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software.
Subject(s)
Mass Spectrometry/standards , Proteins/analysis , Proteomics/standards , Software/standards , Databases, Protein , Humans , Mass Spectrometry/methods , Proteomics/methods
ABSTRACT
The Spanish team of the Human Proteome Project (SpHPP) marked the annotation of Chr16 and data analysis as one of its priorities. Precise annotation of Chromosome 16 proteins according to C-HPP criteria is presented. Moreover, Human Body Map 2.0 RNA-Seq and Encyclopedia of DNA Elements (ENCODE) data sets were used to obtain further information relative to cell/tissue specific chromosome 16 coding gene expression patterns and to infer the presence of missing proteins. Twenty-four shotgun 2D-LC-MS/MS and gel/LC-MS/MS MIAPE compliant experiments, representing 41% coverage of chromosome 16 proteins, were performed. Furthermore, mapping of large-scale multicenter mass spectrometry data sets from CCD18, MCF7, Jurkat, and Ramos cell lines into RNA-Seq data allowed further insights relative to correlation of chromosome 16 transcripts and proteins. Detection and quantification of chromosome 16 proteins in biological matrices by SRM procedures are also primary goals of the SpHPP. Two strategies were undertaken: one focused on known proteins, taking advantage of MS data already available, and the second, aimed at the detection of the missing proteins, is based on the expression of recombinant proteins to gather MS information and optimize SRM methods that will be used in real biological samples. SRM methods for 49 known proteins and for recombinant forms of 24 missing proteins are reported in this study.
Subject(s)
Chromosomes, Human, Pair 16 , Proteome , Transcriptome , Chromatography, Liquid , Humans , Mass Spectrometry , Sequence Analysis, RNA
ABSTRACT
Short linear motifs (SLiMs) play an important role in protein-protein interactions. However, SLiM patterns are intrinsically permissive and result in many matches that occur just by chance, especially when targeting large datasets. To prioritize these matches as candidates for functional testing, we developed Wregex (Weighted regular expression), which uses a position-specific scoring matrix (PSSM) to order a list of regular expression matches according to a PSSM-derived score. Here we present Wregex 3.0, an improved version with new functionalities such as support for a second auxiliary motif to help refine the prediction of a primary SLiM, and post-translational modification (PTM) enrichment, taking into account that many regulatory SLiM-mediated interactions are modulated by one or more PTMs. This version also incorporates a number of new features such as convenient use of subproteomes, display of UniProt annotations such as disordered regions, searching for all known motifs, and generation of decoy databases for enrichment analysis. We provide case studies to illustrate how these new Wregex functionalities enhance prediction of short linear protein motifs. The Wregex 3.0 server is freely accessible at https://ehubio.ehu.eus/wregex3/.
ABSTRACT
BACKGROUND: Protein inference from peptide identifications in shotgun proteomics must deal with ambiguities that arise due to the presence of peptides shared between different proteins, which is common in higher eukaryotes. Recently, data independent acquisition (DIA) approaches have emerged as an alternative to the traditional data dependent acquisition (DDA) in shotgun proteomics experiments. MSE is the term used to name one of the DIA approaches used in QTOF instruments. MSE data require specialized software to process acquired spectra and to perform peptide and protein identifications. However, the software available at the moment does not group the identified proteins in a transparent way by taking into account peptide evidence categories. Furthermore, the inspection, comparison and reporting of the obtained results require tedious manual intervention. Here we report a software tool to address these limitations for MSE data. RESULTS: In this paper we present PAnalyzer, a software tool focused on the protein inference process of shotgun proteomics. Our approach considers all the identified proteins and groups them when necessary, indicating their confidence using different evidence categories. PAnalyzer can read protein identification files in the XML output format of the ProteinLynx Global Server (PLGS) software provided by Waters Corporation for their MSE data, and also in the mzIdentML format recently standardized by HUPO-PSI. Multiple files can also be read simultaneously and are considered as technical replicates. Results are saved to CSV, HTML and mzIdentML (in the case of a single mzIdentML input file) files. An MSE analysis of a real sample is presented to compare the results of PAnalyzer and ProteinLynx Global Server. CONCLUSIONS: We present a software tool to deal with the ambiguities that arise in the protein inference process. Key contributions are support for MSE data analysis by ProteinLynx Global Server and technical replicates integration. PAnalyzer is an easy-to-use, multiplatform and free software tool.
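The ambiguity caused by shared peptides can be made concrete with a small Python sketch. It is a deliberately simplified scheme and not PAnalyzer's actual algorithm: the three category names, the function `classify_proteins`, and the input layout (protein accession mapped to its set of identified peptides) are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Set


def classify_proteins(protein_peptides: Dict[str, Set[str]]) -> Dict[str, str]:
    """Assign each protein a simplified, hypothetical evidence category.

    - "conclusive": at least one peptide maps to this protein only.
    - "indistinguishable": no unique peptide, and another protein has
      exactly the same peptide set.
    - "ambiguous": no unique peptide and no identical partner.
    """
    # Which proteins does each peptide map to?
    peptide_owners = defaultdict(set)
    for protein, peptides in protein_peptides.items():
        for peptide in peptides:
            peptide_owners[peptide].add(protein)

    # Which proteins share exactly the same peptide set?
    by_peptide_set = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        by_peptide_set[frozenset(peptides)].append(protein)

    categories = {}
    for protein, peptides in protein_peptides.items():
        if any(len(peptide_owners[p]) == 1 for p in peptides):
            categories[protein] = "conclusive"
        elif len(by_peptide_set[frozenset(peptides)]) > 1:
            categories[protein] = "indistinguishable"
        else:
            categories[protein] = "ambiguous"
    return categories
```

Even this reduced scheme shows why grouping matters: a protein supported only by shared peptides cannot be reported with the same confidence as one supported by a unique peptide.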
Subject(s)
Proteins/analysis , Proteomics/methods , Software , Databases, Protein , HEK293 Cells , Humans , Peptides/analysis
ABSTRACT
Shotgun proteomics is the method of choice for large-scale protein identification. However, the use of a robust statistical workflow to validate such identification is mandatory to minimize false matches, ambiguities, and amplification of error rates from spectra to proteins. In this chapter we emphasize the key concepts to take into account when processing the output of a search engine to obtain reliable peptide or protein identifications. We assume that the reader is already familiar with tandem mass spectrometry so we can focus on the use of statistical confidence methods. After introducing the key concepts we present different software tools and how to use them with an example dataset.
Subject(s)
Computational Biology , Peptides/analysis , Proteins/analysis , Proteomics/methods , Search Engine , Software , Databases, Protein , Tandem Mass Spectrometry
ABSTRACT
Mass spectrometry is extremely efficient for sequencing small peptides generated by, for example, a trypsin digestion of a complex mixture. Current instruments have the capacity to generate 50-100 K MS/MS spectra from a single run. Of these, ~30-50% are typically assigned to peptide matches at a 1% FDR threshold. The remaining spectra require further research to be explained. We address here whether the 30-50% of matched spectra yield consensus matches when different database-dependent search pipelines are used. Although the majority of the spectrum-to-peptide assignments concur across search engines, our conclusion is that database-dependent search engines still require improvements.
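One simple way to quantify agreement between search pipelines is to count the spectra for which every engine reports the same peptide. The Python sketch below is an assumption-laden illustration, not the analysis performed in the study: the function name `consensus_fraction` and the nested-dictionary input (engine name mapped to spectrum identifier mapped to peptide sequence) are hypothetical, and at least one engine is assumed to be present.

```python
from typing import Dict, Hashable


def consensus_fraction(assignments: Dict[str, Dict[Hashable, str]]) -> float:
    """Fraction of commonly identified spectra with identical peptides.

    `assignments` maps an engine name to {spectrum_id: peptide}.
    Only spectra identified by every engine are considered.
    """
    engines = list(assignments.values())
    # Spectra that all engines assigned to some peptide.
    shared = set.intersection(*(set(engine) for engine in engines))
    if not shared:
        return 0.0
    agreeing = sum(
        1
        for spectrum in shared
        if len({engine[spectrum] for engine in engines}) == 1
    )
    return agreeing / len(shared)
```

A value well below 1.0 on real data would point to spectra whose interpretation depends on the search engine used.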
Subject(s)
Databases, Protein , Mass Spectrometry/methods , Peptides/analysis , Search Engine , Peptide Fragments/analysis , Tandem Mass Spectrometry
ABSTRACT
Altered expression of XPO1, the main nuclear export receptor in eukaryotic cells, has been observed in cancer, and XPO1 has been a focus of anticancer drug development. However, mechanistic evidence for cancer-specific alterations in XPO1 function is lacking. Here, genomic analysis of 42,793 cancers identified recurrent and previously unrecognized mutational hotspots in XPO1. XPO1 mutations exhibited striking lineage specificity, with enrichment in a variety of B-cell malignancies, and introduction of single amino acid substitutions in XPO1 initiated clonal, B-cell malignancy in vivo. Proteomic characterization identified that mutant XPO1 altered the nucleocytoplasmic distribution of hundreds of proteins in a sequence-specific manner that promoted oncogenesis. XPO1 mutations preferentially sensitized cells to inhibitors of nuclear export, providing a biomarker of response to this family of drugs. These data reveal a new class of oncogenic alteration based on change-of-function mutations in nuclear export signal recognition and identify therapeutic targets based on altered nucleocytoplasmic trafficking. SIGNIFICANCE: Here, we identify that heterozygous mutations in the main nuclear exporter in eukaryotic cells, XPO1, are positively selected in cancer and promote the initiation of clonal B-cell malignancies. XPO1 mutations alter nuclear export signal recognition in a sequence-specific manner and sensitize cells to compounds in clinical development inhibiting XPO1 function. This article is highlighted in the In This Issue feature, p. 1325.
Subject(s)
Cell Transformation, Neoplastic , Nuclear Export Signals , Active Transport, Cell Nucleus , Animals , Cell Proliferation , Disease Models, Animal , Gene Expression , Genes, bcl-2 , Genes, myc , Humans , Karyopherins/chemistry , Karyopherins/genetics , Karyopherins/metabolism , Leukemia, B-Cell/genetics , Leukemia, B-Cell/metabolism , Leukemia, B-Cell/mortality , Leukemia, B-Cell/pathology , Mice , Mutation , Organ Specificity/genetics , Protein Binding , Receptors, Cytoplasmic and Nuclear/chemistry , Receptors, Cytoplasmic and Nuclear/genetics , Receptors, Cytoplasmic and Nuclear/metabolism , Structure-Activity Relationship , Exportin 1 Protein
ABSTRACT
Large-scale sequencing projects are uncovering a growing number of missense mutations in human tumors. Understanding the phenotypic consequences of these alterations represents a formidable challenge. In silico prediction of functionally relevant amino acid motifs disrupted by cancer mutations could provide insight into the potential impact of a mutation, and guide functional tests. We have previously described Wregex, a tool for the identification of potential functional motifs, such as nuclear export signals (NESs), in proteins. Here, we present an improved version that allows motif prediction to be combined with data from large repositories, such as the Catalogue of Somatic Mutations in Cancer (COSMIC), and to be applied to a whole proteome scale. As an example, we have searched the human proteome for candidate NES motifs that could be altered by cancer-related mutations included in the COSMIC database. A subset of the candidate NESs identified was experimentally tested using an in vivo nuclear export assay. A significant proportion of the selected motifs exhibited nuclear export activity, which was abrogated by the COSMIC mutations. In addition, our search identified a cancer mutation that inactivates the NES of the human deubiquitinase USP21, and leads to the aberrant accumulation of this protein in the nucleus.
Subject(s)
Mutation, Missense , Neoplasms/metabolism , Ubiquitin Thiolesterase/chemistry , Ubiquitin Thiolesterase/genetics , Amino Acid Motifs , Amino Acid Sequence , Computational Biology/methods , Computer Simulation , Humans , Neoplasms/genetics , Nuclear Export Signals , Proteome/chemistry , Proteome/genetics , Software
ABSTRACT
Currently, the bottom-up approach is the most popular for characterizing protein samples by mass spectrometry. This is mainly attributed to the fact that the bottom-up approach has been successfully optimized for high-throughput studies. However, the bottom-up approach is associated with a number of challenges, such as loss of linkage information between peptides. Previous publications have addressed some of these problems, which are commonly referred to as protein inference. Nevertheless, all previous publications on the subject are oversimplified and do not represent the full complexity of the proteins identified. To this end, we present here SIR (spectra based isoform resolver), which uses a novel, transparent and systematic approach for organizing and presenting identified proteins based on peptide spectra assignments. The algorithm groups peptides and proteins into five evidence groups and calculates sixteen parameters for each identified protein that are useful for cases where deterministic protein inference is the goal. The novel approach has been incorporated into SIR, which is a user-friendly tool concerned only with protein inference based on imports of Mascot search results. In addition, SIR has two visualization tools that facilitate further exploration of the protein inference problem.