ABSTRACT
In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.
Subjects
Deep Learning , Neural Networks, Computer , Peptide Fragments/analysis , Peptide Library , Proteome/analysis , Software , Tandem Mass Spectrometry/methods , Animals , Caenorhabditis elegans/metabolism , Databases, Protein , Drosophila melanogaster/metabolism , HEK293 Cells , Humans , Peptide Fragments/metabolism , Proteome/metabolism , Saccharomyces cerevisiae/metabolism
ABSTRACT
Genome-, transcriptome- and proteome-wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project representing human genes by 18,072 transcripts and 13,640 proteins including 37 without prior protein-level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue-specific expression, that strong differences between mRNA and protein quantities within and across tissues exist and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged including the study of gene/protein expression regulation and biomarker specificity evaluation.
Subjects
Genome, Human/genetics , Proteome/genetics , Tissue Distribution/genetics , Transcriptome/genetics , Gene Expression Regulation/genetics , Humans , Mass Spectrometry/methods , Proteomics/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
ABSTRACT
The analysis of the post-translational modification (PTM) state of proteins using mass spectrometry-based bottom-up proteomic workflows has evolved into a powerful tool for the study of cellular regulatory events that are not directly encoded at the genome level. Besides frequently detected modifications such as phosphorylation, acetylation and ubiquitination, many low abundant or less frequently detected PTMs are known or postulated to serve important regulatory functions. To more broadly understand the LC-MS/MS characteristics of PTMs, we synthesized and analyzed ~5,000 peptides representing 21 different naturally occurring modifications of lysine, arginine, proline and tyrosine side chains and their unmodified counterparts. The analysis identified changes in retention times, shifts of precursor charge states and differences in search engine scores between modifications. PTM-dependent changes in the fragmentation behavior were evaluated using eleven different fragmentation modes or collision energies. We also systematically investigated the formation of diagnostic ions or neutral losses for all PTMs, confirming 10 known and identifying 5 novel diagnostic ions for lysine modifications. To demonstrate the value of including diagnostic ions in database searching, we reprocessed a public data set of lysine crotonylation and showed that considering the diagnostic ions increases confidence in the identification of the modified peptides. To our knowledge, this constitutes the first broad and systematic analysis of the LC-MS/MS properties of common and rare PTMs using synthetic peptides, leading to directly applicable utility for bottom-up proteomic experiments.
Subjects
Peptides/metabolism , Protein Processing, Post-Translational , Proteome/metabolism , Tandem Mass Spectrometry/methods , Chromatography, Liquid , Chromatography, Reverse-Phase , Databases, Protein , Ions
ABSTRACT
The coordination of protein synthesis and degradation regulating protein abundance is a fundamental process in cellular homeostasis. Today, mass spectrometry-based technologies allow determination of endogenous protein turnover on a proteome-wide scale. However, standard dynamic SILAC (Stable Isotope Labeling in Cell Culture) approaches can suffer from missing data across pulse time-points limiting the accuracy of such analysis. This issue is of particular relevance when studying protein stability at the level of proteoforms because often only single peptides distinguish between different protein products of the same gene. To address this shortcoming, we evaluated the merits of combining dynamic SILAC and tandem mass tag (TMT)-labeling of ten pulse time-points in a single experiment. Although the comparison to the standard dynamic SILAC method showed a high concordance of protein turnover rates, the pulsed SILAC-TMT approach yielded more comprehensive data (6000 proteins on average) without missing values. Replicate analysis further established that the same reproducibility of turnover rate determination can be obtained for peptides and proteins facilitating proteoform resolved investigation of protein stability. We provide several examples of differentially turned over splice variants and show that post-translational modifications can affect cellular protein half-lives. For example, N-terminally processed peptides exhibited both faster and slower turnover behavior compared with other peptides of the same protein. In addition, the suspected proteolytic processing of the fusion protein FAU was substantiated by measuring vastly different stabilities of the cleavage products. Furthermore, differential peptide turnover suggested a previously unknown mechanism of activity regulation by post-translational destabilization of cathepsin D as well as the DNA helicase BLM. 
Finally, our comprehensive data set facilitated a detailed evaluation of the impact of protein properties and functions on protein stability in steady-state cells and uncovered that the high turnover of respiratory chain complex I proteins might be explained by oxidative stress.
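As an illustration of how a turnover rate can be derived from pulse time-point data, the following minimal sketch assumes a simple first-order model in which the fraction of pre-existing (unlabeled) protein decays as exp(-k·t). This model, the function names and the example values are assumptions made for illustration; they are not taken from the study, which may use a different fitting procedure.

```python
import math


def fit_turnover_rate(times_h, old_fractions):
    """Fit ln(fraction) = -k * t through the origin by least squares.

    Assumes first-order decay of the pre-existing protein pool
    (an illustrative model, not necessarily the one used in the study).
    """
    numerator = sum(t * math.log(f) for t, f in zip(times_h, old_fractions))
    denominator = sum(t * t for t in times_h)
    return -numerator / denominator


def half_life(rate):
    """Convert a first-order rate constant into a half-life."""
    return math.log(2) / rate


# Hypothetical pulse time points in hours (ten points, values invented).
pulse_times_h = [1, 3, 6, 10, 16, 24, 34, 48, 72, 96]

# Simulated noise-free fractions for an assumed rate of 0.05 per hour.
assumed_rate = 0.05
old_fractions = [math.exp(-assumed_rate * t) for t in pulse_times_h]

fitted_rate = fit_turnover_rate(pulse_times_h, old_fractions)
print(f"rate = {fitted_rate:.4f} per hour, half-life = {half_life(fitted_rate):.2f} h")
```

Because every time point contributes to the fit, a missing value at one pulse time-point removes information from the estimate, which is one way to see why complete data across all time points matters.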
Subjects
Peptides/metabolism , Proteome/metabolism , Proteomics/methods , Enzyme Stability , Half-Life , HeLa Cells , Humans , Isotope Labeling , NADH Dehydrogenase/metabolism , Oxidative Stress/drug effects , Protein Biosynthesis , Proteolysis , Reproducibility of Results
ABSTRACT
Kinase inhibitors are important cancer therapeutics. Polypharmacology is commonly observed, requiring thorough target deconvolution to understand drug mechanism of action. Using chemical proteomics, we analyzed the target spectrum of 243 clinically evaluated kinase drugs. The data revealed previously unknown targets for established drugs, offered a perspective on the "druggable" kinome, highlighted (non)kinase off-targets, and suggested potential therapeutic applications. Integration of phosphoproteomic data refined drug-affected pathways, identified response markers, and strengthened rationale for combination treatments. We exemplify translational value by discovering SIK2 (salt-inducible kinase 2) inhibitors that modulate cytokine production in primary cells, by identifying drugs against the lung cancer survival marker MELK (maternal embryonic leucine zipper kinase), and by repurposing cabozantinib to treat FLT3-ITD-positive acute myeloid leukemia. This resource, available via the ProteomicsDB database, should facilitate basic, clinical, and drug discovery research and aid clinical decision-making.
Subjects
Antineoplastic Agents/pharmacology , Drug Discovery/methods , Molecular Targeted Therapy , Protein Kinase Inhibitors/pharmacology , Proteomics/methods , Animals , Antineoplastic Agents/chemistry , Cell Line, Tumor , Cytokines/metabolism , Humans , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/enzymology , Lung Neoplasms/drug therapy , Lung Neoplasms/enzymology , Mice , Protein Kinase Inhibitors/chemistry , Protein Serine-Threonine Kinases/antagonists & inhibitors , Xenograft Model Antitumor Assays , fms-Like Tyrosine Kinase 3/antagonists & inhibitors
ABSTRACT
Beyond specific applications, such as the relative or absolute quantification of peptides in targeted proteomic experiments, synthetic spike-in peptides are not yet systematically used as internal standards in bottom-up proteomics. A number of retention time standards have been reported that enable chromatographic alignment of multiple LC-MS/MS experiments. However, only a few peptides are typically included in such sets, limiting the analytical parameters that can be monitored. Here, we describe PROCAL (ProteomeTools Calibration Standard), a set of 40 synthetic peptides that span the entire hydrophobicity range of tryptic digests, enabling not only accurate determination of retention time indices but also monitoring of chromatographic separation performance over time. The fragmentation characteristics of the peptides can also be used to calibrate and compare collision energies between mass spectrometers. None of the selected peptide sequences occurs in any natural protein, thus eliminating the need for stable isotope labeling. We anticipate that this set of peptides will be useful for multiple purposes in individual laboratories and will also aid the transfer of data acquisition and analysis methods between laboratories, notably the use of spectral libraries.
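A retention time index of this kind can be sketched as a linear calibration: observed retention times of the spiked-in standard peptides are regressed against their reference index values, and the fitted line then converts any observed retention time in the same run into an index. The linear form, the function names and all numbers below are assumptions chosen for illustration; they are not PROCAL values or the authors' procedure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def retention_index(observed_rt_min, slope, intercept):
    """Map an observed retention time (minutes) onto the index scale."""
    return slope * observed_rt_min + intercept


# Hypothetical calibration peptides: observed retention times in one run
# (minutes) and their reference index values on an arbitrary 0-100 scale.
observed_rt_min = [10.0, 20.0, 30.0, 40.0, 50.0]
reference_index = [0.0, 25.0, 50.0, 75.0, 100.0]

slope, intercept = fit_line(observed_rt_min, reference_index)

# Convert the retention time of some other peptide from the same run.
index_at_35_min = retention_index(35.0, slope, intercept)
print(f"slope = {slope}, intercept = {intercept}, index = {index_at_35_min}")
```

Repeating the fit for each run and tracking the slope and the residuals over time gives a simple way to monitor drift in chromatographic performance.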
Subjects
Chromatography, Liquid/standards , Peptide Fragments/analysis , Proteins/analysis , Proteomics/standards , Tandem Mass Spectrometry/standards , Calibration , Chromatography, Liquid/methods , HeLa Cells , Humans , Proteomics/methods , Reference Standards , Tandem Mass Spectrometry/methods
ABSTRACT
Offline two-dimensional chromatography is a common means to achieve deep proteome coverage. To reduce sample complexity and dynamic range and to utilize mass spectrometer (MS) time efficiently, high chromatographic resolution of and good orthogonality between the two dimensions are needed. Ion exchange and high pH reversed phase chromatography are often used for this purpose. However, the former requires desalting to be MS-compatible, and the latter requires fraction pooling to create orthogonality. Here, we report an alternative first-dimension separation technique using a commercial trimodal phase incorporating polar embedded reversed phase, weak anion exchange, and strong cation exchange material. The column is capable of retaining polar and nonpolar peptides alike without noticeable breakthrough. It allows separating ordinary and TMT-labeled peptides under mild acidic conditions using an acetonitrile gradient. The direct MS compatibility of solvents and good orthogonality to online coupled C18 columns enable a straightforward workflow without fraction pooling and desalting while showing comparable performance to the other techniques. The method scales from low to high microgram sample quantity and is amenable to full automation. To demonstrate practical utility, we analyzed the proteomes of 10 human pancreatic cancer cell lines to a depth of >8,700 quantified proteins.