ABSTRACT
Estimating the false discovery rate (FDR) of peptide identifications is a key step in proteomics data analysis, and many methods have been proposed for this purpose. Recently, an entrapment-inspired protocol for validating FDR estimation methods appeared in articles showcasing new spectral library search tools. That validation approach generates incorrect spectral matches by searching spectra from evolutionarily distant organisms (entrapment queries) against the original target search space. Although this approach may appear similar to solutions that use entrapment databases, it represents a distinct conceptual framework whose correctness has not yet been verified. In this viewpoint, we first discuss the background of entrapment-based validation protocols and then conduct a few simple computational experiments to verify the assumptions behind them. The results reveal that entrapment databases may, in some implementations, be a reasonable choice for validation, whereas the assumptions underpinning validation protocols based on entrapment queries are likely to be violated in practice. This article also highlights the need for well-designed frameworks for validating FDR estimation methods in proteomics.
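To make the entrapment-database idea concrete, the sketch below estimates the false discovery proportion (FDP) among accepted PSMs from counts of matches to target and to entrapment sequences. It assumes that every entrapment match is incorrect and that incorrect matches fall on target and entrapment sequences in proportion to database size, with the entrapment database r times the size of the target database. The two formulas (a lower bound and a "combined" estimate) follow one convention from the entrapment literature and are assumptions here, not the estimators evaluated in this viewpoint; the names `EntrapmentCounts` and `estimate_fdp` are hypothetical. The sketch does not cover the entrapment-query protocol.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class EntrapmentCounts:
    n_target: int      # accepted PSMs matched to original target sequences
    n_entrapment: int  # accepted PSMs matched to entrapment sequences
    ratio: float       # r = entrapment database size / target database size (assumed known)


def estimate_fdp(c: EntrapmentCounts) -> Tuple[float, float]:
    """Return (lower_bound, combined) FDP estimates for the accepted PSMs.

    Assumptions (not taken from the article): every entrapment match is
    incorrect, and incorrect matches land on target and entrapment
    sequences in proportion to their database sizes.
    """
    total = c.n_target + c.n_entrapment
    if total == 0:
        return 0.0, 0.0
    # Only entrapment matches are known to be false, so this undercounts.
    lower = c.n_entrapment / total
    # Scale up to account for the false matches expected on target sequences.
    combined = c.n_entrapment * (1.0 + 1.0 / c.ratio) / total
    return lower, min(combined, 1.0)


counts = EntrapmentCounts(n_target=950, n_entrapment=50, ratio=1.0)
print(estimate_fdp(counts))  # (0.05, 0.1)
```

With r = 1, the combined estimate doubles the observed entrapment fraction, because an equal number of false matches is expected to land on target sequences. If a tool claims 1% FDR but even the lower bound exceeds 0.01, its estimate is likely too optimistic. A combined estimate below the threshold is weaker evidence, since it depends on the proportionality assumption.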
Subjects
Databases, Protein , Proteomics , Tandem Mass Spectrometry , Proteomics/methods , Tandem Mass Spectrometry/methods , Peptides/analysis , Animals , Humans , Reproducibility of Results
ABSTRACT
One of the chief objectives of mass spectrometry-based peptide identification in proteomics is the statistical validation of top-scoring peptide-spectrum matches (PSMs) through false discovery rate (FDR) estimation. Existing methods estimate the FDR by constructing a null model that captures the characteristics of incorrect target PSMs, most often with the help of decoys. Decoy-based methods, however, increase the computational cost and rely on the difficult-to-verify assumption that decoy PSMs constitute a sufficient and representative sample of the population of possible incorrect target PSMs. In contrast, FDR estimation assisted by the plentiful non-top-scoring PSMs, which are almost always incorrect, has scarcely been explored. In this work, we propose a novel decoy-free procedure for developing null models for top-scoring PSMs using the transformed e-value (TEV) score and the distributions of non-top-scoring target PSMs. The method relies on a theoretically derivable relationship between the parameters of the distributions of lower-order statistics of the TEV score, together with an empirical optimization step that fits a single parameter to actual data. The framework was tested on multiple data sets and two search engines. We present evidence that our method is comparable to, and occasionally outperforms, popular decoy-free and decoy-based methods in FDR estimation.
Subjects
Proteomics , Tandem Mass Spectrometry , Proteomics/methods , Tandem Mass Spectrometry/methods , Peptides , Search Engine , Databases, Protein , Algorithms
ABSTRACT
In shotgun proteomics, false discovery rate (FDR) estimation is a necessary step to ensure the quality of peptide-spectrum matches (PSMs) accepted from a database search. Popular statistical validation tools for FDR control tend to rely on target-decoy searching to build empirical, dataset-specific models, which often leads to inaccurate FDR estimates. In this paper, we propose a new approach to FDR estimation, named common decoy distribution (CDD), based on a fixed empirical null score distribution derived from millions of peptide tandem mass spectra. To demonstrate the viability of CDD, we evaluated its stability with respect to noise and to the presence of unexpected peptide modifications. A PeptideProphet-based implementation of CDD was benchmarked against decoy-based PeptideProphet, and both methods exhibited similar accuracy of FDR estimates and similar retrieval of correct PSMs. The findings of this study call for a re-evaluation of the necessity of dataset-specific target-decoy searches and illustrate the potential of Big Data approaches for statistical analysis in proteomics.
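The idea of a fixed, pre-computed null score distribution can be illustrated without PeptideProphet. The sketch below estimates the FDR at a score threshold t as pi0 · N · P_null(S ≥ t) divided by the number of target PSMs scoring at least t. Here N is the number of target PSMs and P_null is the empirical survival function of a large pooled sample of null (decoy) scores that is reused across datasets. Treating pi0 (the fraction of incorrect PSMs) as a known input that defaults to 1, and using a plain empirical survival function, are simplifying assumptions. The CDD implementation described above is PeptideProphet-based, and the class `FixedNullFdr` is hypothetical.

```python
import bisect
from typing import Sequence


class FixedNullFdr:
    """FDR estimation against a fixed empirical null score distribution (sketch)."""

    def __init__(self, null_scores: Sequence[float]):
        if not null_scores:
            raise ValueError("null sample must be non-empty")
        # Sorted once, then reused for every dataset (the "common" null).
        self._null = sorted(null_scores)

    def null_survival(self, t: float) -> float:
        """Fraction of null scores >= t (empirical survival function)."""
        idx = bisect.bisect_left(self._null, t)  # first index with score >= t
        return (len(self._null) - idx) / len(self._null)

    def fdr_at(self, target_scores: Sequence[float], t: float, pi0: float = 1.0) -> float:
        """Estimated FDR among target PSMs scoring >= t; pi0 = 1 is conservative."""
        n_accepted = sum(1 for s in target_scores if s >= t)
        if n_accepted == 0:
            return 0.0
        expected_false = pi0 * len(target_scores) * self.null_survival(t)
        return min(1.0, expected_false / n_accepted)


null = FixedNullFdr([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
targets = [2, 5, 8, 9, 11, 12, 13, 14, 15, 16]
print(null.fdr_at(targets, t=8))  # 0.375
```

Because the null sample is built once, no per-dataset decoy search is needed. The price is the assumption that the null score distribution does not shift between datasets, which is what the stability tests in the abstract probe.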
Subjects
Algorithms , Proteomics , Databases, Protein , Peptides , Proteomics/methods , Software , Tandem Mass Spectrometry
ABSTRACT
The formation of covalently bound DNA-protein crosslinks (DPCs) is linked to the pathophysiology of cancers and many degenerative diseases. Knowledge of the proteins that are frequently involved in forming DPCs will improve our understanding of the etiological mechanisms of these diseases and facilitate the development of preventive measures and treatments. Using SDS-PAGE and nano-LC-Orbitrap MS/MS analyses, we found, for the first time, that the major DNA-crosslinked proteins in HeLa cells exposed to a methylating agent (methyl methanesulfonate) or to hydroxyl free radicals are transcription-associated proteins. In particular, histone H2B3B and poly(rC)-binding protein 2 were identified as the most frequent DPC-forming proteins.
Subjects
DNA-Binding Proteins/antagonists & inhibitors , DNA/drug effects , Edetic Acid/pharmacology , Ferrous Compounds/pharmacology , Methyl Methanesulfonate/pharmacology , Proteomics , Chromatography, Liquid , Electrophoresis, Polyacrylamide Gel , HeLa Cells , Humans , Hydroxyl Radical/pharmacology , Molecular Structure , Tandem Mass Spectrometry
ABSTRACT
It was recently discovered that thioproline, an unnatural analog of proline, can arise in vivo from the reaction of cysteine and formaldehyde in cells under oxidative stress. Sequence-specific bioincorporation of thioproline into proteins was studied via shotgun proteomics of Escherichia coli (E. coli) cells. In a proline-auxotrophic strain, thioproline was found to be widely incorporated in lieu of proline when the cells were incubated with thioproline. In total, 1,428 proteins and 235 distinct thioproline-containing peptides were identified. Label-free relative quantitation revealed 102 differentially expressed proteins (82 up-regulated and 20 down-regulated) in the thioproline-treated group (with thioproline in the culture medium) relative to the control group (with proline in the culture medium). Pathway enrichment analysis of the differentially expressed proteins showed that, as expected, amino acid biosynthesis and protein synthesis were the pathways most affected by thioproline exposure. Phenotypically, the thioproline-treated group exhibited slower cell growth and stronger antioxidant capacity than the control group. SIGNIFICANCE: Thioproline is a secondary metabolite of formaldehyde and a structural analog of proline. It is also known to exhibit a wide variety of pharmacological properties, but its exact biochemical role in the cell has not been elucidated. In this paper, we studied thioproline misincorporation events (in lieu of proline) during protein synthesis in E. coli. Global proteome profiling revealed that thioproline is extensively misincorporated throughout the proteome of E. coli cells exposed to thioproline, and that pathways related to amino acid and protein biosynthesis are up-regulated. In addition, pretreatment with thioproline appeared to increase the capacity of E. coli cells to tolerate oxidative stress. Our findings suggest a novel explanation for thioproline's known antioxidative properties.
To our knowledge, this is the first study of thioproline misincorporation at the proteome level in any organism.