Results 1 - 10 of 10
1.
J Proteome Res ; 23(6): 1907-1914, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38687997

ABSTRACT

Traditional database search methods for the analysis of bottom-up proteomics tandem mass spectrometry (MS/MS) data are limited in their ability to detect peptides with post-translational modifications (PTMs). Recently, "open modification" database search strategies, in which the requirement that the mass of the database peptide closely matches the observed precursor mass is relaxed, have become popular as ways to find a wider variety of types of PTMs. Indeed, in one study, Kong et al. reported that the open modification search tool MSFragger can achieve higher statistical power to detect peptides than a traditional "narrow window" database search. We investigated this claim empirically and, in the process, uncovered a potential general problem with false discovery rate (FDR) control in the machine learning postprocessors Percolator and PeptideProphet. This problem might have contributed to Kong et al.'s report that their empirical results suggest that FDR control in the narrow window setting might generally be compromised. Indeed, reanalyzing the same data while using a more standard form of target-decoy competition-based FDR control, we found that, after accounting for chimeric spectra as well as for the inherent difference in the number of candidates in open and narrow searches, the data do not provide sufficient evidence that FDR control in proteomics MS/MS database search is inherently problematic.


Subjects
Protein Databases, Post-Translational Protein Processing, Proteomics, Tandem Mass Spectrometry, Tandem Mass Spectrometry/methods, Proteomics/methods, Peptides/analysis, Peptides/chemistry, Machine Learning, Humans, Algorithms, Software
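
The abstract of entry 1 refers to target-decoy competition-based FDR control without spelling it out. The following is a minimal sketch of one common form of that estimate, not the procedure used in the paper: it assumes each spectrum has already been reduced to its single best match (target or decoy), that higher scores are better, and that the conservative "+1" decoy correction is wanted. The function name `tdc_qvalues` and the toy scores are hypothetical, and ties in score are not treated specially.

```python
def tdc_qvalues(scores, is_decoy):
    """Estimate q-values from target-decoy competition winners.

    scores   -- one score per PSM, higher is better (assumption)
    is_decoy -- True where the winning match is a decoy
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Running FDR estimate at each rank: (decoys + 1) / targets, capped at 1.
    fdr = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr.append(min(1.0, (decoys + 1) / max(targets, 1)))

    # q-value = smallest FDR estimate at this rank or any worse rank.
    for k in range(len(fdr) - 2, -1, -1):
        fdr[k] = min(fdr[k], fdr[k + 1])

    # Map the q-values back to the input order.
    qvalues = [0.0] * len(scores)
    for rank, i in enumerate(order):
        qvalues[i] = fdr[rank]
    return qvalues


# Toy data (hypothetical): six PSMs, two of them decoys.
example_scores = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
example_decoys = [False, False, False, True, False, True]
example_q = tdc_qvalues(example_scores, example_decoys)
```

PSMs whose q-value falls at or below the chosen threshold (for example 0.01) would be reported; decoys are normally dropped from the final list.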
2.
Mol Cell Proteomics ; 21(12): 100432, 2022 12.
Article in English | MEDLINE | ID: mdl-36280141

ABSTRACT

Rescoring of mass spectrometry (MS) search results using spectral predictors can strongly increase peptide spectrum match (PSM) identification rates. This approach is particularly effective when aiming to search MS data against large databases, for example, when dealing with nonspecific cleavage in immunopeptidomics or inflation of the reference database for noncanonical peptide identification. Here, we present inSPIRE (in silico Spectral Predictor Informed REscoring), a flexible and performant open-source rescoring pipeline built on Prosit MS spectral prediction, which is compatible with common database search engines. inSPIRE allows large-scale rescoring with data from multiple MS search files, increases sensitivity to minor differences in amino acid residue position, and can be applied to various MS sample types, including tryptic proteome digestions and immunopeptidomes. inSPIRE boosts PSM identification rates in immunopeptidomics, leading to better performance than the original Prosit rescoring pipeline, as confirmed by benchmarking of inSPIRE performance on ground truth datasets. The integration of various features in the inSPIRE backbone further boosts the PSM identification in immunopeptidomics, with a potential benefit for the identification of noncanonical peptides.


Subjects
Peptides, Proteomics, Proteomics/methods, Protein Databases, Peptides/chemistry, Search Engine, Mass Spectrometry, Algorithms, Software
3.
J Proteome Res ; 21(5): 1365-1370, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35446579

ABSTRACT

Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we investigate the effects of integrating the machine learning-based postprocessor Percolator into our spectral library searching tool COSS (CompOmics Spectral library Searching tool). To evaluate the effects of this postprocessing, we have used 40 data sets from 2 different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using 2 spectral library search tools, COSS and MSPepSearch with and without Percolator postprocessing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compomics/COSS.


Subjects
Proteomics, Search Engine, Algorithms, Protein Databases, Peptide Library, Proteomics/methods, Search Engine/methods, Software, Tandem Mass Spectrometry/methods
4.
J Proteome Res ; 20(4): 1966-1971, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33596079

ABSTRACT

Proteomics studies rely on the accurate assignment of peptides to the acquired tandem mass spectra, a task where machine learning algorithms have proven invaluable. We describe mokapot, which provides a flexible semisupervised learning algorithm that allows for highly customized analyses. We demonstrate some of the unique features of mokapot by improving the detection of RNA-cross-linked peptides from an analysis of RNA-binding proteins and increasing the consistency of peptide detection in a single-cell proteomics study.


Subjects
Peptides, Proteomics, Algorithms, Protein Databases, Tandem Mass Spectrometry
5.
J Proteome Res ; 20(4): 1849-1854, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33529032

ABSTRACT

Nonparametric statistical tests are an integral part of scientific experiments in a diverse range of fields. When performing such tests, it is standard to sort values; however, this requires Ω(n log(n)) time to sort n values. Thus given enough data, sorting becomes the computational bottleneck, even with very optimized implementations such as the C++ standard library routine, std::sort. Frequently, a nonparametric statistical test is only used to partition values above and below a threshold in the sorted ordering, where the threshold corresponds to a significant statistical result. Linear-time selection and partitioning algorithms cannot be directly used because the selection and partitioning are performed on the transformed statistical significance values rather than on the sorted statistics. Usually, those transformed statistical significance values (e.g., the p value when investigating the family-wise error rate and q values when investigating the false discovery rate (FDR)) can only be computed at a threshold. Because this threshold is unknown, this leads to sorting the data. Layer-ordered heaps, which can be constructed in O(n), only partially sort values and thus can be used to get around the slow runtime required to fully sort. Here we introduce a layer-ordering-based method for selection and partitioning on the transformed values (e.g., p values or q values). We demonstrate the use of this method to partition peptides using an FDR threshold. This approach is applied to speed up Percolator, a postprocessing algorithm used in mass-spectrometry-based proteomics to evaluate the quality of peptide-spectrum matches (PSMs), by >70% on data sets with 100 million PSMs.


Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Protein Databases, Peptides, Software
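
The abstract of entry 5 relies on layer-ordered heaps, which only partially sort their input. As an illustration of the data structure only (not of the paper's selection method on p values or q values), the sketch below builds layers of sizes 1, 2, 4, ... such that every value in one layer is no larger than every value in the next. It assumes a growth factor of 2 and uses a randomized quickselect-style partition, so the running time is linear in expectation rather than in the worst case; the names `partition_smallest` and `layer_ordered_heap` are hypothetical.

```python
import random


def partition_smallest(values, k):
    """Split values into (the k smallest, the rest); neither part is sorted."""
    if k <= 0:
        return [], list(values)
    if k >= len(values):
        return list(values), []
    pivot = random.choice(values)
    lows = [v for v in values if v < pivot]
    pivs = [v for v in values if v == pivot]
    highs = [v for v in values if v > pivot]
    if k < len(lows):
        small, rest = partition_smallest(lows, k)
        return small, rest + pivs + highs
    if k <= len(lows) + len(pivs):
        take = k - len(lows)
        return lows + pivs[:take], pivs[take:] + highs
    small, rest = partition_smallest(highs, k - len(lows) - len(pivs))
    return lows + pivs + small, rest


def layer_ordered_heap(values):
    """Return layers of sizes 1, 2, 4, ... with max(layer i) <= min(layer i+1)."""
    # Work out the layer sizes; the last layer may be truncated.
    sizes, size, total = [], 1, 0
    while total < len(values):
        current = min(size, len(values) - total)
        sizes.append(current)
        total += current
        size *= 2

    # Peel off the largest layers first, so each partition call works on a
    # list that is roughly half the size of the previous one.
    layers = []
    remaining = list(values)
    for s in reversed(sizes):
        remaining, layer = partition_smallest(remaining, len(remaining) - s)
        layers.append(layer)
    layers.reverse()
    return layers
```

Because values inside a layer stay unordered, a threshold can be located by scanning layer boundaries and then examining only the one layer that contains it, rather than sorting all n values.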
6.
J Proteome Res ; 19(3): 1267-1274, 2020 03 06.
Article in English | MEDLINE | ID: mdl-32009418

ABSTRACT

Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semisupervised algorithms to learn models directly from the data sets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large data set and use the learned model to evaluate the small-scale experiment. We call this a "static modeling" approach, in contrast to Percolator's usual "dynamic model" that is trained anew for each data set. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semisupervised algorithms to small-scale experiments.


Subjects
Software, Tandem Mass Spectrometry, Algorithms, Protein Databases, Machine Learning, Proteomics
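
The "static modeling" idea in the abstract of entry 6 (learn once on a large data set, then reuse the model on a small one) can be sketched as follows. This is a schematic illustration under stated assumptions, not Percolator's implementation or interface: the model is assumed to be a linear scoring function, feature standardization is assumed to be fitted on the large data set and reused unchanged, and the weights shown are placeholders rather than learned values. The names `fit_scaler` and `apply_static_model` are hypothetical.

```python
import statistics


def fit_scaler(rows):
    """Per-feature mean and standard deviation from the large training set."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_static_model(rows, weights, bias, means, stds):
    """Score PSM feature rows with fixed weights; nothing is re-trained."""
    scores = []
    for row in rows:
        scaled = [(x - m) / s for x, m, s in zip(row, means, stds)]
        scores.append(sum(w * z for w, z in zip(weights, scaled)) + bias)
    return scores


# Hypothetical numbers: two features per PSM.
large_features = [[1.0, 10.0], [3.0, 30.0]]   # stands in for a large data set
means, stds = fit_scaler(large_features)
weights, bias = [2.0, -1.0], 0.5              # placeholder "learned" model
small_features = [[3.0, 20.0]]                # small-scale experiment
small_scores = apply_static_model(small_features, weights, bias, means, stds)
```

In the dynamic setting, both the scaler and the weights would instead be re-estimated on each small experiment, which is the source of the variability the abstract describes.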
7.
J Proteome Res ; 18(9): 3353-3359, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31407580

ABSTRACT

The processing of peptide tandem mass spectrometry data involves matching observed spectra against a sequence database. The ranking and calibration of these peptide-spectrum matches can be improved substantially using a machine learning postprocessor. Here, we describe our efforts to speed up one widely used postprocessor, Percolator. The improved software is dramatically faster than the previous version of Percolator, even when using relatively few processors. We tested the new version of Percolator on a data set containing over 215 million spectra and recorded an overall reduction to 23% of the running time as compared to the unoptimized code. We also show that the memory footprint required by these speedups is modest relative to that of the original version of Percolator.


Subjects
Peptides/genetics, Proteomics/methods, Software, Algorithms, Protein Databases, Machine Learning, Peptides/classification, Peptides/isolation & purification, Tandem Mass Spectrometry/methods
8.
J Proteome Res ; 17(5): 1978-1982, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29607643

ABSTRACT

Percolator is an important tool for greatly improving the results of a database search and subsequent downstream analysis. Using support vector machines (SVMs), Percolator recalibrates peptide-spectrum matches based on the learned decision boundary between targets and decoys. To improve analysis time for large-scale data sets, we update Percolator's SVM learning engine through software and algorithmic optimizations rather than heuristic approaches that necessitate the careful study of their impact on learned parameters across different search settings and data sets. We show that by optimizing Percolator's original learning algorithm, l2-SVM-MFN, large-scale SVM learning requires only about a third of the original runtime. Furthermore, we show that by employing the widely used Trust Region Newton (TRON) algorithm instead of l2-SVM-MFN, large-scale Percolator SVM learning is reduced to only about a fifth of the original runtime. Importantly, these speedups only affect the speed at which Percolator converges to a global solution and do not alter recalibration performance. The upgraded versions of both l2-SVM-MFN and TRON are optimized within the Percolator codebase for multithreaded and single-thread use and are available under Apache license at bitbucket.org/jthalloran/percolator_upgrade.


Subjects
Machine Learning, Proteomics/methods, Software, Algorithms, Protein Databases, Support Vector Machine, Time Factors
9.
J Proteome Res ; 14(11): 4662-73, 2015 Nov 06.
Article in English | MEDLINE | ID: mdl-26390080

ABSTRACT

The two key steps for analyzing proteomic data generated by high-resolution MS are database searching and postprocessing. While the two steps are interrelated, studies on their combinatory effects and the optimization of these procedures have not been adequately conducted. Here, we investigated the performance of three popular search engines (SEQUEST, Mascot, and MS Amanda) in conjunction with five filtering approaches, including respective score-based filtering, a group-based approach, local false discovery rate (LFDR), PeptideProphet, and Percolator. A total of eight data sets from various proteomes (e.g., E. coli, yeast, and human) produced by various instruments with high-accuracy survey scan (MS1) and high- or low-accuracy fragment ion scan (MS2) (LTQ-Orbitrap, Orbitrap-Velos, Orbitrap-Elite, Q-Exactive, Orbitrap-Fusion, and Q-TOF) were analyzed. It was found that combinations involving Percolator achieved markedly more peptide and protein identifications at the same FDR level than the other 12 combinations for all data sets. Among these, combinations of SEQUEST-Percolator and MS Amanda-Percolator provided slightly better performances for data sets with low-accuracy MS2 (ion trap or IT) and high-accuracy MS2 (Orbitrap or TOF), respectively, than did other methods. For approaches without Percolator, SEQUEST-group performs the best for data sets with MS2 produced by collision-induced dissociation (CID) and IT analysis; Mascot-LFDR gives more identifications for data sets generated by higher-energy collisional dissociation (HCD) and analyzed in Orbitrap (HCD-OT) and in Orbitrap Fusion (HCD-IT); MS Amanda-group excels for the Q-TOF data set and the Orbitrap Velos HCD-OT data set. Therefore, if Percolator is not used, a specific combination should be applied for each type of data set. Moreover, a higher percentage of multiple-peptide proteins and lower variation of protein spectral counts were observed when analyzing technical replicates using Percolator-associated combinations; therefore, Percolator enhanced the reliability for both identification and quantification. The analyses were performed using the specific programs embedded in Proteome Discoverer, Scaffold, and an in-house algorithm (BuildSummary). These results provide valuable guidelines for the optimal interpretation of proteomic results and the development of fit-for-purpose protocols under different situations.


Subjects
Algorithms, Peptides/analysis, Proteome/analysis, Proteomics/methods, Search Engine/methods, Software, Tumor Cell Line, Protein Databases, Escherichia coli/genetics, Escherichia coli/metabolism, Humans, Proteome/genetics, Proteome/metabolism, Proteomics/instrumentation, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Tandem Mass Spectrometry
10.
Chemosphere ; 349: 140945, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38104736

ABSTRACT

Chalcopyrite is the most abundant Cu-sulfide and economically the most important copper mineral in the world. It is known to be recalcitrant in hydrometallurgical processing, and chalcopyrite bioleaching has therefore been studied thoroughly to improve processing. In this study, the microbial diversity in 22 samples from the Sarcheshmeh copper mine in Iran was investigated via 16S rRNA gene sequencing. In total, 1063 species were recognized after metagenomic analysis, including the ferrous iron- and sulfur-oxidizing acidophilic genera Acidithiobacillus, Leptospirillum, Sulfobacillus and Ferroplasma. Mesophilic as well as moderately thermophilic acidophilic ferrous iron- and sulfur-oxidizing microorganisms were enriched from these samples, and bioleaching was studied in shake flask experiments using a chalcopyrite-containing ore sample from the same mine. These enrichment cultures were further used as inoculum for bioleaching experiments in percolation columns to simulate heap bioleaching. Addition of 100 mM NaCl to the bioleaching medium was assessed as a means of improving the dissolution rate of chalcopyrite. For comparison, bioleaching in stirred tank reactors with a defined microbial consortium was carried out as well. While a maximum of only 32% of the copper could be extracted in the flask bioleaching experiments, copper recoveries of 73% and 76% were recorded after 30 and 10 days of bioleaching in columns and bioreactors, respectively. Based on these results, both the application of moderately thermophilic acidophilic bacteria in stirred tank bioreactors and the use of natural enrichment cultures of mesoacidophiles with addition of 100 mM NaCl in column percolators with agglomerated ore allowed for robust chalcopyrite dissolution and copper recovery from Sarcheshmeh copper ore via bioleaching.


Subjects
Copper, Microbiota, 16S Ribosomal RNA/genetics, Sodium Chloride, Bioreactors/microbiology, Iron, Sulfur, Sulfides