Results 1 - 20 of 48
1.
Nucleic Acids Res ; 51(W1): W338-W342, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37140039

ABSTRACT

Interest in the use of machine learning for peptide fragmentation spectrum prediction has risen sharply in recent years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data independent acquisition spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease of use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and more performant prediction models for both tryptic and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. Additionally, we have added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built and ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides upgrading the back-end models, the user experience on the MS²PIP web server is thus also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.


Subjects
Proteome, Proteomics, Tandem Mass Spectrometry, Peptides/chemistry
2.
Proteomics ; 24(8): e2300144, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38629965

ABSTRACT

In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes that are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of cross-linked peptides with oligonucleotides of different lengths leads to a combinatorial increase in search space. We demonstrate that peptide retention time prediction can be transferred to the task of cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvement in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains, and merits further research.


Subjects
Nucleic Acids, RNA, Amino Acids, Mass Spectrometry, Peptides
3.
J Proteome Res ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38491990

ABSTRACT

Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.

4.
Nat Methods ; 18(11): 1363-1369, 2021 11.
Article in English | MEDLINE | ID: mdl-34711972

ABSTRACT

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
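
The idea of encoding peptides by atomic composition can be illustrated with a minimal sketch. This is not DeepLC's actual feature code: the small residue table, the `MOD_DELTAS` dictionary, and the `encode` function are assumptions made for illustration only. What it shows is that a modification becomes a change in atom counts, so a modification absent from the training data still maps onto features the model has already seen.

```python
from collections import Counter

# Atom counts per amino acid residue (amino acid minus H2O).
# Only a few residues are listed here; a real table covers all of them.
RESIDUE_ATOMS = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "S": {"C": 3, "H": 5, "N": 1, "O": 2},
    "K": {"C": 6, "H": 12, "N": 2, "O": 1},
    "M": {"C": 5, "H": 9, "N": 1, "O": 1, "S": 1},
}

# Hypothetical modification table: the atoms each modification adds.
MOD_DELTAS = {
    "Oxidation": {"O": 1},
    "Acetyl": {"C": 2, "H": 2, "O": 1},
    "Phospho": {"H": 1, "O": 3, "P": 1},
}

# Fixed column order of the feature vector.
ATOMS = ("C", "H", "N", "O", "P", "S")


def encode(sequence, mods=None):
    """Return one atom-count vector per residue.

    mods maps a 0-based residue position to a modification name.
    """
    mods = mods or {}
    rows = []
    for position, residue in enumerate(sequence):
        counts = Counter(RESIDUE_ATOMS[residue])
        if position in mods:
            # A modification only shifts the atom counts of its residue.
            counts.update(MOD_DELTAS[mods[position]])
        rows.append([counts[atom] for atom in ATOMS])
    return rows
```

For example, `encode("AM", {1: "Oxidation"})` yields the unmodified alanine vector followed by a methionine vector with one extra oxygen.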


Subjects
Algorithms, Deep Learning, Peptide Fragments/analysis, Post-Translational Protein Processing, Proteins/analysis, Proteins/chemistry, Proteome/analysis, Datasets as Topic, Humans, Peptide Fragments/chemistry, Peptide Mapping
5.
Mol Cell Proteomics ; 21(8): 100266, 2022 08.
Article in English | MEDLINE | ID: mdl-35803561

ABSTRACT

Immunopeptidomics aims to identify major histocompatibility complex (MHC)-presented peptides on almost all cells that can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the nontryptic nature of immunopeptides, complicating their identification. Previously, peak intensity predictions by MS2PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. However, as MS2PIP was tailored toward tryptic peptides, we have here retrained MS2PIP to include nontryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides but also yield further improvements for tryptic peptides. We show that the integration of new MS2PIP models, DeepLC, and Percolator in one software package, MS2Rescore, increases spectrum identification rate and unique identified peptides by 46% and 36%, respectively, compared to standard Percolator rescoring at 1% FDR. Moreover, MS2Rescore also outperforms the current state-of-the-art in immunopeptide-specific identification approaches. Altogether, MS2Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.
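
To make the "1% FDR" threshold concrete, here is a minimal sketch of target-decoy q-value estimation, the general approach that rescoring tools such as Percolator build on. It is not the code of Percolator or MS2Rescore; the function name `q_values` and the `(score, is_decoy)` input layout are assumptions for illustration, and the estimate uses the simple decoys/targets ratio.

```python
def q_values(psms):
    """Estimate q-values from (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # The q-value is the lowest FDR at this threshold or any more permissive one.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()

    return [(score, is_decoy, qv) for (score, is_decoy), qv in zip(ranked, q)]


def accepted_targets(psms, threshold=0.01):
    """Count target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, qv in q_values(psms) if not is_decoy and qv <= threshold)
```

Better-separated scores after rescoring push decoys further down the ranking, so more target PSMs fall under the same q-value threshold; that is the sense in which rescoring increases identifications at a fixed FDR.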


Subjects
Proteomics, Tandem Mass Spectrometry, Algorithms, Peptides, Proteins
6.
J Proteome Res ; 22(2): 557-560, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36508242

ABSTRACT

A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists in a unified and user-friendly Python-, command line-, and web-interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.


Subjects
Proteomics, Software, Proteomics/methods, Peptides, Search Engine
7.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Subjects
Machine Learning, Proteomics, Proteomics/methods, Algorithms, Mass Spectrometry
8.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.


Subjects
Proteogenomics/methods, Protein Databases, HCT116 Cells, Humans, Machine Learning, RNA-Seq, Ribosomes
9.
J Proteome Res ; 21(4): 1204-1207, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35119864

ABSTRACT

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. These must be described in sufficient detail to apply or evaluate the performance of trained models. Here we look at and interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Subjects
Metabolomics, Proteomics, Machine Learning, Metabolomics/methods, Proteomics/methods, Tandem Mass Spectrometry
10.
J Proteome Res ; 20(6): 3353-3364, 2021 06 04.
Article in English | MEDLINE | ID: mdl-33998808

ABSTRACT

Discovery of variant peptides such as a single amino acid variant (SAAV) in shotgun proteomics data is essential for personalized proteomics. Both the resolution of shotgun proteomics methods and the search engines have improved dramatically, allowing for confident identification of SAAV peptides. However, it is not yet known if these methods are truly successful in accurately identifying SAAV peptides without prior genomic information in the search database. We studied this in unprecedented detail by exploiting publicly available long-read RNA sequences and shotgun proteomics data from the gold standard reference cell line NA12878. Searching spectra from this cell line with the state-of-the-art open modification search engine ionbot against carefully curated search databases resulted in 96.7% false-positive SAAVs and an 85% lower true positive rate than searching with peptide search databases that incorporate prior genetic information. While adding genetic variants to the search database remains indispensable for correct peptide identification, inclusion of long-read RNA sequences in the search database contributes only 0.3% new peptide identifications. These findings reveal the differences in SAAV detection that result from various approaches, providing guidance to researchers studying SAAV peptides and developers of peptide spectrum identification tools.


Subjects
Proteogenomics, Amino Acids, Protein Databases, Proteome/genetics, Proteomics, Search Engine
11.
Nucleic Acids Res ; 47(W1): W295-W299, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31028400

ABSTRACT

MS²PIP is a data-driven tool that accurately predicts peak intensities for a given peptide's fragmentation mass spectrum. Since the release of the MS²PIP web server in 2015, we have brought significant updates to both the tool and the web server. In addition to the original models for CID and HCD fragmentation, we have added specialized models for the TripleTOF 5600+ mass spectrometer, for TMT-labeled peptides, for iTRAQ-labeled peptides, and for iTRAQ-labeled phosphopeptides. Because the fragmentation pattern is heavily altered in each of these cases, these additional models greatly improve the prediction accuracy for their corresponding data types. We have also substantially reduced the computational resources required to run MS²PIP, and have completely rebuilt the web server, which now allows predictions of up to 100 000 peptide sequences in a single request. The MS²PIP web server is freely available at https://iomics.ugent.be/ms2pip/.


Subjects
Peptide Fragments/analysis, Phosphopeptides/analysis, Proteomics/methods, Software, Tandem Mass Spectrometry/statistics & numerical data, Amino Acid Sequence, Humans, Internet, Proteomics/instrumentation, Staining and Labeling/methods
12.
Proteomics ; 20(21-22): e1900351, 2020 11.
Article in English | MEDLINE | ID: mdl-32267083

ABSTRACT

A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.


Subjects
Machine Learning, Proteomics, Workflow, Liquid Chromatography, Mass Spectrometry
13.
Proteomics ; 20(3-4): e1900306, 2020 02.
Article in English | MEDLINE | ID: mdl-31981311

ABSTRACT

Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.


Subjects
Liquid Chromatography/methods, Data Mining/methods, Mass Spectrometry/methods, Peptides/analysis, Proteome/analysis, Proteomics/methods, Computational Biology/methods, Protein Databases, HeLa Cells, Humans, Peptide Library, Software
14.
Anal Chem ; 92(9): 6571-6578, 2020 05 05.
Article in English | MEDLINE | ID: mdl-32281370

ABSTRACT

Accurate prediction of liquid chromatographic retention times from small-molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g., differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet and fits calibration curves on predicted retention times instead of only on observed retention times. We show that this approach results in substantially higher accuracy of elution-peak prediction than is achieved by setup-specific models.
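
A minimal sketch of calibrating on predicted retention times may help here. It assumes a model trained on one setup has produced predictions for a few calibrant molecules that were also measured on the new setup; a curve through those (predicted, observed) pairs then maps any further prediction onto the new setup. The piecewise-linear curve and the name `fit_calibration` are illustrative assumptions, not the curve-fitting method of the paper or of PredRet.

```python
from bisect import bisect_right


def fit_calibration(predicted, observed):
    """Build a piecewise-linear curve through (predicted, observed) pairs.

    predicted: retention times predicted by a model trained on another setup.
    observed:  retention times measured for the same calibrants on the new setup.
    Returns a function that maps a new prediction onto the new setup.
    """
    points = sorted(zip(predicted, observed))
    xs = [p for p, _ in points]
    ys = [o for _, o in points]
    if len(xs) < 2 or len(set(xs)) != len(xs):
        raise ValueError("need at least two calibrants with distinct predicted values")

    def calibrate(x):
        # Find the segment containing x; values outside the calibrated range
        # are extrapolated along the first or last segment.
        i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])

    return calibrate
```

With `calibrate = fit_calibration([1.0, 2.0, 4.0], [10.0, 20.0, 30.0])`, a prediction of 3.0 from the source model maps to 25.0 on the new setup. Because the curve is fitted on predictions, any molecule the source model can predict can be mapped, not only molecules that were measured on both setups.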

15.
Bioinformatics ; 35(24): 5243-5248, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31077310

ABSTRACT

MOTIVATION: The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted and used by the community, with the most notable example being Percolator, a semi-supervised machine learning model that learns a new scoring function for a given dataset. The usage of such tools is, however, bound to the search engine's scoring scheme, which doesn't always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in such a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP. MS2PIP predicts fragment ion peak intensities. RESULTS: We show how comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach allows using more information out of the data (compared with simpler intensity-based metrics, like peak counting or explained intensities summing) while maintaining control of statistics such as the false discovery rate. AVAILABILITY AND IMPLEMENTATION: All of the code is available online at https://github.com/compomics/ms2rescore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
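
The "direct similarity metrics" mentioned in the abstract can be sketched as follows. This is an illustrative example only: the choice of Pearson correlation, cosine similarity, and mean absolute error, as well as the function and feature names, are assumptions and not the feature set of MS2Rescore. Both inputs are intensity vectors for the same ordered list of fragment ions.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cosine(x, y):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)


def intensity_features(predicted, observed):
    """Similarity features for one PSM (hypothetical feature names)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("vectors must be non-empty and of equal length")
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)
    return {
        "pearson": pearson(predicted, observed),
        "cosine": cosine(predicted, observed),
        "mean_abs_error": mae,
    }
```

Each PSM would contribute one such feature dictionary, which a semi-supervised learner can then combine with the search engine's own scores.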


Subjects
Proteomics, Search Engine, Algorithms, Protein Databases, Software
16.
Anal Chem ; 91(5): 3694-3703, 2019 03 05.
Article in English | MEDLINE | ID: mdl-30702864

ABSTRACT

Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte's retention time. The key components of such predictive algorithms are the features they are supplied with, and the actual machine learning algorithm used to fit the model parameters. Therefore, we have evaluated the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different types of algorithms achieving top performance for different types of analytes or different protocols. Our results thus show that an evaluation of machine learning algorithms for retention time prediction is needed to find a suitable algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.


Subjects
Algorithms, Liquid Chromatography/methods, Machine Learning/standards, Theoretical Models, Datasets as Topic, Mass Spectrometry
17.
Anal Chem ; 90(19): 11636-11642, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30188119

ABSTRACT

When analyzing mass spectrometry imaging data sets, assigning a molecule to each of the thousands of generated images is a very complex task. Recent efforts have taken lessons from (tandem) mass spectrometry proteomics and applied them to imaging mass spectrometry metabolomics, with good results. Our goal is to go a step further in this direction and apply a well-established, data-driven method to improve the results obtained from an annotation engine. By using a data-driven rescoring strategy, we are able to consistently improve the sensitivity of the annotation engine while maintaining control of statistics like the estimated rate of false discoveries. All the code necessary to run a search and extract the additional features can be found at https://github.com/anasilviacs/sm-engine, and the code to rescore the results from a search at https://github.com/anasilviacs/rescore-metabolites.

18.
Bioinformatics ; 33(9): 1424-1425, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453684

ABSTRACT

Summary: Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray-based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge. We have therefore built the MAPPIT cell microArray Protein Protein Interaction-Data management & Analysis Tool (MAPPI-DAT) as an automated data management and analysis tool for MAPPIT cell microarray experiments. MAPPI-DAT stores the experimental data and metadata in a systematic and structured way, automates data analysis and interpretation, and enables the meta-analysis of MAPPIT cell microarray data across all stored experiments. Availability and Implementation: MAPPI-DAT is developed in Python, using R for data analysis and MySQL as data management system. MAPPI-DAT is cross-platform and can be run on Microsoft Windows, Linux and OS X/macOS. The source code and a Microsoft Windows executable are freely available under the permissive Apache2 open source license at https://github.com/compomics/MAPPI-DAT. Contact: jan.tavernier@vib-ugent.be or lennart.martens@vib-ugent.be. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Protein Array Analysis/methods, Protein Interaction Mapping/methods, Software, Animals, High-Throughput Screening Assays/methods, Humans, Mammals/metabolism
19.
Nucleic Acids Res ; 43(W1): W326-30, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25990723

ABSTRACT

We present an MS² peak intensity prediction server that computes MS² charge 2+ and 3+ spectra from peptide sequences for the most common fragment ions. The server integrates the Unimod public domain post-translational modification database for modified peptides. The prediction model is an improvement of the previously published MS²PIP model for Orbitrap-LTQ CID spectra. Predicted MS² spectra can be downloaded as a spectrum file and can be visualized in the browser for comparisons with observations. In addition, we added prediction models for HCD fragmentation (Q-Exactive Orbitrap) and show that these models compute accurate intensity predictions on par with CID performance. We also show that training prediction models for CID and HCD separately improves the accuracy for each fragmentation method. The MS²PIP prediction server is accessible from http://iomics.ugent.be/ms2pip.


Subjects
Peptides/chemistry, Software, Tandem Mass Spectrometry/methods, Internet, Proteomics/methods
20.
J Proteome Res ; 15(6): 1963-70, 2016 06 03.
Article in English | MEDLINE | ID: mdl-27089233

ABSTRACT

Shotgun proteomics experiments often take the form of a differential analysis, where two or more samples are compared against each other. The objective is to identify proteins that are either unique to a specific sample or a set of samples (qualitative differential proteomics), or that are significantly differentially expressed in one or more samples (quantitative differential proteomics). However, the success depends on the availability of a reliable protein sequence database for each sample. To perform such an analysis in the absence of a database, we here propose a novel, generic pipeline comprising an adapted spectral similarity score derived from database search algorithms that compares samples at the spectrum level to detect unique spectra. We applied our pipeline to compare two parasitic tapeworms: Taenia solium and Taenia hydatigena, of which only the former poses a threat to humans. Furthermore, because the genome of T. solium recently became available, we were able to prove the effectiveness and reliability of our pipeline a posteriori.


Subjects
Proteomics/methods, Taenia/chemistry, Algorithms, Animals, Protein Databases, Genome, Species Specificity, Tandem Mass Spectrometry, Workflow