Results 1 - 20 of 48
1.
Nucleic Acids Res ; 51(W1): W338-W342, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37140039

ABSTRACT

Interest in the use of machine learning for peptide fragmentation spectrum prediction has been strongly on the rise over the past years, especially for applications in challenging proteomics identification workflows such as immunopeptidomics and the full-proteome identification of data-independent acquisition spectra. Since its inception, the MS²PIP peptide spectrum predictor has been widely used for various downstream applications, mostly thanks to its accuracy, ease of use, and broad applicability. We here present a thoroughly updated version of the MS²PIP web server, which includes new and better-performing prediction models for both tryptic and non-tryptic peptides, for immunopeptides, and for CID-fragmented TMT-labeled peptides. Additionally, we have added new functionality to greatly facilitate the generation of proteome-wide predicted spectral libraries, requiring only a FASTA protein file as input. These libraries also include retention time predictions from DeepLC. Moreover, we now provide pre-built, ready-to-download spectral libraries for various model organisms in multiple DIA-compatible spectral library formats. Besides the upgraded back-end models, the user experience on the MS²PIP web server is thus also greatly enhanced, extending its applicability to new domains, including immunopeptidomics and MS3-based TMT quantification experiments. MS²PIP is freely available at https://iomics.ugent.be/ms2pip/.
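Building a proteome-wide predicted library from only a FASTA file implies an in silico digestion step before any spectra are predicted. The Python sketch below illustrates that step under simple assumptions: the trypsin rule (cleave after K or R, not before P), the missed-cleavage count, and the 7 to 30 residue length window are illustrative choices, and `read_fasta` and `tryptic_digest` are hypothetical helpers, not part of the MS²PIP code base.

```python
import re


def read_fasta(text):
    """Parse FASTA-formatted text into an {accession: sequence} dict."""
    proteins = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                proteins[header] = "".join(chunks)
            header, chunks = line[1:].split()[0], []  # keep the first token only
        else:
            chunks.append(line)
    if header is not None:
        proteins[header] = "".join(chunks)
    return proteins


def tryptic_digest(sequence, missed_cleavages=0, min_length=7, max_length=30):
    """Simplified trypsin rule: cleave after K or R unless the next residue is P."""
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        # Join up to `missed_cleavages` extra fragments to model missed cleavages.
        for end in range(start, min(start + missed_cleavages + 1, len(fragments))):
            peptide = "".join(fragments[start : end + 1])
            if min_length <= len(peptide) <= max_length:
                peptides.add(peptide)
    return peptides


proteins = read_fasta(">sp|P1|TEST demo\nMKWVTFISLLLLFSSAYSR\nGVFRRDTHK\n")
print(tryptic_digest(proteins["sp|P1|TEST"]))  # {'WVTFISLLLLFSSAYSR'}
```

Each resulting peptide (with a chosen precursor charge) would then be passed to the intensity and retention time predictors.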


Subject(s)
Proteome, Proteomics, Tandem Mass Spectrometry, Peptides/chemistry
2.
Proteomics ; 24(8): e2300144, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38629965

ABSTRACT

In protein-RNA cross-linking mass spectrometry, UV or chemical cross-linking introduces stable bonds between amino acids and nucleic acids in protein-RNA complexes, which are then analyzed and detected in mass spectra. This analytical tool delivers valuable information about RNA-protein interactions and RNA docking sites in proteins, both in vitro and in vivo. The identification of peptides cross-linked to oligonucleotides of different lengths leads to a combinatorial increase in search space. We demonstrate that peptide retention time prediction can be transferred to cross-linked peptide retention time prediction using a simple amino acid composition encoding, yielding improved identification rates when the prediction error is included in rescoring. For the more challenging task of including fragment intensity prediction of cross-linked peptides in the rescoring, we obtain, on average, a similar improvement. Further improvements in the encoding and fine-tuning of retention time and intensity prediction models might lead to further gains and merit further research.
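As a rough illustration of an amino acid composition encoding, the sketch below counts the 20 standard residues in a peptide and feeds that vector to a linear retention time model; the absolute difference between observed and predicted retention time is the kind of error value that can be added as a rescoring feature. The weights, intercept, and function names are hypothetical, the nucleotide adduct of a cross-linked peptide is not encoded here, and a real model would be fit to training data rather than hand-set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(peptide):
    """Count each of the 20 standard amino acids (order follows AMINO_ACIDS)."""
    return [peptide.count(aa) for aa in AMINO_ACIDS]


def predict_rt(peptide, weights, intercept=0.0):
    """Hypothetical linear model: RT = intercept + sum(w_i * count_i)."""
    return intercept + sum(w * c for w, c in zip(weights, composition_vector(peptide)))


def rt_error_feature(peptide, observed_rt, weights, intercept=0.0):
    """Absolute RT prediction error, usable as one extra rescoring feature."""
    return abs(observed_rt - predict_rt(peptide, weights, intercept))


print(rt_error_feature("AAK", 5.0, [1.0] * 20, intercept=0.5))  # 1.5
```

Because the encoding ignores residue order, it is cheap to compute for any sequence, which is presumably part of why such a simple representation transfers to new peptide classes.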


Subject(s)
Nucleic Acids, RNA, Amino Acids, Mass Spectrometry, Peptides
3.
J Proteome Res ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38491990

ABSTRACT

Rescoring of peptide-spectrum matches (PSMs) has emerged as a standard procedure for the analysis of tandem mass spectrometry data. This emphasizes the need for software maintenance and continuous improvement for such algorithms. We introduce MS2Rescore 3.0, a versatile, modular, and user-friendly platform designed to increase peptide identifications. Researchers can install MS2Rescore across various platforms with minimal effort and benefit from a graphical user interface, a modular Python API, and extensive documentation. To showcase this new version, we connected MS2Rescore 3.0 with MS Amanda 3.0, a new release of the well-established search engine, addressing previous limitations on automatic rescoring. Among its new features, MS Amanda now contains additional output columns that can be used for rescoring. The full potential of rescoring is best revealed when applied to challenging data sets. We therefore evaluated the performance of these two tools on publicly available single-cell data sets, where the number of PSMs was substantially increased, thereby demonstrating that MS2Rescore offers a powerful solution to boost peptide identifications. MS2Rescore's modular design and user-friendly interface make data-driven rescoring easily accessible, even for inexperienced users. We therefore expect MS2Rescore to be a valuable tool for the wider proteomics community. MS2Rescore is available at https://github.com/compomics/ms2rescore.

4.
Nat Methods ; 18(11): 1363-1369, 2021 11.
Article in English | MEDLINE | ID: mdl-34711972

ABSTRACT

The inclusion of peptide retention time prediction promises to remove peptide identification ambiguity in complex liquid chromatography-mass spectrometry identification workflows. However, due to the way peptides are encoded in current prediction models, accurate retention times cannot be predicted for modified peptides. This is especially problematic for fledgling open searches, which will benefit from accurate retention time prediction for modified peptides to reduce identification ambiguity. We present DeepLC, a deep learning peptide retention time predictor using peptide encoding based on atomic composition that allows the retention time of (previously unseen) modified peptides to be predicted accurately. We show that DeepLC performs similarly to current state-of-the-art approaches for unmodified peptides and, more importantly, accurately predicts retention times for modifications not seen during training. Moreover, we show that DeepLC's ability to predict retention times for any modification enables potentially incorrect identifications to be flagged in an open search of a wide variety of proteome data.
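To show why an atomic-composition encoding generalizes to unseen modifications, the sketch below represents each residue by its element counts and treats a modification purely as an element delta, so a new modification needs only its chemical formula. This is a simplified illustration, assuming a per-residue C/H/N/O/S/P count matrix; DeepLC's actual feature set is richer, and the `MODIFICATION_DELTA` table and function names are hypothetical.

```python
ELEMENTS = ("C", "H", "N", "O", "S", "P")

# Elemental composition of each residue (free amino acid minus H2O), ordered as ELEMENTS.
RESIDUE_COMPOSITION = {
    "G": (2, 3, 1, 1, 0, 0), "A": (3, 5, 1, 1, 0, 0), "S": (3, 5, 1, 2, 0, 0),
    "P": (5, 7, 1, 1, 0, 0), "V": (5, 9, 1, 1, 0, 0), "T": (4, 7, 1, 2, 0, 0),
    "C": (3, 5, 1, 1, 1, 0), "L": (6, 11, 1, 1, 0, 0), "I": (6, 11, 1, 1, 0, 0),
    "N": (4, 6, 2, 2, 0, 0), "D": (4, 5, 1, 3, 0, 0), "Q": (5, 8, 2, 2, 0, 0),
    "K": (6, 12, 2, 1, 0, 0), "E": (5, 7, 1, 3, 0, 0), "M": (5, 9, 1, 1, 1, 0),
    "H": (6, 7, 3, 1, 0, 0), "F": (9, 9, 1, 1, 0, 0), "R": (6, 12, 4, 1, 0, 0),
    "Y": (9, 9, 1, 2, 0, 0), "W": (11, 10, 2, 1, 0, 0),
}

# Hypothetical modification table: each modification is only a composition delta.
MODIFICATION_DELTA = {
    "Oxidation": (0, 0, 0, 1, 0, 0),  # +O
    "Phospho": (0, 1, 0, 3, 0, 1),    # +HPO3
}
NO_DELTA = (0,) * len(ELEMENTS)


def encode_peptide(sequence, modifications=None):
    """One atom-count row per residue; `modifications` maps 0-based position -> name."""
    modifications = modifications or {}
    rows = []
    for position, residue in enumerate(sequence):
        delta = MODIFICATION_DELTA.get(modifications.get(position), NO_DELTA)
        rows.append([a + d for a, d in zip(RESIDUE_COMPOSITION[residue], delta)])
    return rows


def total_composition(rows):
    """Sum the per-residue rows and add the terminal H2O of the intact peptide."""
    totals = [sum(column) for column in zip(*rows)]
    totals[ELEMENTS.index("H")] += 2
    totals[ELEMENTS.index("O")] += 1
    return dict(zip(ELEMENTS, totals))


print(total_composition(encode_peptide("GA")))
# {'C': 5, 'H': 10, 'N': 2, 'O': 3, 'S': 0, 'P': 0}
```

An oxidized methionine and a residue never seen with that modification still land in the same numeric space, which is the property the abstract relies on.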


Subject(s)
Algorithms, Deep Learning, Peptide Fragments/analysis, Post-Translational Protein Processing, Proteins/analysis, Proteins/chemistry, Proteome/analysis, Datasets as Topic, Humans, Peptide Fragments/chemistry, Peptide Mapping
5.
Mol Cell Proteomics ; 21(8): 100266, 2022 08.
Article in English | MEDLINE | ID: mdl-35803561

ABSTRACT

Immunopeptidomics aims to identify peptides presented by major histocompatibility complex (MHC) molecules on almost all cells, which can be used in anti-cancer vaccine development. However, existing immunopeptidomics data analysis pipelines suffer from the nontryptic nature of immunopeptides, which complicates their identification. Previously, peak intensity predictions by MS2PIP and retention time predictions by DeepLC have been shown to improve tryptic peptide identifications when rescoring peptide-spectrum matches with Percolator. However, as MS2PIP was tailored toward tryptic peptides, we have here retrained MS2PIP to include nontryptic peptides. Interestingly, the new models not only greatly improve predictions for immunopeptides but also yield further improvements for tryptic peptides. We show that the integration of new MS2PIP models, DeepLC, and Percolator in one software package, MS2Rescore, increases the spectrum identification rate and the number of unique identified peptides by 46% and 36%, respectively, compared to standard Percolator rescoring at 1% FDR. Moreover, MS2Rescore also outperforms the current state-of-the-art immunopeptide-specific identification approaches. Altogether, MS2Rescore thus allows substantially improved identification of novel epitopes from existing immunopeptidomics workflows.
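Gains like these are counted at a fixed false discovery rate. Percolator-based rescoring relies on a target-decoy strategy; the sketch below computes q-values from scored target and decoy PSMs using the simple decoys/targets estimator and counts the targets accepted at a threshold such as 1%. The estimator, the `(score, is_decoy)` tuple layout, and the function names are assumptions for illustration, tied scores are not handled specially, and this is not Percolator's internal implementation.

```python
def q_values(psms):
    """psms: list of (score, is_decoy), higher score = better.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this score cutoff
    # q-value: the lowest FDR achievable at this cutoff or any more permissive one.
    qvals, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvals[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]


def count_accepted(psms, fdr_threshold=0.01):
    """Number of target PSMs with q-value at or below the threshold."""
    return sum(1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= fdr_threshold)


psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
print(count_accepted(psms, 0.01))  # 2
```

A rescoring step helps when its new score pushes more targets above the decoys, so more targets survive the same q-value cutoff.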


Subject(s)
Proteomics, Tandem Mass Spectrometry, Algorithms, Peptides, Proteins
6.
J Proteome Res ; 22(2): 557-560, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36508242

ABSTRACT

A plethora of proteomics search engine output file formats are in circulation. This lack of standardized output files greatly complicates generic downstream processing of peptide-spectrum matches (PSMs) and PSM files. While standards exist to solve this problem, these are far from universally supported by search engines. Moreover, software libraries are available to read a selection of PSM file formats, but a package to parse PSM files into a unified data structure has been missing. Here, we present psm_utils, a Python package to read and write various PSM file formats and to handle peptidoforms, PSMs, and PSM lists in a unified and user-friendly Python-, command line-, and web-interface. psm_utils was developed with pragmatism and maintainability in mind, adhering to community standards and relying on existing packages where possible. The Python API and command line interface greatly facilitate handling various PSM file formats. Moreover, a user-friendly web application was built using psm_utils that allows anyone to interconvert PSM files and retrieve basic PSM statistics. psm_utils is freely available under the permissive Apache2 license at https://github.com/compomics/psm_utils.


Subject(s)
Proteomics, Software, Proteomics/methods, Peptides, Search Engine
7.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and inspiring future research.


Subject(s)
Machine Learning, Proteomics, Proteomics/methods, Algorithms, Mass Spectrometry
8.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle to distinguish true from false peptide-to-spectrum matches as the database size grows. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed from ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.


Subject(s)
Proteogenomics/methods, Protein Databases, HCT116 Cells, Humans, Machine Learning, RNA-Seq, Ribosomes
9.
J Proteome Res ; 21(4): 1204-1207, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35119864

ABSTRACT

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. Such applications must be described in sufficient detail for others to apply the trained models or evaluate their performance. Here we interpret the recently published, general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Subject(s)
Metabolomics, Proteomics, Machine Learning, Metabolomics/methods, Proteomics/methods, Tandem Mass Spectrometry
10.
J Proteome Res ; 20(6): 3353-3364, 2021 06 04.
Article in English | MEDLINE | ID: mdl-33998808

ABSTRACT

Discovery of variant peptides such as a single amino acid variant (SAAV) in shotgun proteomics data is essential for personalized proteomics. Both the resolution of shotgun proteomics methods and the search engines have improved dramatically, allowing for confident identification of SAAV peptides. However, it is not yet known if these methods are truly successful in accurately identifying SAAV peptides without prior genomic information in the search database. We studied this in unprecedented detail by exploiting publicly available long-read RNA sequences and shotgun proteomics data from the gold standard reference cell line NA12878. Searching spectra from this cell line with the state-of-the-art open modification search engine ionbot against carefully curated search databases resulted in 96.7% false-positive SAAVs and an 85% lower true positive rate than searching with peptide search databases that incorporate prior genetic information. While adding genetic variants to the search database remains indispensable for correct peptide identification, inclusion of long-read RNA sequences in the search database contributes only 0.3% new peptide identifications. These findings reveal the differences in SAAV detection that result from various approaches, providing guidance to researchers studying SAAV peptides and developers of peptide spectrum identification tools.


Subject(s)
Proteogenomics, Amino Acids, Protein Databases, Proteome/genetics, Proteomics, Search Engine
11.
Nucleic Acids Res ; 47(W1): W295-W299, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31028400

ABSTRACT

MS²PIP is a data-driven tool that accurately predicts peak intensities for a given peptide's fragmentation mass spectrum. Since the release of the MS²PIP web server in 2015, we have brought significant updates to both the tool and the web server. In addition to the original models for CID and HCD fragmentation, we have added specialized models for the TripleTOF 5600+ mass spectrometer, for TMT-labeled peptides, for iTRAQ-labeled peptides, and for iTRAQ-labeled phosphopeptides. Because the fragmentation pattern is heavily altered in each of these cases, these additional models greatly improve the prediction accuracy for their corresponding data types. We have also substantially reduced the computational resources required to run MS²PIP, and have completely rebuilt the web server, which now allows predictions of up to 100 000 peptide sequences in a single request. The MS²PIP web server is freely available at https://iomics.ugent.be/ms2pip/.


Subject(s)
Peptide Fragments/analysis, Phosphopeptides/analysis, Proteomics/methods, Software, Tandem Mass Spectrometry/statistics & numerical data, Amino Acid Sequence, Humans, Internet, Proteomics/instrumentation, Staining and Labeling/methods
12.
Proteomics ; 20(21-22): e1900351, 2020 11.
Article in English | MEDLINE | ID: mdl-32267083

ABSTRACT

A lot of energy in the field of proteomics is dedicated to the application of challenging experimental workflows, which include metaproteomics, proteogenomics, data independent acquisition (DIA), non-specific proteolysis, immunopeptidomics, and open modification searches. These workflows are all challenging because of ambiguity in the identification stage; they either expand the search space and thus increase the ambiguity of identifications, or, in the case of DIA, they generate data that is inherently more ambiguous. In this context, machine learning-based predictive models are now generating considerable excitement in the field of proteomics because these predictive models hold great potential to drastically reduce the ambiguity in the identification process of the above-mentioned workflows. Indeed, the field has already produced classical machine learning and deep learning models to predict almost every aspect of a liquid chromatography-mass spectrometry (LC-MS) experiment. Yet despite all the excitement, thorough integration of predictive models in these challenging LC-MS workflows is still limited, and further improvements to the modeling and validation procedures can still be made. Therefore, highly promising recent machine learning developments in proteomics are pointed out in this viewpoint, alongside some of the remaining challenges.


Subject(s)
Machine Learning, Proteomics, Workflow, Liquid Chromatography, Mass Spectrometry
13.
Proteomics ; 20(3-4): e1900306, 2020 02.
Article in English | MEDLINE | ID: mdl-31981311

ABSTRACT

Data-independent acquisition (DIA) generates comprehensive yet complex mass spectrometric data, which imposes the use of data-dependent acquisition (DDA) libraries for deep peptide-centric detection. Here, it is shown that DIA can be redeemed from this dependency by combining predicted fragment intensities and retention times with narrow window DIA. This eliminates variation in library building and omits stochastic sampling, finally making the DIA workflow fully deterministic. Especially for clinical proteomics, this has the potential to facilitate inter-laboratory comparison.


Subject(s)
Liquid Chromatography/methods, Data Mining/methods, Mass Spectrometry/methods, Peptides/analysis, Proteome/analysis, Proteomics/methods, Computational Biology/methods, Protein Databases, HeLa Cells, Humans, Peptide Library, Software
14.
Anal Chem ; 92(9): 6571-6578, 2020 05 05.
Article in English | MEDLINE | ID: mdl-32281370

ABSTRACT

Accurate prediction of liquid chromatographic retention times from small-molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g., differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet and fits calibration curves on predicted retention times instead of only on observed retention times. We show that this approach results in substantially higher accuracy of elution-peak prediction than is achieved by setup-specific models.
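The core of the calibration idea is a mapping from predicted retention times to those observed on a new setup, fitted on a handful of calibrants measured there. The sketch below uses an ordinary least-squares straight line as a deliberately simple stand-in; the approach described above builds on PredRet's calibration, which is not reproduced here, and real chromatographic mappings are often nonlinear. Function names are hypothetical.

```python
def fit_linear_calibration(predicted, observed):
    """Ordinary least-squares fit of observed = slope * predicted + intercept."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(predicted_rt, slope, intercept):
    """Map a prediction from the source model onto the target setup."""
    return slope * predicted_rt + intercept


# Calibrants: model predictions vs. retention times measured on the new setup.
slope, intercept = fit_linear_calibration([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(calibrate(10.0, slope, intercept))  # 21.0
```

The appeal of fitting on predicted rather than only observed retention times is that any compound the model can score becomes usable for calibration, not just those already measured on both setups.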

15.
Bioinformatics ; 35(24): 5243-5248, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31077310

ABSTRACT

MOTIVATION: The use of post-processing tools to maximize the information gained from a proteomics search engine is widely accepted in the community, the most notable example being Percolator, a semi-supervised machine learning model that learns a new scoring function for a given dataset. However, the usage of such tools is bound to the search engine's scoring scheme, which does not always make full use of the intensity information present in a spectrum. We aim to show how this tool can be applied in a way that maximizes the use of spectrum intensity information by leveraging another machine learning-based tool, MS2PIP, which predicts fragment ion peak intensities. RESULTS: We show that comparing predicted intensities to annotated experimental spectra by calculating direct similarity metrics provides enough information for a tool such as Percolator to accurately separate two classes of peptide-to-spectrum matches. This approach extracts more information from the data than simpler intensity-based metrics, such as peak counting or summing explained intensities, while maintaining control of statistics such as the false discovery rate. AVAILABILITY AND IMPLEMENTATION: All of the code is available online at https://github.com/compomics/ms2rescore. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
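The central step here is turning predicted and observed fragment intensities into similarity features that Percolator can weigh. The sketch below computes three such features (Pearson correlation, cosine similarity, mean absolute error) for intensity vectors of the same annotated ions; this selection is an assumption for illustration and is not the exact feature set in the ms2rescore repository.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_features(predicted, observed):
    """predicted/observed: intensities of the same annotated fragment ions, same order."""
    return {
        "pearson": pearson(predicted, observed),
        "cosine": cosine(predicted, observed),
        "mean_abs_error": sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted),
    }


print(similarity_features([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Correlation and cosine are insensitive to overall scale, while the absolute error is not; feeding several complementary metrics lets the semi-supervised model decide which ones separate targets from decoys in a given dataset.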


Subject(s)
Proteomics, Search Engine, Algorithms, Protein Databases, Software
16.
Anal Chem ; 91(5): 3694-3703, 2019 03 05.
Article in English | MEDLINE | ID: mdl-30702864

ABSTRACT

Liquid chromatography is a core component of almost all mass spectrometric analyses of (bio)molecules. Because of the high-throughput nature of mass spectrometric analyses, the interpretation of these chromatographic data increasingly relies on informatics solutions that attempt to predict an analyte's retention time. The key components of such predictive algorithms are the features they are supplied with and the actual machine learning algorithm used to fit the model parameters. Therefore, we have evaluated the performance of seven machine learning algorithms on 36 distinct metabolomics data sets, using two distinct feature sets. Interestingly, the results show that no single learning algorithm performs optimally for all data sets, with different types of algorithms achieving top performance for different types of analytes or different protocols. Our results thus show that an evaluation of machine learning algorithms for retention time prediction is needed to find a suitable algorithm for specific analytes or protocols. Importantly, however, our results also show that blending different types of models together decreases the error on outliers, indicating that the combination of several approaches holds substantial promise for the development of more generic, high-performing algorithms.
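Blending can be as simple as averaging the predictions of several models. The sketch below shows an optionally weighted average and a worst-case error check on toy numbers chosen so that the two models err in opposite directions on one analyte, which is the situation in which averaging reduces outlier error; blending offers no such benefit when the models share the same bias. All names and values are hypothetical.

```python
def blend(predictions_per_model, weights=None):
    """Average per-analyte predictions from several models (optionally weighted)."""
    n_models = len(predictions_per_model)
    weights = weights or [1.0 / n_models] * n_models
    n_analytes = len(predictions_per_model[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions_per_model))
        for i in range(n_analytes)
    ]


def max_abs_error(predicted, observed):
    """Largest single-analyte error, a simple proxy for outlier behaviour."""
    return max(abs(p - o) for p, o in zip(predicted, observed))


observed = [10.0, 20.0, 30.0]
model_a = [10.0, 20.0, 40.0]  # overshoots the third analyte
model_b = [10.0, 20.0, 22.0]  # undershoots the same analyte
blended = blend([model_a, model_b])
print(blended, max_abs_error(blended, observed))  # [10.0, 20.0, 31.0] 1.0
```

In practice the weights would themselves be chosen on held-out data rather than fixed to an equal split.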


Subject(s)
Algorithms, Liquid Chromatography/methods, Machine Learning/standards, Theoretical Models, Datasets as Topic, Mass Spectrometry
17.
Anal Chem ; 90(19): 11636-11642, 2018 10 02.
Article in English | MEDLINE | ID: mdl-30188119

ABSTRACT

When analyzing mass spectrometry imaging data sets, assigning a molecule to each of the thousands of generated images is a very complex task. Recent efforts have taken lessons from (tandem) mass spectrometry proteomics and applied them to imaging mass spectrometry metabolomics, with good results. Our goal is to go a step further in this direction and apply a well-established, data-driven method to improve the results obtained from an annotation engine. By using a data-driven rescoring strategy, we are able to consistently improve the sensitivity of the annotation engine while maintaining control of statistics such as the estimated false discovery rate. All the code necessary to run a search and extract the additional features can be found at https://github.com/anasilviacs/sm-engine, and the code to rescore the results from a search at https://github.com/anasilviacs/rescore-metabolites.

18.
Bioinformatics ; 33(9): 1424-1425, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453684

ABSTRACT

Summary: Protein-protein interaction (PPI) studies have dramatically expanded our knowledge about cellular behaviour and development in different conditions. A multitude of high-throughput PPI techniques have been developed to achieve proteome-scale coverage for PPI studies, including the microarray based Mammalian Protein-Protein Interaction Trap (MAPPIT) system. Because such high-throughput techniques typically report thousands of interactions, managing and analysing the large amounts of acquired data is a challenge. We have therefore built the MAPPIT cell microArray Protein Protein Interaction-Data management & Analysis Tool (MAPPI-DAT) as an automated data management and analysis tool for MAPPIT cell microarray experiments. MAPPI-DAT stores the experimental data and metadata in a systematic and structured way, automates data analysis and interpretation, and enables the meta-analysis of MAPPIT cell microarray data across all stored experiments. Availability and Implementation: MAPPI-DAT is developed in Python, using R for data analysis and MySQL as data management system. MAPPI-DAT is cross-platform and can be run on Microsoft Windows, Linux and OS X/macOS. The source code and a Microsoft Windows executable are freely available under the permissive Apache2 open source license at https://github.com/compomics/MAPPI-DAT. Contact: jan.tavernier@vib-ugent.be or lennart.martens@vib-ugent.be. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Protein Array Analysis/methods, Protein Interaction Mapping/methods, Software, Animals, High-Throughput Screening Assays/methods, Humans, Mammals/metabolism
19.
Nucleic Acids Res ; 43(W1): W326-30, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25990723

ABSTRACT

We present an MS² peak intensity prediction server that computes MS² spectra of 2+ and 3+ charged peptides from their sequences, covering the most common fragment ions. The server integrates the Unimod public domain post-translational modification database for modified peptides. The prediction model is an improvement of the previously published MS²PIP model for Orbitrap-LTQ CID spectra. Predicted MS² spectra can be downloaded as a spectrum file and can be visualized in the browser for comparison with observations. In addition, we added prediction models for HCD fragmentation (Q-Exactive Orbitrap) and show that these models compute accurate intensity predictions on par with CID performance. We also show that training prediction models for CID and HCD separately improves the accuracy for each fragmentation method. The MS²PIP prediction server is accessible from http://iomics.ugent.be/ms2pip.
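For reference, the m/z positions of the most common fragment ions (b and y ions) follow directly from residue masses; an intensity predictor then assigns an intensity to the peak at each of these positions. The sketch below assumes an unmodified peptide (no fixed cysteine modification), monoisotopic masses, and singly or doubly charged fragments; the function and constant names are illustrative.

```python
# Monoisotopic residue masses (Da); cysteine is left unmodified here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide, charge=1):
    """Return (b_ions, y_ions) m/z lists; index k-1 holds b_k and y_k."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for k in range(1, len(peptide)):
        b_neutral = sum(masses[:k])           # N-terminal k residues
        y_neutral = sum(masses[-k:]) + WATER  # C-terminal k residues plus H2O
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


b_ions, y_ions = fragment_mz("PEPTIDE")
print([round(mz, 4) for mz in b_ions])
print([round(mz, 4) for mz in y_ions])
```

A useful sanity check is that, for singly charged fragments, b_k plus y_(n-k) always equals the neutral peptide mass plus two protons.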


Subject(s)
Peptides/chemistry, Software, Tandem Mass Spectrometry/methods, Internet, Proteomics/methods
20.
J Proteome Res ; 15(6): 1963-70, 2016 06 03.
Article in English | MEDLINE | ID: mdl-27089233

ABSTRACT

Shotgun proteomics experiments often take the form of a differential analysis, where two or more samples are compared against each other. The objective is to identify proteins that are either unique to a specific sample or a set of samples (qualitative differential proteomics), or that are significantly differentially expressed in one or more samples (quantitative differential proteomics). However, the success depends on the availability of a reliable protein sequence database for each sample. To perform such an analysis in the absence of a database, we here propose a novel, generic pipeline comprising an adapted spectral similarity score derived from database search algorithms that compares samples at the spectrum level to detect unique spectra. We applied our pipeline to compare two parasitic tapeworms: Taenia solium and Taenia hydatigena, of which only the former poses a threat to humans. Furthermore, because the genome of T. solium recently became available, we were able to prove the effectiveness and reliability of our pipeline a posteriori.
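A spectrum-level comparison of two samples can be sketched by binning peaks, scoring each spectrum of one sample against every spectrum of the other, and flagging those without a sufficiently similar partner. The binned cosine score, the 1 m/z bin width, and the 0.7 threshold used below are assumptions standing in for the adapted database-search score described in the abstract, and a practical version would also restrict comparisons by precursor m/z. Function names are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """peaks: list of (mz, intensity). Returns {bin index: summed intensity}."""
    binned = {}
    for mz, intensity in peaks:
        index = int(mz // bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def binned_cosine(a, b):
    """Cosine similarity between two binned spectra (sparse dicts)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unique_spectra(sample, reference, threshold=0.7, bin_width=1.0):
    """Indices of spectra in `sample` with no match >= threshold in `reference`."""
    ref_binned = [bin_spectrum(s, bin_width) for s in reference]
    unique = []
    for i, spectrum in enumerate(sample):
        binned = bin_spectrum(spectrum, bin_width)
        best = max((binned_cosine(binned, r) for r in ref_binned), default=0.0)
        if best < threshold:
            unique.append(i)
    return unique


sample = [[(100.1, 10.0), (200.2, 5.0)], [(300.0, 8.0), (400.0, 2.0)]]
reference = [[(100.3, 9.0), (200.4, 6.0)]]
print(unique_spectra(sample, reference))  # [1]
```

Because no sequence database is consulted, spectra flagged this way are candidates for sample-specific peptides that can be examined further.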


Subject(s)
Proteomics/methods, Taenia/chemistry, Algorithms, Animals, Protein Databases, Genome, Species Specificity, Tandem Mass Spectrometry, Workflow