Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 3 de 3
Filtrar
Mais filtros

Base de dados
Ano de publicação
Tipo de documento
Intervalo de ano de publicação
1.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Artigo em Inglês | MEDLINE | ID: mdl-36744821

RESUMO

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Assuntos
Aprendizado de Máquina , Proteômica , Proteômica/métodos , Algoritmos , Espectrometria de Massas
2.
Bioinformatics ; 38(3): 875-877, 2022 01 12.
Artigo em Inglês | MEDLINE | ID: mdl-34636883

RESUMO

MOTIVATION: Liquid-chromatography mass-spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identification and quantification of thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data, however, there is yet to be any tool that mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (i) absence of balanced training data with large sample size; (ii) unclear definition of sufficiently information-rich data representations for e.g. peptide identification; (iii) lack of benchmarking of ML methods on specific LC-MS problems. RESULTS: We created the MS2AI pipeline that automates the process of gathering vast quantities of MS data for large-scale ML applications. The software retrieves raw data from either in-house sources or from the proteomics identifications database, PRIDE. Subsequently, the raw data are stored in a standardized format amenable for ML, encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to this effect we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides. AVAILABILITY AND IMPLEMENTATION: An open-source implementation of the software can be found at https://gitlab.com/roettgerlab/ms2ai. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Peptídeos , Espectrometria de Massas em Tandem , Cromatografia Líquida/métodos , Espectrometria de Massas em Tandem/métodos , Peptídeos/análise , Software , Proteoma/química
3.
Gigascience ; 122022 Dec 28.
Artigo em Inglês | MEDLINE | ID: mdl-37983748

RESUMO

BACKGROUND: Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data-processing pipeline from raw data analysis to end-user predictions and rescoring. ML models need large-scale datasets for training and repurposing, which can be obtained from a range of public data repositories. However, applying ML to public MS datasets on larger scales is challenging, as they vary widely in terms of data acquisition methods, biological systems, and experimental designs. RESULTS: We aim to facilitate ML efforts in MS data by conducting a systematic analysis of the potential sources of variability in public MS repositories. We also examine how these factors affect ML performance and perform a comprehensive transfer learning to evaluate the benefits of current best practice methods in the field for transfer learning. CONCLUSIONS: Our findings show significantly higher levels of homogeneity within a project than between projects, which indicates that it is important to construct datasets most closely resembling future test cases, as transferability is severely limited for unseen datasets. We also found that transfer learning, although it did increase model performance, did not increase model performance compared to a non-pretrained model.


Assuntos
Aprendizado de Máquina , Espectrometria de Massas em Tandem , Cromatografia Líquida
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa