Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 13 de 13
Filtrar
Mais filtros








Base de dados
Intervalo de ano de publicação
1.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Artigo em Inglês | MEDLINE | ID: mdl-36744821

RESUMO

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.


Assuntos
Aprendizado de Máquina , Proteômica , Proteômica/métodos , Algoritmos , Espectrometria de Massas
2.
J Proteome Res ; 22(2): 632-636, 2023 02 03.
Artigo em Inglês | MEDLINE | ID: mdl-36693629

RESUMO

Data set acquisition and curation are often the most difficult and time-consuming parts of a machine learning endeavor. This is especially true for proteomics-based liquid chromatography (LC) coupled to mass spectrometry (MS) data sets, due to the high levels of data reduction that occur between raw data and machine learning-ready data. Since predictive proteomics is an emerging field, when predicting peptide behavior in LC-MS setups, each lab often uses unique and complex data processing pipelines in order to maximize performance, at the cost of accessibility and reproducibility. For this reason we introduce ProteomicsML, an online resource for proteomics-based data sets and tutorials across most of the currently explored physicochemical peptide properties. This community-driven resource makes it simple to access data in easy-to-process formats, and contains easy-to-follow tutorials that allow new users to interact with even the most advanced algorithms in the field. ProteomicsML provides data sets that are useful for comparing state-of-the-art machine learning algorithms, as well as providing introductory material for teachers and newcomers to the field alike. The platform is freely available at https://www.proteomicsml.org/, and we welcome the entire proteomics community to contribute to the project at https://github.com/ProteomicsML/ProteomicsML.


Assuntos
Algoritmos , Proteômica , Proteômica/métodos , Reprodutibilidade dos Testes , Peptídeos/análise , Espectrometria de Massas/métodos , Software
4.
Nat Commun ; 12(1): 3346, 2021 06 07.
Artigo em Inglês | MEDLINE | ID: mdl-34099720

RESUMO

Characterizing the human leukocyte antigen (HLA) bound ligandome by mass spectrometry (MS) holds great promise for developing vaccines and drugs for immune-oncology. Still, the identification of non-tryptic peptides presents substantial computational challenges. To address these, we synthesized and analyzed >300,000 peptides by multi-modal LC-MS/MS within the ProteomeTools project representing HLA class I & II ligands and products of the proteases AspN and LysN. The resulting data enabled training of a single model using the deep learning framework Prosit, allowing the accurate prediction of fragment ion spectra for tryptic and non-tryptic peptides. Applying Prosit demonstrates that the identification of HLA peptides can be improved up to 7-fold, that 87% of the proposed proteasomally spliced HLA peptides may be incorrect and that dozens of additional immunogenic neo-epitopes can be identified from patient tumors in published data. Together, the provided peptides, spectra and computational tools substantially expand the analytical depth of immunopeptidomics workflows.


Assuntos
Aprendizado Profundo , Peptídeos/imunologia , Espectrometria de Massas em Tandem/métodos , Linhagem Celular , Epitopos , Proteínas da Matriz Extracelular/metabolismo , Antígenos HLA/imunologia , Antígenos de Histocompatibilidade Classe I/metabolismo , Antígenos de Histocompatibilidade Classe II/metabolismo , Humanos , Ligantes , Espectrometria de Massas , Medicina Molecular , Peptídeos/metabolismo , Proteômica
5.
Rapid Commun Mass Spectrom ; : e9128, 2021 May 20.
Artigo em Inglês | MEDLINE | ID: mdl-34015160

RESUMO

Database search engines for bottom-up proteomics largely ignore peptide fragment ion intensities during the automated scoring of tandem mass spectra against protein databases. Recent advances in deep learning allow the accurate prediction of peptide fragment ion intensities. Using these predictions to calculate additional intensity-based scores helps to overcome this drawback. Here, we describe a processing workflow termed INFERYS™ rescoring for the intensity-based rescoring of Sequest HT search engine results in Thermo Scientific™ Proteome Discoverer™ 2.5 software. The workflow is based on the deep learning platform INFERYS capable of predicting fragment ion intensities, which runs on personal computers without the need for graphics processing units. This workflow calculates intensity-based scores comparing peptide spectrum matches from Sequest HT and predicted spectra. Resulting scores are combined with classical search engine scores for input to the false discovery rate estimation tool Percolator. We demonstrate the merits of this approach by analyzing a classical HeLa standard sample and exemplify how this workflow leads to a better separation of target and decoy identifications, in turn resulting in increased peptide spectrum match, peptide and protein identification numbers. On an immunopeptidome dataset, this workflow leads to a 50% increase in identified peptides, emphasizing the advantage of intensity-based scores when analyzing low-intensity spectra or analytes with very similar physicochemical properties that require vast search spaces. Overall, the end-to-end integration of INFERYS rescoring enables simple and easy access to a powerful enhancement to classical database search engines, promising a deeper, more confident and more comprehensive analysis of proteomic data from any organism by unlocking the intensity dimension of tandem mass spectra for identification and more confident scoring.

6.
Mol Cell Proteomics ; 20: 100076, 2021.
Artigo em Inglês | MEDLINE | ID: mdl-33823297

RESUMO

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size enlarges. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed out of ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate as well as the validation stringency in a proteogenomic setting.


Assuntos
Proteogenômica/métodos , Bases de Dados de Proteínas , Células HCT116 , Humanos , Aprendizado de Máquina , RNA-Seq , Ribossomos
7.
Nat Commun ; 11(1): 1548, 2020 03 25.
Artigo em Inglês | MEDLINE | ID: mdl-32214105

RESUMO

Data-independent acquisition approaches typically rely on experiment-specific spectrum libraries, requiring offline fractionation and tens to hundreds of injections. We demonstrate a library generation workflow that leverages fragmentation and retention time prediction to build libraries containing every peptide in a proteome, and then refines those libraries with empirical data. Our method specifically enables rapid, experiment-specific library generation for non-model organisms, which we demonstrate using the malaria parasite Plasmodium falciparum, and non-canonical databases, which we show by detecting missense variants in HeLa.


Assuntos
Cromatografia Líquida/métodos , Peptídeos/análise , Proteômica/métodos , Espectrometria de Massas em Tandem/métodos , Algoritmos , Bases de Dados de Proteínas , Células HeLa , Humanos , Biblioteca de Peptídeos , Peptídeos/química , Proteoma/análise , Proteoma/química , Reprodutibilidade dos Testes , Fluxo de Trabalho
8.
Nucleic Acids Res ; 48(D1): D1153-D1163, 2020 01 08.
Artigo em Inglês | MEDLINE | ID: mdl-31665479

RESUMO

ProteomicsDB (https://www.ProteomicsDB.org) started as a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. The data types and contents grew over time to include RNA-Seq expression data, drug-target interactions and cell line viability data. In this manuscript, we summarize new developments since the previous update that was published in Nucleic Acids Research in 2017. Over the past two years, we have enriched the data content by additional datasets and extended the platform to support protein turnover data. Another important new addition is that ProteomicsDB now supports the storage and visualization of data collected from other organisms, exemplified by Arabidopsis thaliana. Due to the generic design of ProteomicsDB, all analytical features available for the original human resource seamlessly transfer to other organisms. Furthermore, we introduce a new service in ProteomicsDB which allows users to upload their own expression datasets and analyze them alongside with data stored in ProteomicsDB. Initially, users will be able to make use of this feature in the interactive heat map functionality as well as the drug sensitivity prediction, but ultimately will be able to use all analytical features of ProteomicsDB in this way.


Assuntos
Disciplinas das Ciências Biológicas , Biologia Computacional/métodos , Bases de Dados de Proteínas , Proteômica/métodos , Pesquisa , Descoberta de Drogas , Software , Interface Usuário-Computador , Navegador
9.
Mol Cell Proteomics ; 18(8 suppl 1): S126-S140, 2019 08 09.
Artigo em Inglês | MEDLINE | ID: mdl-31040227

RESUMO

PROTEOFORMER is a pipeline that enables the automated processing of data derived from ribosome profiling (RIBO-seq, i.e. the sequencing of ribosome-protected mRNA fragments). As such, genome-wide ribosome occupancies lead to the delineation of data-specific translation product candidates and these can improve the mass spectrometry-based identification. Since its first publication, different upgrades, new features and extensions have been added to the PROTEOFORMER pipeline. Some of the most important upgrades include P-site offset calculation during mapping, comprehensive data pre-exploration, the introduction of two alternative proteoform calling strategies and extended pipeline output features. These novelties are illustrated by analyzing ribosome profiling data of human HCT116 and Jurkat data. The different proteoform calling strategies are used alongside one another and in the end combined together with reference sequences from UniProt. Matching mass spectrometry data are searched against this extended search space with MaxQuant. Overall, besides annotated proteoforms, this pipeline leads to the identification and validation of different categories of new proteoforms, including translation products of up- and downstream open reading frames, 5' and 3' extended and truncated proteoforms, single amino acid variants, splice variants and translation products of so-called noncoding regions. Further, proof-of-concept is reported for the improvement of spectrum matching by including Prosit, a deep neural network strategy that adds extra fragmentation spectrum intensity features to the analysis. In the light of ribosome profiling-driven proteogenomics, it is shown that this allows validating the spectrum matches of newly identified proteoforms with elevated stringency. These updates and novel conclusions provide new insights and lessons for the ribosome profiling-based proteogenomic research field. More practical information on the pipeline, raw code, the user manual (README) and explanations on the different modes of availability can be found at the GitHub repository of PROTEOFORMER: https://github.com/Biobix/proteoformer.


Assuntos
Proteogenômica/métodos , Ribossomos/metabolismo , Cromatografia Líquida , Células HCT116 , Humanos , Células Jurkat , Espectrometria de Massas em Tandem
10.
Nat Methods ; 16(6): 509-518, 2019 06.
Artigo em Inglês | MEDLINE | ID: mdl-31133760

RESUMO

In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.


Assuntos
Aprendizado Profundo , Redes Neurais de Computação , Fragmentos de Peptídeos/análise , Biblioteca de Peptídeos , Proteoma/análise , Software , Espectrometria de Massas em Tandem/métodos , Animais , Caenorhabditis elegans/metabolismo , Bases de Dados de Proteínas , Drosophila melanogaster/metabolismo , Células HEK293 , Humanos , Fragmentos de Peptídeos/metabolismo , Proteoma/metabolismo , Saccharomyces cerevisiae/metabolismo
11.
Nucleic Acids Res ; 46(D1): D1271-D1281, 2018 01 04.
Artigo em Inglês | MEDLINE | ID: mdl-29106664

RESUMO

ProteomicsDB (https://www.ProteomicsDB.org) is a protein-centric in-memory database for the exploration of large collections of quantitative mass spectrometry-based proteomics data. ProteomicsDB was first released in 2014 to enable the interactive exploration of the first draft of the human proteome. To date, it contains quantitative data from 78 projects totalling over 19k LC-MS/MS experiments. A standardized analysis pipeline enables comparisons between multiple datasets to facilitate the exploration of protein expression across hundreds of tissues, body fluids and cell lines. We recently extended the data model to enable the storage and integrated visualization of other quantitative omics data. This includes transcriptomics data from e.g. NCBI GEO, protein-protein interaction information from STRING, functional annotations from KEGG, drug-sensitivity/selectivity data from several public sources and reference mass spectra from the ProteomeTools project. The extended functionality transforms ProteomicsDB into a multi-purpose resource connecting quantification and meta-data for each protein. The rich user interface helps researchers to navigate all data sources in either a protein-centric or multi-protein-centric manner. Several options are available to download data manually, while our application programming interface enables accessing quantitative data systematically.


Assuntos
Bases de Dados de Proteínas , Espectrometria de Massas em Tandem , Sobrevivência Celular , Apresentação de Dados , Humanos , Internet , Preparações Farmacêuticas/metabolismo , Mapas de Interação de Proteínas , Proteínas/química , Proteínas/metabolismo , Proteômica
12.
Nat Methods ; 14(3): 259-262, 2017 03.
Artigo em Inglês | MEDLINE | ID: mdl-28135259

RESUMO

We describe ProteomeTools, a project building molecular and digital tools from the human proteome to facilitate biomedical research. Here we report the generation and multimodal liquid chromatography-tandem mass spectrometry analysis of >330,000 synthetic tryptic peptides representing essentially all canonical human gene products, and we exemplify the utility of these data in several applications. The resource (available at http://www.proteometools.org) will be extended to >1 million peptides, and all data will be shared with the community via ProteomicsDB and ProteomeXchange.


Assuntos
Cromatografia Líquida/métodos , Proteoma/análise , Proteômica/métodos , Espectrometria de Massas em Tandem/métodos , Bases de Dados de Proteínas , Genoma Humano/genética , Humanos
13.
Nature ; 509(7502): 582-7, 2014 May 29.
Artigo em Inglês | MEDLINE | ID: mdl-24870543

RESUMO

Proteomes are characterized by large protein-abundance differences, cell-type- and time-dependent expression patterns and post-translational modifications, all of which carry biological information that is not accessible by genomics or transcriptomics. Here we present a mass-spectrometry-based draft of the human proteome and a public, high-performance, in-memory database for real-time analysis of terabytes of big data, called ProteomicsDB. The information assembled from human tissues, cell lines and body fluids enabled estimation of the size of the protein-coding genome, and identified organ-specific proteins and a large number of translated lincRNAs (long intergenic non-coding RNAs). Analysis of messenger RNA and protein-expression profiles of human tissues revealed conserved control of protein abundance, and integration of drug-sensitivity data enabled the identification of proteins predicting resistance or sensitivity. The proteome profiles also hold considerable promise for analysing the composition and stoichiometry of protein complexes. ProteomicsDB thus enables navigation of proteomes, provides biological insight and fosters the development of proteomic technology.


Assuntos
Bases de Dados de Proteínas , Espectrometria de Massas , Proteoma/análise , Proteoma/química , Proteômica , Líquidos Corporais/química , Líquidos Corporais/metabolismo , Linhagem Celular , Perfilação da Expressão Gênica , Genoma Humano/genética , Humanos , Anotação de Sequência Molecular , Especificidade de Órgãos , Proteoma/genética , Proteoma/metabolismo , RNA Mensageiro/análise , RNA Mensageiro/genética
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA