Results 1-20 of 81
1.
Nature ; 579(7799): 409-414, 2020 03.
Article in English | MEDLINE | ID: mdl-32188942

ABSTRACT

Plants are essential for life and are extremely diverse organisms with unique molecular capabilities (1). Here we present a quantitative atlas of the transcriptomes, proteomes and phosphoproteomes of 30 tissues of the model plant Arabidopsis thaliana. Our analysis provides initial answers to how many genes exist as proteins (more than 18,000), where they are expressed, in which approximate quantities (a dynamic range of more than six orders of magnitude) and to what extent they are phosphorylated (over 43,000 sites). We present examples of how the data may be used, such as to discover proteins that are translated from short open-reading frames, to uncover sequence motifs that are involved in the regulation of protein production, and to identify tissue-specific protein complexes or phosphorylation-mediated signalling events. Interactive access to this resource for the plant community is provided by the ProteomicsDB and ATHENA databases, which include powerful bioinformatics tools to explore and characterize Arabidopsis proteins, their modifications and interactions.


Subjects
Arabidopsis Proteins/analysis; Arabidopsis Proteins/chemistry; Arabidopsis/chemistry; Mass Spectrometry; Proteome/analysis; Proteome/chemistry; Proteomics; Amino Acid Motifs; Arabidopsis/anatomy & histology; Arabidopsis/genetics; Arabidopsis/metabolism; Arabidopsis Proteins/biosynthesis; Arabidopsis Proteins/genetics; Databases, Protein; Datasets as Topic; Gene Expression Regulation, Plant; Molecular Sequence Annotation; Open Reading Frames; Organ Specificity; Phosphoproteins/analysis; Phosphoproteins/chemistry; Phosphoproteins/genetics; Phosphorylation; Proteome/biosynthesis; Proteome/genetics; RNA, Messenger/analysis; RNA, Messenger/biosynthesis; RNA, Messenger/genetics; Transcriptome
2.
Mol Cell Proteomics ; 23(7): 100798, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38871251

ABSTRACT

Rescoring of peptide spectrum matches from database search engines, enabled by peptide property predictors, now exceeds the performance of peptide identification by traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable more efficient discrimination between correct and incorrect peptide spectrum matches. This approach has been shown to substantially increase the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.


Subjects
Peptides; Proteomics; Proteomics/methods; Peptides/metabolism; Peptides/chemistry; Humans; Databases, Protein; Mass Spectrometry/methods; Search Engine
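Rescoring gains such as those described in the review above are usually reported as the number of peptide spectrum matches (PSMs) accepted at a fixed false discovery rate (FDR). A minimal Python sketch of that bookkeeping follows, assuming a plain target-decoy competition in which the FDR at a score threshold is estimated as #decoys / #targets; the function names and the 1% cut-off are illustrative and are not taken from any tool covered by the review.

```python
from typing import List, Tuple

# Each PSM is (score, is_decoy); higher scores are assumed to be better.
PSM = Tuple[float, bool]


def q_values(psms: List[PSM]) -> List[float]:
    """Return one q-value per PSM, in input order (ties broken by sort order)."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # walk from best to worst score
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # q-value = lowest FDR at which this PSM would still be accepted
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):  # walk from worst to best score
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def count_accepted_targets(psms: List[PSM], max_q: float = 0.01) -> int:
    """Number of target PSMs with a q-value at or below max_q."""
    return sum(
        1 for (_, is_decoy), q in zip(psms, q_values(psms)) if not is_decoy and q <= max_q
    )
```

Running the same count once on search-engine scores and once on rescored scores is one simple way to compare them; post-processors such as Percolator additionally learn how to combine several features into a single score before this step.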
3.
Nucleic Acids Res ; 52(17): 10144-10160, 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39175109

ABSTRACT

Most heritable diseases are polygenic. To comprehend the underlying genetic architecture, it is crucial to discover the clinically relevant epistatic interactions (EIs) between genomic single nucleotide polymorphisms (SNPs) (1-3). Existing statistical computational methods for EI detection are mostly limited to pairs of SNPs due to the combinatorial explosion of higher-order EIs. With NeEDL (network-based epistasis detection via local search), we leverage network medicine to inform the selection of EIs that are an order of magnitude more statistically significant than those found by existing tools and consist, on average, of five SNPs. We further show that this computationally demanding task can be substantially accelerated once quantum computing hardware becomes available. We apply NeEDL to eight different diseases and discover genes (affected by EIs of SNPs) that are partly known to affect the disease; additionally, these results are reproducible across independent cohorts. EIs for these eight diseases can be interactively explored in the Epistasis Disease Atlas (https://epistasis-disease-atlas.com). In summary, NeEDL demonstrates the potential of seamlessly integrated quantum computing techniques to accelerate biomedical research. Our network medicine approach detects higher-order EIs with unprecedented statistical and biological evidence, yielding unique insights into polygenic diseases and providing a basis for the development of improved risk scores and combination therapies.


Subjects
Epistasis, Genetic; Polymorphism, Single Nucleotide; Humans; Quantum Theory; Multifactorial Inheritance/genetics; Disease/genetics; Computational Biology/methods; Algorithms; Genetic Predisposition to Disease
4.
Nat Methods ; 19(7): 803-811, 2022 07.
Article in English | MEDLINE | ID: mdl-35710609

ABSTRACT

The laboratory mouse ranks among the most important experimental systems for biomedical research, and molecular reference maps of such models are essential informational tools. Here, we present a quantitative draft of the mouse proteome and phosphoproteome constructed from 41 healthy tissues, and several lines of analysis exemplify which insights can be gleaned from the data. For instance, tissue- and cell-type-resolved profiles provide protein evidence for the expression of 17,000 genes, thousands of isoforms and 50,000 phosphorylation sites in vivo. Proteogenomic comparison of mouse, human and Arabidopsis reveals common and distinct mechanisms of gene expression regulation and, despite many similarities, numerous differentially abundant orthologs that likely serve species-specific functions. We leverage the mouse proteome by integrating phenotypic drug (n > 400) and radiation response data with the proteomes of 66 pancreatic ductal adenocarcinoma (PDAC) cell lines to reveal molecular markers for sensitivity and resistance. This unique atlas complements other molecular resources for the mouse and can be explored online via ProteomicsDB and PACiFIC.


Subjects
Arabidopsis; Carcinoma, Pancreatic Ductal; Pancreatic Neoplasms; Animals; Arabidopsis/genetics; Carcinoma, Pancreatic Ductal/metabolism; Mass Spectrometry; Mice; Pancreatic Neoplasms/genetics; Proteome/analysis
5.
Nat Chem Biol ; 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37904048

ABSTRACT

Medicinal chemistry has discovered thousands of potent protein and lipid kinase inhibitors. These may be developed into therapeutic drugs or chemical probes to study kinase biology. Because of polypharmacology, a large part of the human kinome currently lacks selective chemical probes. To discover such probes, we profiled 1,183 compounds from drug discovery projects in lysates of cancer cell lines using Kinobeads. The resulting 500,000 compound-target interactions are available in ProteomicsDB and we exemplify how this molecular resource may be used. For instance, the data revealed several hundred reasonably selective compounds for 72 kinases. Cellular assays validated GSK986310C as a candidate SYK (spleen tyrosine kinase) probe and X-ray crystallography uncovered the structural basis for the observed selectivity of the CK2 inhibitor GW869516X. Compounds targeting PKN3 were discovered and phosphoproteomics identified substrates that indicate target engagement in cells. We anticipate that this molecular resource will aid research in drug discovery and chemical biology.

6.
Proteomics ; 24(8): e2300112, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37672792

ABSTRACT

Machine learning (ML) and deep learning (DL) models for peptide property prediction such as Prosit have enabled the creation of high-quality in silico reference libraries. These libraries are used in various applications, ranging from data-independent acquisition (DIA) data analysis to data-driven rescoring of search engine results. Here, we present Oktoberfest, an open-source Python package of our spectral library generation and rescoring pipeline, which was originally only available online via ProteomicsDB. Oktoberfest is largely search engine agnostic and provides access to online peptide property predictions, promoting the adoption of state-of-the-art ML/DL models in proteomics analysis pipelines. We demonstrate its ability to reproduce and even improve our results from previously published rescoring analyses in two distinct use cases. Oktoberfest is freely available on GitHub (https://github.com/wilhelm-lab/oktoberfest) and can easily be installed locally through the cross-platform PyPI Python package.


Subjects
Proteomics; Software; Proteomics/methods; Peptides; Algorithms
7.
BMC Genomics ; 25(1): 619, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38898442

ABSTRACT

Plant genomics plays a pivotal role in enhancing global food security and sustainability by offering innovative solutions for improving crop yield, disease resistance, and stress tolerance. As the number of sequenced genomes grows and the accuracy and contiguity of genome assemblies improve, structural annotation of plant genomes continues to be a significant challenge due to their large size, polyploidy, and rich repeat content. In this paper, we present an overview of the current landscape in crop genomics research, highlighting the diversity of genomic characteristics across various crop species. We also assessed the accuracy of popular gene prediction tools in identifying genes within crop genomes and examined the factors that impact their performance. Our findings highlight the strengths and limitations of BRAKER2 and Helixer as leading structural genome annotation tools and underscore the impact of genome complexity, fragmentation, and repeat content on their performance. Furthermore, we evaluated the suitability of the predicted proteins as a reliable search space in proteomics studies using mass spectrometry data. Our results provide valuable insights for future efforts to refine and advance the field of structural genome annotation.


Subjects
Crops, Agricultural; Genome, Plant; Molecular Sequence Annotation; Proteomics; Crops, Agricultural/genetics; Proteomics/methods; Genomics/methods; Plant Proteins/genetics; Plant Proteins/metabolism
8.
Anal Chem ; 96(40): 15829-15833, 2024 Oct 08.
Article in English | MEDLINE | ID: mdl-39322219

ABSTRACT

Mass-spectrometry-based proteomics has advanced with the integration of experimental and predicted spectral libraries, which have significantly improved peptide identification in complex search spaces. However, challenges persist in distinguishing some peptides with close retention times and nearly identical fragmentation patterns. In this study, we conducted a theoretical assessment to quantify the prevalence of indistinguishable peptides within the human canonical proteome and immunopeptidome using state-of-the-art retention time and spectrum prediction models. By quantifying the proportion of peptides posing challenges to unequivocal identification, we set the theoretical nonaccessible portion within a given proteome, and underscore the effectiveness of contemporary analytical methodologies in resolving the complexity of the human proteome and immunopeptidome via mass spectrometry.


Subjects
Mass Spectrometry; Peptides; Proteomics; Proteomics/methods; Humans; Peptides/analysis; Peptides/chemistry; Mass Spectrometry/methods; Proteome/analysis
9.
Nat Methods ; 18(6): 604-617, 2021 06.
Article in English | MEDLINE | ID: mdl-34099939

ABSTRACT

Single-cell profiling methods have had a profound impact on the understanding of cellular heterogeneity. While genomes and transcriptomes can be explored at the single-cell level, single-cell profiling of proteomes is not yet established. Here we describe new single-molecule protein sequencing and identification technologies alongside innovations in mass spectrometry that will eventually enable broad sequence coverage in single-cell profiling. These technologies will in turn facilitate biological discovery and open new avenues for ultrasensitive disease diagnostics.


Subjects
Sequence Analysis, Protein/methods; Single Molecule Imaging/methods; Mass Spectrometry/methods; Nanotechnology; Proteins/chemistry; Proteomics/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
10.
Nat Chem Biol ; 18(8): 812-820, 2022 08.
Article in English | MEDLINE | ID: mdl-35484434

ABSTRACT

Drugs that target histone deacetylase (HDAC) entered the pharmacopoeia in the 2000s. However, some enigmatic phenotypes suggest off-target engagement. Here, we developed a quantitative chemical proteomics assay using immobilized HDAC inhibitors and mass spectrometry that we deployed to establish the target landscape of 53 drugs. The assay covers 9 of the 11 human zinc-dependent HDACs, questions the reported selectivity of some widely used molecules (notably for HDAC6) and delineates how the composition of HDAC complexes influences drug potency. Unexpectedly, metallo-β-lactamase domain-containing protein 2 (MBLAC2) featured as a frequent off-target of hydroxamate drugs. This poorly characterized palmitoyl-CoA hydrolase is inhibited by 24 HDAC inhibitors at low nanomolar potency. MBLAC2 enzymatic inhibition and knockdown led to the accumulation of extracellular vesicles. Given the importance of extracellular vesicle biology in neurological diseases and cancer, this HDAC-independent drug effect may qualify MBLAC2 as a target for drug discovery.


Subjects
Histone Deacetylases; Neoplasms; Drug Discovery; Histone Deacetylase Inhibitors/chemistry; Histone Deacetylase Inhibitors/pharmacology; Histone Deacetylases/metabolism; Humans; Hydroxamic Acids/chemistry
11.
Mol Cell Proteomics ; 21(12): 100437, 2022 12.
Article in English | MEDLINE | ID: mdl-36328188

ABSTRACT

Estimating false discovery rates (FDRs) of protein identification continues to be an important topic in mass spectrometry-based proteomics, particularly when analyzing very large datasets. One performant method for this purpose is the Picked Protein FDR approach, which is based on a target-decoy competition strategy on the protein level that ensures that FDRs scale to large datasets. Here, we present an extension to this method that can also deal with protein groups, that is, proteins that share common peptides such as protein isoforms of the same gene. To obtain well-calibrated FDR estimates that preserve protein identification sensitivity, we introduce two novel ideas: first, the picked group target-decoy strategy and, second, the rescued subset grouping strategy. Using entrapment searches and simulated data for validation, we demonstrate that the new Picked Protein Group FDR method produces accurate protein group-level FDR estimates regardless of the size of the data set. The validation analysis also uncovered that applying the commonly used Occam's razor principle leads to anticonservative FDR estimates for large datasets. This is not the case for the Picked Protein Group FDR method. Reanalysis of deep proteomes of 29 human tissues showed that the new method identified up to 4% more protein groups than MaxQuant. Applying the method to the reanalysis of the entire human section of ProteomicsDB led to the identification of 18,000 protein groups at 1% protein group-level FDR. The analysis also showed that about 1250 genes were represented by ≥2 identified protein groups. To make the method accessible to the proteomics community, we provide a software tool including a graphical user interface that enables merging results from multiple MaxQuant searches into a single list of identified and quantified protein groups.


Subjects
Peptides; Tandem Mass Spectrometry; Humans; Tandem Mass Spectrometry/methods; Databases, Protein; Software; Proteome; Algorithms
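The picked target-decoy competition that the Picked Protein Group FDR method extends can be illustrated at the level of single proteins. The sketch below assumes that protein-level scores are already computed (higher is better) and that each decoy accession is its target accession with a hypothetical `REV__` prefix; for each target/decoy pair only the better-scoring member is kept, and the FDR is then estimated as #decoys / #targets down the ranked list. The picked group and rescued subset strategies of the paper are not reproduced here.

```python
from typing import Dict, List, Tuple

DECOY_PREFIX = "REV__"  # hypothetical decoy naming convention, for illustration only


def picked_protein_fdr(scores: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (accession, score, q_value) for the picked proteins, best first."""
    # 1) Pair each target with its decoy and keep only the higher-scoring member.
    picked: Dict[str, Tuple[str, float]] = {}
    for acc, score in scores.items():
        base = acc[len(DECOY_PREFIX):] if acc.startswith(DECOY_PREFIX) else acc
        if base not in picked or score > picked[base][1]:
            picked[base] = (acc, score)

    # 2) Rank the survivors and estimate FDR as #decoys / #targets at each rank.
    ranked = sorted(picked.values(), key=lambda x: x[1], reverse=True)
    rows = []
    targets = decoys = 0
    for acc, score in ranked:
        if acc.startswith(DECOY_PREFIX):
            decoys += 1
        else:
            targets += 1
        rows.append([acc, score, decoys / max(targets, 1)])

    # 3) Turn FDRs into monotone q-values (running minimum from the bottom).
    running_min = float("inf")
    for row in reversed(rows):
        running_min = min(running_min, row[2])
        row[2] = running_min
    return [tuple(row) for row in rows]
```

Because a target only survives if it outscores its own decoy, a poorly scoring target such as one whose decoy wins the competition is removed together with that pair, which is what keeps the decoy count a fair estimate of false targets as the protein list grows.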
12.
Mol Cell Proteomics ; 21(8): 100238, 2022 08.
Article in English | MEDLINE | ID: mdl-35462064

ABSTRACT

Isobaric stable isotope labeling techniques such as tandem mass tags (TMTs) have become popular in proteomics because they enable the relative quantification of proteins with high precision from up to 18 samples in a single experiment. While missing values in peptide quantification are rare in a single TMT experiment, they rapidly increase when combining multiple TMT experiments. As the field moves toward analyzing ever higher numbers of samples, tools that reduce missing values also become more important for analyzing TMT datasets. To this end, we developed SIMSI-Transfer (Similarity-based Isobaric Mass Spectra 2 [MS2] Identification Transfer), a software tool that extends our previously developed software MaRaCluster (© Matthew The) by clustering similar tandem MS2 spectra from multiple TMT experiments. SIMSI-Transfer is based on the assumption that similarity-clustered MS2 spectra represent the same peptide. Therefore, peptide identifications made by database searching in one TMT batch can be transferred to another TMT batch in which the same peptide was fragmented but not identified. To assess the validity of this approach, we tested SIMSI-Transfer on masked search engine identification results and recovered >80% of the masked identifications while controlling errors in the transfer procedure to below 1% false discovery rate. Applying SIMSI-Transfer to six published full proteome and phosphoproteome datasets from the Clinical Proteomic Tumor Analysis Consortium increased the number of identified MS2 spectra with TMT quantifications by 26 to 45%. This significantly decreased the number of missing values across batches and, in turn, increased the number of peptides and proteins identified in all TMT batches by 43 to 56% and 13 to 16%, respectively.


Subjects
Proteomics; Tandem Mass Spectrometry; Cluster Analysis; Isotope Labeling; Peptides; Proteome; Software
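The transfer step that SIMSI-Transfer builds on (spectra in the same similarity cluster are assumed to represent the same peptide) can be sketched as follows. This is a simplified illustration, not the tool's implementation: it assumes cluster assignments are already available (for example from a spectrum-clustering tool such as MaRaCluster), uses hypothetical spectrum IDs, and adopts the conservative rule of transferring only when all identified members of a cluster agree.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def transfer_identifications(
    cluster_of: Dict[str, int],
    peptide_of: Dict[str, Optional[str]],
) -> Dict[str, Optional[str]]:
    """Propagate peptide IDs to unidentified spectra in the same cluster.

    cluster_of: spectrum ID -> cluster ID
    peptide_of: spectrum ID -> identified peptide sequence, or None if unidentified
    """
    # Collect the distinct peptides identified in each cluster.
    ids_in_cluster: Dict[int, Set[str]] = defaultdict(set)
    for spectrum, cluster in cluster_of.items():
        peptide = peptide_of.get(spectrum)
        if peptide is not None:
            ids_in_cluster[cluster].add(peptide)

    result = dict(peptide_of)
    for spectrum, cluster in cluster_of.items():
        # Transfer only into unidentified spectra, and only from unambiguous clusters.
        if result.get(spectrum) is None and len(ids_in_cluster[cluster]) == 1:
            result[spectrum] = next(iter(ids_in_cluster[cluster]))
    return result
```

Clusters whose identified members disagree are left untouched here; how conflicting clusters are resolved in SIMSI-Transfer itself is not described in the abstract.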
13.
Nucleic Acids Res ; 50(D1): D1541-D1552, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791421

ABSTRACT

ProteomicsDB (https://www.ProteomicsDB.org) is a multi-omics and multi-organism resource for life science research. In this update, we present our efforts to continuously develop and expand ProteomicsDB. The major focus over the last two years was improving the findability, accessibility, interoperability and reusability (FAIR) of the data as well as its implementation. To this end, we first release a new application programming interface (API) that provides systematic access to essentially all data in ProteomicsDB. Second, we release a new open-source user interface (UI) and show the advantages the scientific community gains from such software. With the new interface, two new visualizations of protein primary, secondary and tertiary structure as well as an updated spectrum viewer were added. Furthermore, we integrated ProteomicsDB with our deep neural network Prosit, which can predict the fragmentation characteristics and retention time of peptides. The result is an automatic processing pipeline that can be used to reevaluate database search engine results stored in ProteomicsDB. In addition, we extended the data content with experiments investigating different aspects of human biology as well as a newly supported organism.


Subjects
Databases, Protein; Proteins/classification; Proteomics/classification; Software; Biological Science Disciplines; Humans; Neural Networks, Computer; Proteins/chemistry
14.
J Proteome Res ; 22(9): 2836-2846, 2023 09 01.
Article in English | MEDLINE | ID: mdl-37557900

ABSTRACT

Sample multiplexed quantitative proteomics assays have proved to be a highly versatile means to assay molecular phenotypes. Yet, stochastic precursor selection and precursor coisolation can dramatically reduce the efficiency of data acquisition and quantitative accuracy. To address this, intelligent data acquisition (IDA) strategies have recently been developed to improve instrument efficiency and quantitative accuracy for both discovery and targeted methods. Toward this end, we sought to develop and implement a new real-time spectral library searching (RTLS) workflow that could enable intelligent scan triggering and peak selection within milliseconds of scan acquisition. To ensure ease of use and general applicability, we built an application to read in diverse spectral libraries and file types from both empirical and predicted spectral libraries. We demonstrate that RTLS methods enable improved quantitation of multiplexed samples, particularly with consideration for quantitation from chimeric fragment spectra. We used RTLS to profile proteome responses to small molecule perturbations and were able to quantify up to 15% more significantly regulated proteins in half the gradient time compared to traditional methods. Taken together, the development of RTLS expands the IDA toolbox to improve instrument efficiency and quantitative accuracy for sample multiplexed analyses.


Subjects
Peptides; Proteomics; Proteomics/methods; Peptides/analysis; Proteome/analysis; Gene Library; Workflow; Peptide Library
15.
J Proteome Res ; 22(3): 681-696, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36744821

ABSTRACT

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and of inspiring future research.


Subjects
Machine Learning; Proteomics; Proteomics/methods; Algorithms; Mass Spectrometry
16.
Anal Chem ; 95(37): 13746-13749, 2023 09 19.
Article in English | MEDLINE | ID: mdl-37676919

ABSTRACT

Mass spectrometry coupled to liquid chromatography is one of the most powerful technologies for proteome quantification in biomedical samples. In peptide-centric workflows, protein mixtures are enzymatically digested to peptides prior to their analysis. However, proteome-wide quantification studies rarely identify all potential peptides for any given protein, and targeted proteomics experiments focus on a set of peptides for the proteins of interest. Consequently, proteomics relies on the use of a limited subset of all possible peptides as proxies for protein quantitation. In this work, we evaluated the stability of the human proteotypic peptides over 21 days and trained a deep learning model to predict peptide stability directly from tryptic sequences, which together constitute a resource of broad interest to prioritize and select peptides in proteome quantification experiments.


Subjects
Proteome; Proteomics; Humans; Peptides; Chromatography, Liquid; Mass Spectrometry
17.
Nat Methods ; 17(5): 495-503, 2020 05.
Article in English | MEDLINE | ID: mdl-32284610

ABSTRACT

We have used a mass spectrometry-based proteomic approach to compile an atlas of the thermal stability of 48,000 proteins across 13 species ranging from archaea to humans and covering melting temperatures of 30-90 °C. Protein sequence, composition and size affect thermal stability in prokaryotes, and eukaryotic proteins show a nonlinear relationship between the degree of disordered protein structure and thermal stability. The data indicate that evolutionary conservation of protein complexes is reflected by similar thermal stability of their proteins, and we show examples in which genomic alterations can affect thermal stability. Proteins of the respiratory chain were found to be very stable in many organisms, and human mitochondria showed close to normal respiration at 46 °C. We also noted cell-type-specific effects that can affect protein stability or the efficacy of drugs. This meltome atlas broadly defines the proteome amenable to thermal profiling in biology and drug discovery and can be explored online at http://meltomeatlas.proteomics.wzw.tum.de:5003/ and http://www.proteomicsdb.org.


Subjects
Gene Expression Regulation; Prokaryotic Cells/metabolism; Proteins/chemistry; Proteins/metabolism; Proteome/analysis; Transition Temperature; Animals; Electron Transport Chain Complex Proteins/metabolism; Humans; Mitochondria/metabolism; Protein Stability; Software; Species Specificity
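A thermal profile is commonly summarized by its melting temperature (Tm), the temperature at which the soluble fraction of a protein drops to one half. The sketch below is a simplification that assumes relative soluble fractions measured at increasing temperatures and uses linear interpolation between the two bracketing measurements, instead of the sigmoid curve fitting usually applied to thermal profiling data; the example values are made up for illustration.

```python
from typing import Optional, Sequence


def melting_temperature(temps: Sequence[float], fractions: Sequence[float]) -> Optional[float]:
    """Estimate Tm as the first temperature at which the soluble fraction crosses 0.5.

    temps must be increasing; fractions are relative soluble amounts
    (about 1.0 at the lowest temperature).
    """
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 > f1:  # the curve crosses one half between t0 and t1
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    return None  # the protein never reaches 50% denaturation in the measured range


# Illustrative (made-up) profile: the crossing lies between 47 °C and 50 °C.
print(melting_temperature([37, 41, 44, 47, 50, 53, 56],
                          [1.0, 0.95, 0.8, 0.6, 0.4, 0.2, 0.1]))
```

Returning `None` for proteins that stay mostly soluble mirrors the practical situation in which very stable proteins, such as the respiratory chain proteins mentioned above, may not melt within the tested temperature range.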
18.
Mol Cell Proteomics ; 20: 100076, 2021.
Article in English | MEDLINE | ID: mdl-33823297

ABSTRACT

Proteogenomics approaches often struggle with the distinction between true and false peptide-to-spectrum matches as the database size increases. However, features extracted from tandem mass spectrometry intensity predictors can enhance the peptide identification rate and can provide extra confidence for peptide-to-spectrum matching in a proteogenomics context. To that end, features from the spectral intensity pattern predictors MS2PIP and Prosit were combined with the canonical scores from MaxQuant in the Percolator postprocessing tool for protein sequence databases constructed from ribosome profiling and nanopore RNA-Seq analyses. The presented results provide evidence that this approach enhances both the identification rate and the validation stringency in a proteogenomic setting.


Subjects
Proteogenomics/methods; Databases, Protein; HCT116 Cells; Humans; Machine Learning; RNA-Seq; Ribosomes
19.
Proteomics ; 22(19-20): e2100257, 2022 10.
Article in English | MEDLINE | ID: mdl-35578405

ABSTRACT

Isobaric labeling increases the throughput of proteomics by enabling the parallel identification and quantification of peptides and proteins. Over the past decades, a variety of isobaric tags have been developed, allowing the multiplexed analysis of up to 18 samples. However, experiments utilizing such tags often exhibit reduced identification rates and thus show decreased analytical depth. Re-scoring has been shown to rescue otherwise missed identifications but had not yet been systematically applied to isobarically labeled data. Because iTRAQ 4/8-plex and the recently released TMTpro 16/18-plex share similar characteristics with TMT 6/10/11-plex, we hypothesized that Prosit-TMT, trained exclusively on 6/10/11-plex labeled peptides, may be applicable to these isobaric labeling strategies as well. To investigate this, we re-analyzed nine publicly available datasets covering iTRAQ and TMTpro labeling for samples of human and mouse origin. We highlight that Prosit-TMT shows remarkably good performance when comparing experimentally acquired and predicted fragmentation spectra (R of 0.84-0.9) and retention times (ΔRT95% of 3%-10% of gradient time) of peptides. Furthermore, re-scoring substantially increases the number of confidently identified spectra, peptides, and proteins.


Subjects
Peptides; Proteomics; Humans; Mice; Animals; Peptides/analysis; Proteins; Indicators and Reagents
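The retention time metric quoted above, ΔRT95%, summarizes how far predicted retention times deviate from observed ones. The sketch below assumes the definition used in the Prosit literature, namely the width of the smallest time window that contains the observed-minus-predicted deviations of 95% of the peptides, expressed as a percentage of the gradient length; the function name and the example numbers are illustrative.

```python
import math
from typing import Sequence


def delta_rt95(observed: Sequence[float], predicted: Sequence[float],
               gradient_length: float, coverage: float = 0.95) -> float:
    """Smallest window width containing `coverage` of the RT residuals,
    as a percentage of gradient_length (all times in the same unit)."""
    residuals = sorted(o - p for o, p in zip(observed, predicted))
    if not residuals:
        raise ValueError("need at least one peptide")
    # Number of residuals the window must contain; the tiny epsilon guards
    # against products such as 0.95 * n landing just above an integer.
    k = max(1, math.ceil(coverage * len(residuals) - 1e-9))
    # Slide a window of k consecutive sorted residuals and keep the narrowest one.
    width = min(residuals[i + k - 1] - residuals[i]
                for i in range(len(residuals) - k + 1))
    return 100.0 * width / gradient_length


# 19 peptides predicted within 1.8 min plus one 10-min outlier, 60-min gradient:
# the outlier falls outside the 95% window, giving roughly 3% of gradient time.
obs = [i * 0.1 for i in range(19)] + [10.0]
print(delta_rt95(obs, [0.0] * 20, gradient_length=60.0))
```

Using the narrowest window rather than a symmetric interval around zero means that a constant offset between observed and predicted retention times does not inflate the metric.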
20.
J Proteome Res ; 21(5): 1359-1364, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35413196

ABSTRACT

Machine learning has been an integral part of interpreting data from mass spectrometry (MS)-based proteomics for a long time. Relatively recently, a machine-learning architecture, the Transformer, has proven successful in other areas of bioinformatics. Furthermore, the implementation of Transformers within bioinformatics has become relatively convenient due to transfer learning, i.e., adapting a network trained for other tasks to new functionality. Transfer learning makes these relatively large networks more accessible, as it generally requires less data and substantially reduces training time. We implemented a Transformer based on the pretrained model TAPE to predict MS2 intensities. TAPE is a general model trained to predict missing residues from protein sequences. Despite its being trained for a different task, we could modify its behavior by adding a prediction head at the end of the TAPE model and fine-tune it using the spectrum intensities from the training set of the well-known predictor Prosit. We demonstrate that the predictor, which we call Prosit Transformer, outperforms the recurrent neural-network-based predictor Prosit, increasing the median angular similarity on its hold-out set from 0.908 to 0.929. We believe that Transformers will significantly increase prediction accuracy for other types of predictions within MS-based proteomics.


Subjects
Machine Learning; Neural Networks, Computer; Amino Acid Sequence; Mass Spectrometry; Proteomics
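The Prosit Transformer abstract reports a median angular similarity between predicted and observed spectra. Assuming this is the normalized spectral contrast angle commonly used to evaluate Prosit, SA = 1 - 2·arccos(cos θ)/π, where cos θ is the cosine similarity of the observed and predicted fragment intensity vectors, a minimal sketch is shown below. Matching fragment ions between the two spectra is assumed to have been done already, so both vectors list the same ions in the same order.

```python
import math
from typing import Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral contrast angle between two matched intensity vectors.

    Returns 1.0 for proportional spectra and 0.0 for orthogonal ones.
    """
    norm_o = math.sqrt(sum(x * x for x in observed))
    norm_p = math.sqrt(sum(y * y for y in predicted))
    if norm_o == 0 or norm_p == 0:
        return 0.0  # an empty spectrum carries no similarity information
    cos = sum(x * y for x, y in zip(observed, predicted)) / (norm_o * norm_p)
    cos = max(-1.0, min(1.0, cos))  # clamp rounding error before acos
    return 1.0 - 2.0 * math.acos(cos) / math.pi
```

A hold-out summary such as the reported 0.908 versus 0.929 would then be the median (for example via `statistics.median`) of this value over all test spectra. Compared with raw cosine similarity, the angle-based score spreads out differences between already highly similar spectra, which makes small model improvements easier to see.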