1.
Nat Commun ; 15(1): 6427, 2024 Jul 30.
Article in English | MEDLINE | ID: mdl-39080256

ABSTRACT

A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information, known as de novo peptide sequencing, is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.


Subjects
Peptides; Proteomics; Tandem Mass Spectrometry; Peptides/chemistry; Peptides/metabolism; Tandem Mass Spectrometry/methods; Proteomics/methods; Neural Networks, Computer; Machine Learning; Humans; Amino Acid Sequence; Sequence Analysis, Protein/methods; Databases, Protein; Algorithms
2.
Nat Commun ; 15(1): 3956, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730277

ABSTRACT

Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, known digestion patterns cannot be used, and every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.


Subjects
Deep Learning; Peptides; Tandem Mass Spectrometry; Humans; Peptides/chemistry; Peptides/immunology; Tandem Mass Spectrometry/methods; Databases, Protein; Proteomics/methods; HLA Antigens/immunology; HLA Antigens/genetics; Software; Ions
3.
J Chem Inf Model ; 64(7): 2515-2527, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-37870574

ABSTRACT

In the field of drug discovery, there is a substantial challenge in seeking out chemical structures that possess desirable pharmacological, toxicological, and pharmacokinetic properties. Complications arise when drugs interfere with the functioning of cardiac ion channels, leading to serious cardiovascular consequences. The discontinuation and removal of numerous approved drugs from the market or at late development stages in the pipeline due to such inhibitory effects further highlight the urgency of addressing this issue. Consequently, the early prediction of potential blockers targeting cardiac ion channels during the drug discovery process is of paramount importance. This study introduces a deep learning framework that computationally determines the cardiotoxicity associated with the voltage-gated potassium channel (hERG), the voltage-gated calcium channel (Cav1.2), and the voltage-gated sodium channel (Nav1.5) for drug candidates. The predictive capabilities of three feature representations (molecular fingerprints, descriptors, and graph-based numerical representations) are rigorously benchmarked. Additionally, a novel training and evaluation data set framework is presented, enabling predictive model training of drug off-target cardiotoxicity using a comprehensive and large curated data set covering these three cardiac ion channels. To facilitate these predictions, a robust and comprehensive small molecule cardiotoxicity prediction tool named CToxPred has been developed. It is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred.


Subjects
Cardiotoxicity; Ether-A-Go-Go Potassium Channels; Humans; Benchmarking; Ion Channels; Drug Discovery; Potassium Channel Blockers/pharmacology; Potassium Channel Blockers/chemistry
4.
Proteomics ; 24(8): e2300336, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38009585

ABSTRACT

Immunopeptidomics is a key technology in the discovery of targets for immunotherapy and vaccine development. However, identifying immunopeptides remains challenging due to their non-tryptic nature, which results in distinct spectral characteristics. Moreover, the absence of strict digestion rules leads to extensive search spaces, further amplified by the incorporation of somatic mutations, pathogen genomes, unannotated open reading frames, and post-translational modifications. This inflation in search space leads to an increase in random high-scoring matches, resulting in fewer identifications at a given false discovery rate. Peptide-spectrum match rescoring has emerged as a machine learning-based solution to address challenges in mass spectrometry-based immunopeptidomics data analysis. It involves post-processing unfiltered spectrum annotations to better distinguish between correct and incorrect peptide-spectrum matches. Recently, features based on predicted peptidoform properties, including fragment ion intensities, retention time, and collisional cross section, have been used to improve the accuracy and sensitivity of immunopeptide identification. In this review, we describe the diverse bioinformatics pipelines that are currently available for peptide-spectrum match rescoring and discuss how they can be used for the analysis of immunopeptidomics data. Finally, we provide insights into current and future machine learning solutions to boost immunopeptide identification.


Subjects
Peptides; Proteomics; Proteomics/methods; Peptides/chemistry; Mass Spectrometry/methods; Machine Learning; Protein Processing, Post-Translational
5.
Bioinformatics ; 39(7), 2023 Jul 01.
Article in English | MEDLINE | ID: mdl-37369033

ABSTRACT

MOTIVATION: Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets. RESULTS: We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data to hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools. AVAILABILITY AND IMPLEMENTATION: HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.


Subjects
Software; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Databases, Protein; Peptides/chemistry; Search Engine; Algorithms; Peptide Library
6.
J Proteome Res ; 22(6): 1639-1648, 2023 Jun 02.
Article in English | MEDLINE | ID: mdl-37166120

ABSTRACT

As current shotgun proteomics experiments can produce gigabytes of mass spectrometry data per hour, processing these massive data volumes has become progressively more challenging. Spectral clustering is an effective approach to speed up downstream data processing by merging highly similar spectra to minimize data redundancy. However, because state-of-the-art spectral clustering tools fail to achieve optimal runtimes, this simply moves the processing bottleneck. In this work, we present a fast spectral clustering tool, HyperSpec, based on hyperdimensional computing (HDC). HDC shows promising clustering capability while only requiring lightweight binary operations with high parallelism that can be optimized using low-level hardware architectures, making it possible to run HyperSpec on graphics processing units to achieve extremely efficient spectral clustering performance. Additionally, HyperSpec includes optimized data preprocessing modules to reduce the spectrum preprocessing time, which is a critical bottleneck during spectral clustering. Based on experiments using various mass spectrometry data sets, HyperSpec produces results with clustering quality comparable to that of state-of-the-art spectral clustering tools while achieving speedups by orders of magnitude, shortening the clustering runtime of over 21 million spectra from 4 h to only 24 min.


Subjects
Algorithms; Peptides; Peptides/analysis; Mass Spectrometry/methods; Proteomics/methods; Cluster Analysis
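
The hyperdimensional computing idea described in the abstract above can be sketched roughly as follows. This is a generic ID-level encoding with XOR binding and majority-vote bundling, not HyperSpec's actual implementation. The dimensionality, bin width, number of intensity levels, the use of independent random level vectors, and all function names are assumptions made for illustration.

```python
import random

DIM = 2048  # hypervector dimensionality (assumed value for this sketch)


def make_codebooks(n_bins, n_levels, seed=0):
    """Draw one random binary hypervector per m/z bin and per intensity level."""
    rng = random.Random(seed)
    ids = [rng.getrandbits(DIM) for _ in range(n_bins)]
    levels = [rng.getrandbits(DIM) for _ in range(n_levels)]
    return ids, levels


def encode(peaks, ids, levels, bin_width=1.0):
    """Encode (m/z, relative intensity in [0, 1]) peaks as one binary hypervector."""
    counts = [0] * DIM
    for mz, intensity in peaks:
        bin_index = min(int(mz / bin_width), len(ids) - 1)
        level_index = min(int(intensity * len(levels)), len(levels) - 1)
        bound = ids[bin_index] ^ levels[level_index]  # bind position and level with XOR
        for d in range(DIM):
            counts[d] += (bound >> d) & 1
    hv = 0
    for d in range(DIM):
        if 2 * counts[d] > len(peaks):  # bundle with a per-bit majority vote
            hv |= 1 << d
    return hv


def hamming(a, b):
    """Normalized Hamming distance between two hypervectors stored as ints."""
    return bin(a ^ b).count("1") / DIM


# Hypothetical toy spectra: "near" shares two of three peaks with "a", "far" shares none.
ids, levels = make_codebooks(n_bins=500, n_levels=8)
a = encode([(100.0, 1.0), (200.0, 0.5), (300.0, 0.8)], ids, levels)
near = encode([(100.0, 1.0), (200.0, 0.5), (350.0, 0.8)], ids, levels)
far = encode([(150.0, 0.3), (250.0, 0.9), (400.0, 0.6)], ids, levels)
print(hamming(a, near), hamming(a, far))
```

Because every step is a bitwise operation on independent dimensions, an encoding of this kind parallelizes naturally, which is the property the abstract attributes to HDC. A clustering tool would then group spectra whose hypervectors lie within some Hamming distance threshold.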
7.
J Proteome Res ; 22(2): 585-593, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36688569

ABSTRACT

A key analysis task in mass spectrometry proteomics is matching the acquired tandem mass spectra to their originating peptides by sequence database searching or spectral library searching. Machine learning is an increasingly popular postprocessing approach to maximize the number of confident spectrum identifications that can be obtained at a given false discovery rate threshold. Here, we have integrated semisupervised machine learning in the ANN-SoLo tool, an efficient spectral library search engine that is optimized for open modification searching to identify peptides with any type of post-translational modification. We show that machine learning rescoring boosts the number of spectra that can be identified for both standard searching and open searching, and we provide insights into relevant spectrum characteristics harnessed by the machine learning model. The semisupervised machine learning functionality has now been fully integrated into ANN-SoLo, which is available as open source under the permissive Apache 2.0 license on GitHub at https://github.com/bittremieux/ANN-SoLo.


Subjects
Peptides; Software; Databases, Protein; Peptides/analysis; Tandem Mass Spectrometry/methods; Machine Learning; Algorithms; Peptide Library
8.
J Am Soc Mass Spectrom ; 33(9): 1733-1744, 2022 Sep 07.
Article in English | MEDLINE | ID: mdl-35960544

ABSTRACT

Spectrum alignment of tandem mass spectrometry (MS/MS) data using the modified cosine similarity and subsequent visualization as molecular networks have been demonstrated to be a useful strategy to discover analogs of molecules from untargeted MS/MS-based metabolomics experiments. Recently, a neutral loss matching approach has been introduced as an alternative to MS/MS-based molecular networking with an implied performance advantage in finding analogs that cannot be discovered using existing MS/MS spectrum alignment strategies. To comprehensively evaluate the scoring properties of neutral loss matching, the cosine similarity, and the modified cosine similarity, similarity measures of 955 228 peptide MS/MS spectrum pairs and 10 million small molecule MS/MS spectrum pairs were compared. This comparative analysis revealed that the modified cosine similarity outperformed neutral loss matching and the cosine similarity in all cases. The data further indicated that the performance of MS/MS spectrum alignment depends on the location and type of the modification, as well as the chemical compound class of fragmented molecules.


Subjects
Metabolomics; Tandem Mass Spectrometry; Metabolomics/methods; Peptides; Tandem Mass Spectrometry/methods
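
The difference between the cosine similarity and the modified cosine similarity compared in the abstract above can be illustrated with a minimal sketch. It assumes spectra given as lists of (m/z, intensity) pairs, singly charged fragments, a fixed fragment tolerance, and greedy peak matching in place of an optimal assignment. The function names and toy spectra are hypothetical and are not taken from the study.

```python
import math


def _norm(peaks):
    """Euclidean norm of the peak intensities."""
    return math.sqrt(sum(intensity * intensity for _, intensity in peaks))


def _score(peaks_a, peaks_b, shifts, tol):
    """Greedy matched-peak cosine: each peak is used at most once."""
    candidates = []
    for ia, (mz_a, int_a) in enumerate(peaks_a):
        for ib, (mz_b, int_b) in enumerate(peaks_b):
            if any(abs(mz_a - mz_b - s) <= tol for s in shifts):
                candidates.append((int_a * int_b, ia, ib))
    candidates.sort(reverse=True)  # largest intensity products first
    used_a, used_b, total = set(), set(), 0.0
    for product, ia, ib in candidates:
        if ia not in used_a and ib not in used_b:
            used_a.add(ia)
            used_b.add(ib)
            total += product
    denom = _norm(peaks_a) * _norm(peaks_b)
    return total / denom if denom else 0.0


def cosine(peaks_a, peaks_b, tol=0.02):
    """Only peaks at the same m/z (within tol) may match."""
    return _score(peaks_a, peaks_b, [0.0], tol)


def modified_cosine(peaks_a, precursor_a, peaks_b, precursor_b, tol=0.02):
    """Peaks may also match when offset by the precursor m/z difference."""
    return _score(peaks_a, peaks_b, [0.0, precursor_a - precursor_b], tol)


# Hypothetical pair: the second spectrum carries a +16 modification on one fragment.
spec_a, prec_a = [(100.0, 1.0), (200.0, 1.0)], 300.0
spec_b, prec_b = [(100.0, 1.0), (216.0, 1.0)], 316.0
plain = cosine(spec_a, spec_b)
modified = modified_cosine(spec_a, prec_a, spec_b, prec_b)
print(plain, modified)
```

In this toy pair the plain cosine matches only the unshifted fragment, while the modified cosine also pairs the fragment that moved by the precursor mass difference, so the second score is higher.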
9.
Nat Methods ; 19(6): 675-678, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35637305

ABSTRACT

Computational methods that aim to exploit publicly available mass spectrometry repositories rely primarily on unsupervised clustering of spectra. Here we trained a deep neural network in a supervised fashion on the basis of previous assignments of peptides to spectra. The network, called 'GLEAMS', learns to embed spectra in a low-dimensional space in which spectra generated by the same peptide are close to one another. We applied GLEAMS for large-scale spectrum clustering, detecting groups of unidentified, proximal spectra representing the same peptide. We used these clusters to explore the dark proteome of repeatedly observed yet consistently unidentified mass spectra.


Subjects
Peptides; Tandem Mass Spectrometry; Algorithms; Cluster Analysis; Neural Networks, Computer; Peptides/chemistry; Proteome/analysis; Tandem Mass Spectrometry/methods
10.
Nat Biotechnol ; 39(2): 169-173, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33169034

ABSTRACT

We engineered a machine learning approach, MSHub, to enable auto-deconvolution of gas chromatography-mass spectrometry (GC-MS) data. We then designed workflows to enable the community to store, process, share, annotate, compare and perform molecular networking of GC-MS data within the Global Natural Product Social (GNPS) Molecular Networking analysis platform. MSHub/GNPS performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples.


Subjects
Algorithms; Gas Chromatography-Mass Spectrometry; Metabolomics; Animals; Anura; Humans
11.
Methods Mol Biol ; 2120: 183-195, 2020.
Article in English | MEDLINE | ID: mdl-32124320

ABSTRACT

Recognition of cancer epitopes by T cells is fundamental for the activation of targeted antitumor responses. As such, the identification and study of epitope-specific T cells has been instrumental in our understanding of cancer immunology and the development of personalized immunotherapies. To facilitate the study of T-cell epitope specificity, we developed a prediction tool, TCRex, that can identify epitope-specific T-cell receptors (TCRs) directly from TCR repertoire data and perform epitope-specificity enrichment analyses. This chapter details the use of the TCRex web tool.


Subjects
Epitopes, T-Lymphocyte/immunology; Receptors, Antigen, T-Cell/immunology; T-Lymphocytes/immunology; Humans; Machine Learning; Models, Immunological; Software; T-Cell Antigen Receptor Specificity
12.
Front Immunol ; 10: 2820, 2019.
Article in English | MEDLINE | ID: mdl-31849987

ABSTRACT

High-throughput T cell receptor (TCR) sequencing allows the characterization of an individual's TCR repertoire and directly queries their immune state. However, it remains a non-trivial task to couple these sequenced TCRs to their antigenic targets. In this paper, we present a novel strategy to annotate full TCR sequence repertoires with their epitope specificities. The strategy is based on a machine learning algorithm to learn the TCR patterns common to the recognition of a specific epitope. These results are then combined with a statistical analysis to evaluate the occurrence of specific epitope-reactive TCR sequences per epitope in repertoire data. In this manner, we can directly study the capacity of full TCR repertoires to target specific epitopes of the relevant vaccines or pathogens. We demonstrate the usability of this approach on three independent datasets related to vaccine monitoring and infectious disease diagnostics by independently identifying the epitopes that are targeted by the TCR repertoire. The developed method is freely available as a web tool for academic use at tcrex.biodatamining.be.


Subjects
Epitopes, T-Lymphocyte/immunology; Models, Biological; Receptors, Antigen, T-Cell/genetics; T-Cell Antigen Receptor Specificity/genetics; T-Cell Antigen Receptor Specificity/immunology; T-Lymphocytes/immunology; T-Lymphocytes/metabolism; Algorithms; Amino Acid Sequence; Clonal Evolution/genetics; Databases, Genetic; Epitopes, T-Lymphocyte/chemistry; Humans; Receptors, Antigen, T-Cell/metabolism; Reproducibility of Results; Software; Web Browser
13.
J Proteome Res ; 18(11): 3936-3943, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31556620

ABSTRACT

For the 2018 YPIC Challenge, contestants were invited to try to decipher two unknown English questions encoded by a synthetic protein expressed in Escherichia coli. In addition to deciphering the sentence, contestants were asked to determine the three-dimensional structure and detect any post-translational modifications left by the host organism. We present our experimental and computational strategy to characterize this sample by identifying the unknown protein sequence and detecting the presence of post-translational modifications. The sample was acquired with dynamic exclusion disabled to increase the signal-to-noise ratio of the measured molecules, after which spectral clustering was used to generate high-quality consensus spectra. De novo spectrum identification was used to determine the synthetic protein sequence, and any post-translational modifications introduced by E. coli on the synthetic protein were analyzed via spectral networking. This workflow resulted in a de novo sequence coverage of 70%, on par with sequence database searching performance. Additionally, the spectral networking analysis indicated that no systematic modifications were introduced on the synthetic protein by E. coli. The strategy presented here can be directly used to analyze samples for which no protein sequence information is available or when the identity of the sample is unknown. All software and code to perform the bioinformatics analysis is available as open source, and self-contained Jupyter notebooks are provided to fully recreate the analysis.


Subjects
Escherichia coli/metabolism; Peptides/analysis; Protein Processing, Post-Translational; Proteins/analysis; Proteomics/methods; Amino Acid Sequence; Computational Biology/methods; Escherichia coli/genetics; Peptides/metabolism; Protein Biosynthesis/genetics; Proteins/genetics; Proteins/metabolism; Sequence Analysis, Protein/methods; Software; Tandem Mass Spectrometry/methods
14.
J Proteome Res ; 18(10): 3792-3799, 2019 Oct 04.
Article in English | MEDLINE | ID: mdl-31448616

ABSTRACT

Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. On the basis of these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.


Subjects
Peptide Library; Proteomics/methods; Software; Databases, Protein; Humans; Peptides/analysis; Protein Processing, Post-Translational
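
The feature hashing step mentioned in the abstract above (converting high-resolution spectra to low-dimensional vectors) can be sketched as below. This is an illustration of the general hashing trick under stated assumptions, not ANN-SoLo's code: the bin width, output dimension, use of CRC32 as the hash function, and the name `hash_spectrum` are all chosen here for the example.

```python
import zlib


def hash_spectrum(peaks, dim=16, bin_width=0.05):
    """Map (m/z, intensity) peaks to a fixed-length vector via feature hashing.

    Each peak falls into a fine m/z bin; the bin index is hashed to one of
    `dim` slots and the peak intensity is added to that slot.
    """
    vector = [0.0] * dim
    for mz, intensity in peaks:
        bin_index = int(mz / bin_width)  # high-resolution bin (very many possible bins)
        slot = zlib.crc32(str(bin_index).encode()) % dim  # deterministic hash to a slot
        vector[slot] += intensity
    return vector


# Hypothetical toy spectrum.
spectrum = [(147.113, 1.0), (263.088, 2.0), (512.301, 0.5)]
hashed = hash_spectrum(spectrum)
print(hashed)
```

The output length is fixed regardless of how fine the m/z binning is, at the cost of occasional collisions in which unrelated bins share a slot. Fixed-length vectors of this kind are what a nearest neighbor index can then search over.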
15.
J Proteome Res ; 17(10): 3463-3474, 2018 Oct 05.
Article in English | MEDLINE | ID: mdl-30184435

ABSTRACT

Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. A drawback of this strategy, however, is that it leads to a large increase in search time. Although performing an open search can be done using existing spectral library search engines by simply setting a wide precursor mass window, none of these tools have been optimized for OMS, leading to excessive runtimes and suboptimal identification results. We present the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. This approach is combined with a cascade search strategy to maximize the number of identified unmodified and modified spectra while strictly controlling the false discovery rate as well as a shifted dot product score to sensitively match modified spectra to their unmodified counterparts. ANN-SoLo achieves state-of-the-art performance in terms of speed and the number of identifications. On a previously published human cell line data set, ANN-SoLo confidently identifies more spectra than SpectraST or MSFragger and achieves a speedup of an order of magnitude compared with SpectraST. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.


Subjects
Databases, Protein; Peptides/metabolism; Proteomics/methods; Search Engine/methods; Algorithms; Computational Biology/methods; HEK293 Cells; Humans; Peptide Library; Peptides/chemistry; Protein Processing, Post-Translational; Reproducibility of Results; Software; Tandem Mass Spectrometry; Time Factors
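
The effect of the wide precursor mass window described in the abstract above can be shown with a small sketch. The library entries, masses, window sizes, and the function name `candidates` are hypothetical values picked for illustration; a real search engine would go on to score each candidate's fragment spectrum.

```python
def candidates(query_mass, library, window_da):
    """Return (peptide, mass difference) pairs whose precursor mass lies within the window."""
    return [
        (peptide, query_mass - mass)
        for peptide, mass in library
        if abs(query_mass - mass) <= window_da
    ]


# Hypothetical library of (peptide label, precursor mass in Da).
library = [("PEPTIDEA", 1000.500), ("PEPTIDEB", 1080.470), ("PEPTIDEC", 2000.000)]
query_mass = 1080.466

standard = candidates(query_mass, library, window_da=0.02)    # narrow window
open_search = candidates(query_mass, library, window_da=500.0)  # wide, open window

for peptide, delta in open_search:
    print(peptide, round(delta, 3))
```

The narrow window keeps only the near-exact mass match, whereas the open window also admits the first entry with a mass difference of about +79.97 Da, which would be consistent with a phosphorylated form of that peptide. The larger candidate list is also why open searches cost more time, which is the bottleneck the abstract addresses.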
16.
Immunogenetics ; 70(3): 159-168, 2018 Mar.
Article in English | MEDLINE | ID: mdl-28779185

ABSTRACT

Current T cell epitope prediction tools are a valuable resource in designing targeted immunogenicity experiments. They typically focus on, and are able to accurately predict, peptide binding and presentation by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells. However, recognition of the peptide-MHC complex by a T cell receptor (TCR) is often not included in these tools. We developed a classification approach based on random forest classifiers to predict recognition of a peptide by a T cell receptor and discover patterns that contribute to recognition. We considered two approaches to solve this problem: (1) distinguishing between two sets of TCRs that each bind to a known peptide and (2) retrieving TCRs that bind to a given peptide from a large pool of TCRs. Evaluation of the models on two HIV-1, B*08-restricted epitopes reveals good performance and hints towards structural CDR3 features that can determine peptide immunogenicity. These results are of particular importance as they show that prediction of T cell epitope and T cell epitope recognition based on sequence data is a feasible approach. In addition, the validity of our models not only serves as a proof of concept for the prediction of immunogenic T cell epitopes but also paves the way for more general and high-performing models.


Subjects
Epitopes, T-Lymphocyte/immunology; HIV-1/immunology; Peptides/immunology; Receptors, Antigen, T-Cell/immunology; Amino Acid Sequence/genetics; Antigen Presentation/immunology; Antigen-Presenting Cells/immunology; CD8-Positive T-Lymphocytes/immunology; HIV-1/isolation & purification; Humans; Major Histocompatibility Complex/immunology; Protein Binding/immunology
17.
J Proteome Res ; 15(4): 1300-7, 2016 Apr 01.
Article in English | MEDLINE | ID: mdl-26974716

ABSTRACT

Despite many technological and computational advances, the results of a mass spectrometry proteomics experiment are still subject to a large variability. For the understanding and evaluation of how technical variability affects the results of an experiment, several computationally derived quality control metrics have been introduced. However, despite the availability of these metrics, a systematic approach to quality control is often still lacking because the metrics are not fully understood and are hard to interpret. Here, we present a toolkit of powerful techniques to analyze and interpret multivariate quality control metrics to assess the quality of mass spectrometry proteomics experiments. We show how unsupervised techniques applied to these quality control metrics can provide an initial discrimination between low-quality experiments and high-quality experiments prior to manual investigation. Furthermore, we provide a technique to obtain detailed information on the quality control metrics that are related to the decreased performance, which can be used as actionable information to improve the experimental setup. Our toolkit is released as open-source and can be downloaded from https://bitbucket.org/proteinspector/qc_analysis/.


Subjects
Bacterial Proteins/isolation & purification; Chromatography, Liquid/standards; Mass Spectrometry/standards; Neoplasm Proteins/isolation & purification; Peptide Fragments/analysis; Proteomics/standards; Area Under Curve; Bacterial Proteins/chemistry; Colorectal Neoplasms/chemistry; Humans; Neoplasm Proteins/chemistry; Peptide Fragments/chemistry; Proteomics/methods; Quality Control; ROC Curve; Shewanella/chemistry; Software