Search | Virtual Health Library

Semisupervised Learning to Boost hERG, Nav1.5, and Cav1.2 Cardiac Ion Channel Toxicity Prediction by Mining a Large Unlabeled Small Molecule Data Set.

Arab, Issar; Laukens, Kris; Bittremieux, Wout.

J Chem Inf Model ; 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-39110924

ABSTRACT

Predicting drug toxicity is a critical aspect of ensuring patient safety during the drug design process. Although conventional machine learning techniques have shown some success in this field, the scarcity of annotated toxicity data poses a significant challenge in enhancing models' performance. In this study, we explore the potential of leveraging large unlabeled small molecule data sets using semisupervised learning to improve drug cardiotoxicity predictive performance across three cardiac ion channel targets: the voltage-gated potassium channel (hERG), the voltage-gated sodium channel (Nav1.5), and the voltage-gated calcium channel (Cav1.2). We extensively mined the ChEMBL database, comprising approximately 2 million small molecules, and then employed semisupervised learning to construct robust classification models for this purpose. We achieved a performance boost on highly diverse (i.e., structurally dissimilar) test data sets across all three targets. Using our built models, we screened the whole ChEMBL database and a large set of FDA-approved drugs, identifying several compounds with potential cardiac ion channel activity. To ensure broad accessibility and usability for both technical and nontechnical users, we developed a cross-platform graphical user interface that allows users to make predictions and gain insights into the cardiotoxicity of drugs and other small molecules. The software is made available as open source under the permissive MIT license at https://github.com/issararab/CToxPred2.
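The abstract above does not say which semisupervised technique was used. One common option is self-training (pseudo-labeling), in which a model trained on the labeled set assigns labels to unlabeled examples and only its confident predictions are added back to the training data. The following is a minimal sketch under stated assumptions: the 1-D feature values, the nearest-centroid classifier, the `min_margin` threshold, and all function names are hypothetical illustrations, not the CToxPred2 implementation.

```python
# Toy self-training (pseudo-labeling) sketch. The 1-D "descriptor" values,
# the nearest-centroid classifier, and the confidence margin are all
# illustrative assumptions, not the published pipeline.

def centroids(points):
    """Mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in points:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(cents, x):
    """Return (label, margin); margin is the distance gap between the
    second-nearest and the nearest class centroid."""
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - cents[ranked[1]]) - abs(x - cents[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Iteratively move confidently predicted unlabeled points into the
    labeled set; stop when no prediction clears the margin threshold."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict(cents, x)
            (confident if margin >= min_margin else rest).append((x, label))
        if not confident:
            break
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids(labeled), pool


# Two labeled toy compounds (0 = inactive, 1 = active) and five unlabeled ones.
model, leftover = self_train([(0.0, 0), (10.0, 1)], [1.0, 2.0, 9.0, 8.5, 5.2])
```

In this toy run the ambiguous value near the class boundary never clears the margin and stays unlabeled, which is the intended behavior: self-training should leave low-confidence examples out rather than reinforce a guess.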

Sequence-to-sequence translation from mass spectra to peptides with a transformer model.

Yilmaz, Melih; Fondrie, William E; Bittremieux, Wout; Melendez, Carlo F; Nelson, Rowan; Ananth, Varun; Oh, Sewoong; Noble, William Stafford.

Nat Commun ; 15(1): 6427, 2024 Jul 30.

Article in English | MEDLINE | ID: mdl-39080256

ABSTRACT

A fundamental challenge in mass spectrometry-based proteomics is the identification of the peptide that generated each acquired tandem mass spectrum. Approaches that leverage known peptide sequence databases cannot detect unexpected peptides and can be impractical or impossible to apply in some settings. Thus, the ability to assign peptide sequences to tandem mass spectra without prior information, known as de novo peptide sequencing, is valuable for tasks including antibody sequencing, immunopeptidomics, and metaproteomics. Although many methods have been developed to address this problem, it remains an outstanding challenge in part due to the difficulty of modeling the irregular data structure of tandem mass spectra. Here, we describe Casanovo, a machine learning model that uses a transformer neural network architecture to translate the sequence of peaks in a tandem mass spectrum into the sequence of amino acids that comprise the generating peptide. We train a Casanovo model from 30 million labeled spectra and demonstrate that the model outperforms several state-of-the-art methods on a cross-species benchmark dataset. We also develop a version of Casanovo that is fine-tuned for non-enzymatic peptides. Finally, we demonstrate that Casanovo's superior performance improves the analysis of immunopeptidomics and metaproteomics experiments and allows us to delve deeper into the dark proteome.

Subjects

Peptides; Proteomics; Tandem Mass Spectrometry; Peptides/chemistry; Peptides/metabolism; Tandem Mass Spectrometry/methods; Proteomics/methods; Neural Networks, Computer; Machine Learning; Humans; Amino Acid Sequence; Sequence Analysis, Protein/methods; Databases, Protein; Algorithms

Machine Learning Strategies to Tackle Data Challenges in Mass Spectrometry-Based Proteomics.

Dens, Ceder; Adams, Charlotte; Laukens, Kris; Bittremieux, Wout.

J Am Soc Mass Spectrom ; 2024 Jul 29.

Article in English | MEDLINE | ID: mdl-39074335

ABSTRACT

In computational proteomics, machine learning (ML) has emerged as a vital tool for enhancing data analysis. Despite significant advancements, the diversity of ML model architectures and the complexity of proteomics data present substantial challenges in the effective development and evaluation of these tools. Here, we highlight the necessity for high-quality, comprehensive data sets to train ML models and advocate for the standardization of data to support robust model development. We emphasize the instrumental role of key data sets like ProteomeTools and MassIVE-KB in advancing ML applications in proteomics and discuss the implications of data set size on model performance, highlighting that larger data sets typically yield more accurate models. To address data scarcity, we explore algorithmic strategies such as self-supervised pretraining and multitask learning. Ultimately, we hope that this discussion can serve as a call to action for the proteomics community to collaborate on data standardization and collection efforts, which are crucial for the sustainable advancement and refinement of ML methodologies in the field.

Communicating Mass Spectrometry Quality Information in mzQC with Python, R, and Java.

Bielow, Chris; Hoffmann, Nils; Jimenez-Morales, David; Van Den Bossche, Tim; Vizcaíno, Juan Antonio; Tabb, David L; Bittremieux, Wout; Walzer, Mathias.

J Am Soc Mass Spectrom ; 35(8): 1875-1882, 2024 Aug 07.

Article in English | MEDLINE | ID: mdl-38918936

ABSTRACT

Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intralaboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Quality Control (QC) working group of the Human Proteome Organization's Proteomics Standards Initiative has established the standard mzQC file format for reporting and exchanging information relating to data quality. mzQC is based on the JavaScript Object Notation (JSON) format and provides a lightweight yet versatile file format that can be easily implemented in software. Here, we present open-source software libraries to process mzQC data in three programming languages: Python, using pymzqc; R, using rmzqc; and Java, using jmzqc. The libraries follow a common data model and provide shared functionalities, including the (de)serialization and validation of mzQC files. We demonstrate use of the software libraries in a workflow for extracting, analyzing, and visualizing QC metrics from different sources. Additionally, we show how these libraries can be integrated with each other, with existing software tools, and in automated workflows for the QC of mass spectrometry data. All software libraries are available as open source under the MS-Quality-Hub organization on GitHub (https://github.com/MS-Quality-Hub).

Subjects

Mass Spectrometry; Programming Languages; Proteomics; Quality Control; Software; Mass Spectrometry/methods; Mass Spectrometry/standards; Humans; Proteomics/methods; Proteomics/standards; Workflow

Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in timsTOF.

Adams, Charlotte; Gabriel, Wassim; Laukens, Kris; Picciani, Mario; Wilhelm, Mathias; Bittremieux, Wout; Boonen, Kurt.

Nat Commun ; 15(1): 3956, 2024 May 10.

Article in English | MEDLINE | ID: mdl-38730277

ABSTRACT

Immunopeptidomics is crucial for immunotherapy and vaccine development. Because the generation of immunopeptides from their parent proteins does not adhere to clear-cut rules, rather than being able to use known digestion patterns, every possible protein subsequence within human leukocyte antigen (HLA) class-specific length restrictions needs to be considered during sequence database searching. This leads to an inflation of the search space and results in lower spectrum annotation rates. Peptide-spectrum match (PSM) rescoring is a powerful enhancement of standard searching that boosts the spectrum annotation performance. We analyze 302,105 unique synthesized non-tryptic peptides from the ProteomeTools project on a timsTOF-Pro to generate a ground-truth dataset containing 93,227 MS/MS spectra of 74,847 unique peptides, which is used to fine-tune the deep learning-based fragment ion intensity prediction model Prosit. We demonstrate up to 3-fold improvement in the identification of immunopeptides, as well as increased detection of immunopeptides from low input samples.

Subjects

Deep Learning; Peptides; Tandem Mass Spectrometry; Humans; Peptides/chemistry; Peptides/immunology; Tandem Mass Spectrometry/methods; Databases, Protein; Proteomics/methods; HLA Antigens/immunology; HLA Antigens/genetics; Software; Ions

From data to discovery: The essential role of computational tools in proteomics.

Bittremieux, Wout.

Proteomics ; 24(8): e2300081, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38629976

Subjects

Computational Biology; Proteomics

The underappreciated diversity of bile acid modifications.

Mohanty, Ipsita; Mannochio-Russo, Helena; Schweer, Joshua V; El Abiead, Yasin; Bittremieux, Wout; Xing, Shipei; Schmid, Robin; Zuffa, Simone; Vasquez, Felipe; Muti, Valentina B; Zemlin, Jasmine; Tovar-Herrera, Omar E; Moraïs, Sarah; Desai, Dhimant; Amin, Shantu; Koo, Imhoi; Turck, Christoph W; Mizrahi, Itzhak; Kris-Etherton, Penny M; Petersen, Kristina S; Fleming, Jennifer A; Huan, Tao; Patterson, Andrew D; Siegel, Dionicio; Hagey, Lee R; Wang, Mingxun; Aron, Allegra T; Dorrestein, Pieter C.

Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.

Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.

Subjects

Bile Acids and Salts; Gastrointestinal Microbiome; Metabolomics; Tandem Mass Spectrometry; Animals; Humans; Bile Acids and Salts/chemistry; Metabolomics/methods; Polyamines; Tandem Mass Spectrometry/methods; Databases, Chemical

Molecular structure discovery for untargeted metabolomics using biotransformation rules and global molecular networking.

Martin, Margaret R; Bittremieux, Wout; Hassoun, Soha.

bioRxiv ; 2024 Feb 08.

Article in English | MEDLINE | ID: mdl-38370723

ABSTRACT

Although untargeted mass spectrometry-based metabolomics is crucial for understanding life's molecular underpinnings, its effectiveness is hampered by low annotation rates of the generated tandem mass spectra. To address this issue, we introduce a novel data-driven approach, Biotransformation-based Annotation Method (BAM), that leverages molecular structural similarities inherent in biochemical reactions. BAM operates by applying biotransformation rules to known 'anchor' molecules, which exhibit high spectral similarity to unknown spectra, thereby hypothesizing and ranking potential structures for the corresponding 'suspect' molecule. BAM's effectiveness is demonstrated by its success in annotating suspect spectra in a global molecular network comprising hundreds of millions of spectra. BAM was able to assign correct molecular structures to 24.2% of examined anchor-suspect cases, thereby demonstrating remarkable advancement in metabolite annotation.

Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics.

Bittremieux, Wout; Avalon, Nicole E; Thomas, Sydney P; Kakhkhorov, Sarvar A; Aksenov, Alexander A; Gomes, Paulo Wender P; Aceves, Christine M; Caraballo-Rodríguez, Andrés Mauricio; Gauglitz, Julia M; Gerwick, William H; Huan, Tao; Jarmusch, Alan K; Kaddurah-Daouk, Rima F; Kang, Kyo Bin; Kim, Hyun Woo; Kondic, Todor; Mannochio-Russo, Helena; Meehan, Michael J; Melnik, Alexey V; Nothias, Louis-Felix; O'Donovan, Claire; Panitchpakdi, Morgan; Petras, Daniel; Schmid, Robin; Schymanski, Emma L; van der Hooft, Justin J J; Weldon, Kelly C; Yang, Heejung; Xing, Shipei; Zemlin, Jasmine; Wang, Mingxun; Dorrestein, Pieter C.

Nat Commun ; 14(1): 8488, 2023 Dec 20.

Article in English | MEDLINE | ID: mdl-38123557

ABSTRACT

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.
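The abstract above mentions MS/MS-based spectrum alignment but does not specify the scoring function. As a rough illustration of how two spectra can be compared, the sketch below computes a plain cosine score with greedy peak matching. It rests on several assumptions: the `(m/z, intensity)` peak-list representation, the 0.02 m/z tolerance, and the function name are hypothetical, and the sketch does not handle peaks shifted by a precursor mass difference, which annotation propagation between related molecules would need.

```python
import math


def cosine_similarity(spec_a, spec_b, tol=0.02):
    """Greedy cosine score between two peak lists of (m/z, intensity).

    Simplified illustration only: no intensity scaling, no noise filtering,
    and no handling of mass-shifted fragment peaks.
    """
    # Collect candidate peak pairs whose m/z values agree within the tolerance.
    pairs = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tol
    ]
    # Take the highest intensity products first, using each peak at most once.
    pairs.sort(reverse=True)
    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product
    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy spectra: one shared fragment near m/z 100, one fragment unique to each.
reference = [(100.00, 3.0), (200.00, 4.0)]
unknown = [(100.01, 3.0), (300.00, 4.0)]
score = cosine_similarity(reference, unknown)
```

A score near 1 means the matched peaks account for most of the signal in both spectra; a score of 0 means no peaks matched within the tolerance.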

Subjects

Access to Information; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Metabolomics/methods; Gene Library; Cluster Analysis
