Search | BVS - MINISTÉRIO DA SAÚDE

TransExION: a transformer based explainable similarity metric for comparing IONS in tandem mass spectrometry.

Bui-Thi, Danh; Liu, Youzhong; Lippens, Jennifer L; Laukens, Kris; De Vijlder, Thomas.

J Cheminform ; 16(1): 61, 2024 May 28.

Article in English | MEDLINE | ID: mdl-38807166

ABSTRACT

Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify compounds or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated from identical fragment matches between the MS/MS spectra, do not always accurately reflect structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass difference and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post-hoc explanation of its estimation, which can support scientists in evaluating spectral library search results and thus in structure elucidation of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a Transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post-hoc explanation that can serve as a good starting point for annotating unknown spectra based on database spectra.
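The limitation of identical-fragment matching described in this abstract can be illustrated with a minimal sketch. The code below is not TransExION; it is a simple greedy cosine score of the kind the abstract calls a "popular spectral similarity measure". The function name `matched_peak_cosine`, the tolerance `tol`, and the toy spectra are assumptions made for illustration only.

```python
import math

def matched_peak_cosine(spec_a, spec_b, tol=0.01):
    """Cosine score using only peaks whose m/z values agree within `tol`.

    Each spectrum is a list of (m/z, intensity) tuples. Matching is greedy:
    the pair with the largest intensity product is assigned first, and each
    peak is used at most once.
    """
    # All candidate pairs of peaks that count as "identical" fragments.
    pairs = [
        (int_a * int_b, i, j)
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tol
    ]
    pairs.sort(reverse=True)

    used_a, used_b, dot = set(), set(), 0.0
    for product, i, j in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy spectra: the analogue carries one fragment shifted by
# 14.016 Da (a CH2 group), so that fragment no longer matches exactly.
parent = [(100.0, 1.0), (150.0, 0.5)]
analogue = [(100.0, 1.0), (164.016, 0.5)]
score = matched_peak_cosine(parent, analogue)
```

In this toy case the shifted fragment contributes nothing to the score even though it shares a substructure with its counterpart. That gap is what a metric based on mass differences between related fragments is meant to address.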

MeRgeION: a Multifunctional R Pipeline for Small Molecule LC-MS/MS Data Processing, Searching, and Organizing.

Liu, Youzhong; Zhang, Yingjie; Vennekens, Tom; Lippens, Jennifer L; Duijsens, Luc; Bui-Thi, Danh; Laukens, Kris; de Vijlder, Thomas.

Anal Chem ; 95(22): 8433-8442, 2023 Jun 06.

Article in English | MEDLINE | ID: mdl-37218737

ABSTRACT

Small molecule structure elucidation using tandem mass spectrometry (MS/MS) plays a crucial role in life science, bioanalytical, and pharmaceutical research. There is a pressing need for increased throughput of compound identification and transformation of historical data into information-rich spectral databases. Meanwhile, molecular networking, a recent bioinformatic framework, provides global displays and system-level understanding of complex LC-MS/MS data sets. Herein we present meRgeION, a multifunctional, modular, and flexible R-based toolbox to streamline spectral database building, automated structural elucidation, and molecular networking. The toolbox offers diverse tuning parameters and the possibility to combine various algorithms in the same pipeline. As an open-source R package, meRgeION is ideally suited for building spectral databases and molecular networks from privacy-sensitive and preliminary data. Using meRgeION, we have created an integrated spectral database covering diverse pharmaceutical compounds that was successfully applied to annotate drug-related metabolites from a published nontargeted metabolomics data set as well as reveal the chemical space behind this complex data set through molecular networking. Moreover, the meRgeION-based processing workflow has demonstrated the usefulness of a spectral library search and molecular networking for pharmaceutical forced degradation studies. meRgeION is freely available at: https://github.com/daniellyz/meRgeION2.

Subjects

Algorithms; Tandem Mass Spectrometry; Chromatography, Liquid/methods; Metabolomics/methods; Pharmaceutical Preparations; Software

Predicting compound-protein interaction using hierarchical graph convolutional networks.

Bui-Thi, Danh; Rivière, Emmanuel; Meysman, Pieter; Laukens, Kris.

PLoS One ; 17(7): e0258628, 2022.

Article in English | MEDLINE | ID: mdl-35862351

ABSTRACT

MOTIVATION: Convolutional neural networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. They have also drawn much attention from other domains, including drug discovery and drug development. In this study, we develop a computational method based on convolutional neural networks to tackle a fundamental question in drug discovery and development, i.e. the prediction of compound-protein interactions based on compound structure and protein sequence. We propose a hierarchical graph convolutional network (HGCN) to encode small molecules. The HGCN aggregates a molecule embedding from substructure embeddings, which are synthesized from atom embeddings. As small molecules usually share substructures, computing a molecule embedding from those common substructures allows us to learn better generic models. We then combine the HGCN with a one-dimensional convolutional network to construct a complete model for predicting compound-protein interactions. Furthermore, we apply an explanation technique, Grad-CAM, to visualize the contribution of each amino acid to the prediction. RESULTS: Experiments using different datasets show that our model improves on other GCN-based methods and a sequence-based method, DeepDTA, in predicting compound-protein interactions. Each prediction made by the model is also explainable and can be used to identify critical residues mediating the interaction.
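The two-level aggregation named in this abstract (atoms to substructures, substructures to molecule) can be sketched as follows. This is only an illustration of the hierarchy, using plain mean pooling as a stand-in. The learned graph convolutions of the actual HGCN are not shown, and the names `mean_pool` and `molecule_embedding` as well as the toy vectors are hypothetical.

```python
def mean_pool(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def molecule_embedding(atom_embeddings, substructures):
    """Pool atom vectors into substructure vectors, then into one molecule vector.

    `substructures` is a list of atom-index lists; an atom may belong to
    more than one substructure. Mean pooling is an assumption here, not
    the aggregation used in the paper.
    """
    substructure_embeddings = [
        mean_pool([atom_embeddings[index] for index in atom_indices])
        for atom_indices in substructures
    ]
    return mean_pool(substructure_embeddings)

# Hypothetical toy molecule: three atoms with 2-dimensional embeddings,
# grouped into two substructures.
atoms = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
groups = [[0, 1], [2]]
embedding = molecule_embedding(atoms, groups)
```

Because the molecule vector is built from substructure vectors, two molecules that share a substructure share an intermediate representation, which is the intuition the abstract gives for learning more generic models.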

Subjects

Drug Discovery; Neural Networks, Computer; Amino Acid Sequence

On the viability of unsupervised T-cell receptor sequence clustering for epitope preference.

Meysman, Pieter; De Neuter, Nicolas; Gielis, Sofie; Bui Thi, Danh; Ogunjimi, Benson; Laukens, Kris.

Bioinformatics ; 35(9): 1461-1468, 2019 May 01.

Article in English | MEDLINE | ID: mdl-30247624

ABSTRACT

MOTIVATION: The T-cell receptor (TCR) is responsible for recognizing epitopes presented on cell surfaces. Linking TCR sequences to their ability to target specific epitopes is currently an unsolved problem, yet one of great interest. Indeed, it is currently unknown how dissimilar TCR sequences can be before they no longer bind the same epitope. This question is confounded by the fact that there are many ways to define the similarity between two TCR sequences. Here we investigate both issues in the context of unsupervised TCR sequence clustering. RESULTS: We provide an overview of the performance of various distance metrics on two large independent datasets with 412 and 2835 TCR sequences, respectively. Our results confirm the presence of structurally distinct TCR groups that target identical epitopes. In addition, we put forward several recommendations for performing unsupervised T-cell receptor sequence clustering. AVAILABILITY AND IMPLEMENTATION: Source code implemented in Python 3 is available at https://github.com/pmeysman/TCRclusteringPaper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
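As a minimal sketch of what unsupervised clustering under a sequence distance can look like, the code below groups sequences whose edit distance falls under a threshold. Levenshtein distance and single-linkage grouping are assumptions chosen for brevity; the abstract does not say which metrics the paper evaluates or recommends, and this is not the code from the linked repository. The sequences in the example are made-up CDR3-like strings.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]

def cluster_by_distance(sequences, max_dist=1):
    """Single-linkage clusters: sequences within `max_dist` are connected."""
    parent = list(range(len(sequences)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(sequences)):
        for j in range(i + 1, len(sequences)):
            if levenshtein(sequences[i], sequences[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for index, sequence in enumerate(sequences):
        clusters.setdefault(find(index), []).append(sequence)
    return sorted(clusters.values(), key=len, reverse=True)

# Hypothetical CDR3-like sequences, for illustration only.
example = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPDRGNTEAFF"]
clusters = cluster_by_distance(example, max_dist=1)
```

The choice of distance function and threshold determines which sequences end up together, which is why the abstract treats the definition of TCR similarity as an open question in its own right.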

Subjects

Receptors, Antigen, T-Cell/immunology; Software; Cluster Analysis; Epitopes
