ABSTRACT
Small molecule structure elucidation using tandem mass spectrometry (MS/MS) plays a crucial role in life science, bioanalytical, and pharmaceutical research. There is a pressing need for increased throughput of compound identification and transformation of historical data into information-rich spectral databases. Meanwhile, molecular networking, a recent bioinformatic framework, provides global displays and system-level understanding of complex LC-MS/MS data sets. Herein we present meRgeION, a multifunctional, modular, and flexible R-based toolbox to streamline spectral database building, automated structural elucidation, and molecular networking. The toolbox offers diverse tuning parameters and the possibility to combine various algorithms in the same pipeline. As an open-source R package, meRgeION is ideally suited for building spectral databases and molecular networks from privacy-sensitive and preliminary data. Using meRgeION, we have created an integrated spectral database covering diverse pharmaceutical compounds that was successfully applied to annotate drug-related metabolites from a published nontargeted metabolomics data set as well as reveal the chemical space behind this complex data set through molecular networking. Moreover, the meRgeION-based processing workflow has demonstrated the usefulness of a spectral library search and molecular networking for pharmaceutical forced degradation studies. meRgeION is freely available at: https://github.com/daniellyz/meRgeION2.
Subject(s)
Algorithms , Tandem Mass Spectrometry , Liquid Chromatography/methods , Metabolomics/methods , Pharmaceutical Preparations , Software
ABSTRACT
MOTIVATION: The T-cell receptor (TCR) is responsible for recognizing epitopes presented on cell surfaces. Linking TCR sequences to their ability to target specific epitopes is currently an unsolved problem, yet one of great interest. Indeed, it is currently unknown how dissimilar TCR sequences can be before they no longer bind the same epitope. This question is confounded by the fact that there are many ways to define the similarity between two TCR sequences. Here we investigate both issues in the context of unsupervised clustering of TCR sequences. RESULTS: We provide an overview of the performance of various distance metrics on two large independent datasets with 412 and 2835 TCR sequences, respectively. Our results confirm the presence of structurally distinct TCR groups that target identical epitopes. In addition, we put forward several recommendations for performing unsupervised clustering of T-cell receptor sequences. AVAILABILITY AND IMPLEMENTATION: Source code implemented in Python 3 is available at https://github.com/pmeysman/TCRclusteringPaper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
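As a minimal illustration of distance-based unsupervised clustering of TCR sequences, the sketch below groups sequences whose pairwise edit distance falls under a threshold. It is not taken from the linked repository: the choice of Levenshtein distance, the single-linkage grouping, the `max_dist` parameter, and the example sequences are all assumptions made for illustration, and the abstract itself compares several distance metrics without naming them here.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster(seqs, max_dist=1):
    """Single-linkage clustering: link any two sequences within max_dist.

    max_dist is a hypothetical threshold chosen for this example only.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if levenshtein(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Made-up CDR3-like sequences, used only to exercise the functions.
example = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSPGTGELFF"]
clusters = cluster(example, max_dist=1)
```

Swapping `levenshtein` for another distance function is enough to compare how different similarity definitions change the resulting groups.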
Subject(s)
T-Cell Antigen Receptors/immunology , Software , Cluster Analysis , Epitopes
ABSTRACT
Small molecule identification is a crucial task in analytical chemistry and life sciences. One of the most commonly used technologies to elucidate small molecule structures is mass spectrometry. Spectral library search of product ion spectra (MS/MS) is a popular strategy to identify compounds or find structural analogues. This approach relies on the assumption that spectral similarity and structural similarity are correlated. However, popular spectral similarity measures, usually calculated based on identical fragment matches between the MS/MS spectra, do not always accurately reflect the structural similarity. In this study, we propose TransExION, a Transformer-based Explainable similarity metric for IONS. TransExION detects related fragments between MS/MS spectra through their mass difference and uses these to estimate spectral similarity. These related fragments can be nearly identical, but can also share a substructure. TransExION also provides a post hoc explanation of its estimation, which can be used to support scientists in evaluating the spectral library search results and thus in structure elucidation of unknown molecules. Our model has a Transformer-based architecture and is trained on data derived from GNPS MS/MS libraries. The experimental results show that it improves on existing spectral similarity measures in searching and interpreting structural analogues as well as in molecular networking. SCIENTIFIC CONTRIBUTION: We propose a Transformer-based spectral similarity metric that improves the comparison of small molecule tandem mass spectra. We provide a post hoc explanation that can serve as a good starting point for annotating unknown spectra based on database spectra.
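The limitation of scores built on identical fragment matches can be seen in a small sketch of a cosine-style similarity. This is a generic baseline, not TransExION and not any specific library's implementation: the greedy matching strategy, the `tol` tolerance, and the toy spectra are assumptions for illustration.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.01):
    """Cosine similarity over fragments whose m/z agree within tol.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are matched
    greedily by descending intensity product, each peak used at most once.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            dot += product

    norm_a = math.sqrt(sum(inten ** 2 for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten ** 2 for _, inten in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy spectra: the "analogue" has every fragment shifted by the same mass,
# as might happen when a structural modification is carried by all fragments.
parent = [(100.0, 1.0), (150.0, 0.5)]
analogue = [(mz + 14.0, inten) for mz, inten in parent]

self_score = cosine_score(parent, parent)
analogue_score = cosine_score(parent, analogue)
```

With no fragment at an identical m/z, the analogue scores zero even though its fragmentation pattern is fully preserved, which is the kind of case that motivates relating fragments through their mass differences.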
ABSTRACT
MOTIVATION: Convolutional neural networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. They have also drawn much attention from other domains, including drug discovery and drug development. In this study, we develop a computational method based on convolutional neural networks to tackle a fundamental question in drug discovery and development, i.e. the prediction of compound-protein interactions based on compound structure and protein sequence. We propose a hierarchical graph convolutional network (HGCN) to encode small molecules. The HGCN aggregates a molecule embedding from substructure embeddings, which are in turn synthesized from atom embeddings. As small molecules usually share substructures, computing a molecule embedding from those common substructures allows us to learn better generic models. We then combine the HGCN with a one-dimensional convolutional network to construct a complete model for predicting compound-protein interactions. Furthermore, we apply an explanation technique, Grad-CAM, to visualize the contribution of each amino acid to the prediction. RESULTS: Experiments using different datasets show that our model improves on other GCN-based methods and on a sequence-based method, DeepDTA, in predicting compound-protein interactions. Each prediction made by the model is also explainable and can be used to identify critical residues mediating the interaction.
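The atom-to-substructure-to-molecule hierarchy described above can be sketched with fixed pooling operations. This shows only the shape of the aggregation, under stated assumptions: the real HGCN learns its embeddings with graph convolutions, whereas the mean and sum pooling, the function names, and the toy vectors here are hypothetical stand-ins.

```python
def mean_pool(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sum_pool(vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(column) for column in zip(*vectors)]


def molecule_embedding(atom_embeddings, substructures):
    """Aggregate atoms into substructures, then substructures into a molecule.

    atom_embeddings: one vector per atom.
    substructures: lists of atom indices; an atom may belong to several.
    """
    substructure_embeddings = [
        mean_pool([atom_embeddings[i] for i in atom_indices])
        for atom_indices in substructures
    ]
    return sum_pool(substructure_embeddings)


# Toy example: three atoms with 2-dimensional embeddings and two
# overlapping substructures that share atom 1.
atoms = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
groups = [[0, 1], [1, 2]]
embedding = molecule_embedding(atoms, groups)
```

Because the molecule vector is built from substructure vectors, two molecules that share a substructure also share the corresponding term in their embeddings, which is the intuition the abstract gives for learning more generic models.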