Results 1 - 20 of 26
1.
Anal Chem ; 95(32): 11901-11907, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37540774

ABSTRACT

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are used from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.


Subject(s)
Chemical Databases, Tandem Mass Spectrometry, Reproducibility of Results, Metabolomics/methods, Machine Learning
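The composite score described in this abstract (the average of the CSI:FingerID and DFT-IR rankings) can be illustrated with a minimal sketch. The function name, the candidate labels, and the score values below are hypothetical, and the tie handling is an assumption; the abstract does not specify these details.

```python
def composite_ranking(ms_scores, ir_scores):
    """Rank candidates by the mean of their per-method ranks (1 = best).

    Both arguments map candidate name -> score, where a higher score is better.
    """
    def ranks(scores):
        # Stable sort: ties keep insertion order (an assumption of this sketch).
        order = sorted(scores, key=lambda c: -scores[c])
        return {cand: i + 1 for i, cand in enumerate(order)}

    ms_rank, ir_rank = ranks(ms_scores), ranks(ir_scores)
    composite = {c: (ms_rank[c] + ir_rank[c]) / 2 for c in ms_scores}
    return sorted(composite, key=lambda c: composite[c]), composite


# Hypothetical scores for three candidate structures.
ms = {"A": 0.9, "B": 0.8, "C": 0.1}   # stand-in for MS/MS-based scores
ir = {"A": 0.2, "B": 0.7, "C": 0.6}   # stand-in for IR-based scores
ranking, composite = composite_ranking(ms, ir)
print(ranking, composite)
```

Averaging ranks rather than raw scores avoids having to put the two scoring scales on a common footing, which is one plausible reason to combine methods this way.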
3.
Nat Microbiol ; 7(12): 2128-2150, 2022 12.
Article in English | MEDLINE | ID: mdl-36443458

ABSTRACT

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth's environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). 
This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment.


Subject(s)
Microbiota, Animals, Microbiota/genetics, Metagenome, Metagenomics, Planet Earth, Soil
4.
Environ Microbiol ; 24(11): 5408-5424, 2022 11.
Article in English | MEDLINE | ID: mdl-36222155

ABSTRACT

The exchange of metabolites mediates algal and bacterial interactions that maintain ecosystem function. Yet, while thousands of metabolites are produced, only a few molecules have been identified in these associations. Using the ubiquitous microalgae Pseudo-nitzschia sp., as a model, we employed an untargeted metabolomics strategy to assign structural characteristics to the metabolites that distinguished specific diatom-microbiome associations. We cultured five species of Pseudo-nitzschia, including two species that produced the toxin domoic acid, and examined their microbiomes and metabolomes. A total of 4826 molecular features were detected by tandem mass spectrometry. Only 229 of these could be annotated using available mass spectral libraries, but by applying new in silico annotation tools, characterization was expanded to 2710 features. The metabolomes of the Pseudo-nitzschia-microbiome associations were distinct and distinguished by structurally diverse nitrogen compounds, ranging from simple amines and amides to cyclic compounds such as imidazoles, pyrrolidines and lactams. By illuminating the dark metabolomes, this study expands our capacity to discover new chemical targets that facilitate microbial partnerships and uncovers the chemical diversity that underpins algae-bacteria interactions.


Subject(s)
Diatoms, Microbiota, Diatoms/metabolism, Tandem Mass Spectrometry, Metabolome
5.
Environ Sci Technol ; 56(15): 11027-11040, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35834352

ABSTRACT

Ultrahigh-resolution Fourier transform mass spectrometry (FTMS) has revealed unprecedented details of natural complex mixtures such as dissolved organic matter (DOM) on a molecular formula level, but we lack approaches to access the underlying structural complexity. We here explore the hypothesis that every DOM precursor ion is potentially linked with all emerging product ions in FTMS2 experiments. The resulting mass difference (Δm) matrix is deconvoluted to isolate individual precursor ion Δm profiles and matched with structural information, which was derived from 42 Δm features from 14 in-house reference compounds and a global set of 11 477 Δm features with assigned structure specificities, using a dataset of ∼18 000 unique structures. We show that Δm matching is highly sensitive in predicting potential precursor ion identities in terms of molecular and structural composition. Additionally, the approach identified unresolved precursor ions and missing elements in molecular formula annotation (P, Cl, F). Our study provides first results on how Δm matching refines structural annotations in van Krevelen space but simultaneously demonstrates the wide overlap between potential structural classes. We show that this effect is likely driven by chemodiversity and offers an explanation for the observed ubiquitous presence of molecules in the center of the van Krevelen space. Our promising first results suggest that Δm matching can both unfold the structural information encrypted in DOM and assess the quality of FTMS-derived molecular formulas of complex mixtures in general.


Subject(s)
Dissolved Organic Matter, Electrospray Ionization Mass Spectrometry, Complex Mixtures, Molecular Structure, Electrospray Ionization Mass Spectrometry/methods
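The idea of linking every precursor ion with every product ion through a mass difference (Δm) matrix, then matching those differences against known Δm features, can be sketched as follows. This is only an illustration under stated assumptions: the m/z values, the tolerance, and the two reference losses are hypothetical, and the deconvolution step mentioned in the abstract is not reproduced.

```python
import numpy as np


def delta_m_matrix(precursors, products):
    """Return dm with dm[i, j] = m/z of precursor i minus m/z of product j."""
    prec = np.asarray(precursors, dtype=float)[:, None]
    prod = np.asarray(products, dtype=float)[None, :]
    return prec - prod


def match_delta_m(dm, known, tol=0.002):
    """Map each known feature name to the (precursor, product) index pairs
    whose mass difference lies within tol of the reference value."""
    hits = {}
    for name, value in known.items():
        idx = np.argwhere(np.abs(dm - value) <= tol)
        hits[name] = [(int(i), int(j)) for i, j in idx]
    return hits


# Hypothetical precursor and product m/z values.
precursors = [301.0718, 315.0874]
products = [283.0612, 257.0819, 271.0976]
# Two common neutral losses used as example reference features.
known = {"H2O": 18.0106, "CO2": 43.9898}

dm = delta_m_matrix(precursors, products)
hits = match_delta_m(dm, known)
print(hits)
```

Because every precursor is paired with every product, a match only suggests a possible link; the abstract itself notes the wide overlap between potential structural classes.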
6.
Bioinformatics ; 38(Suppl 1): i342-i349, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758813

ABSTRACT

MOTIVATION: Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases, allowing us to overcome this limitation. The best-performing in silico methods use machine learning to predict a molecular fingerprint from tandem mass spectra, then use the predicted fingerprint to search in a molecular structure database. Predicted molecular fingerprints are also of great interest for compound class annotation, de novo structure elucidation, and other tasks. So far, kernel support vector machines are the best tool for fingerprint prediction. However, they cannot be trained on all publicly available reference spectra because their training time scales cubically with the number of training data. RESULTS: We use the Nyström approximation to transform the kernel into a linear feature map. We evaluate two methods that use this feature map as input: a linear support vector machine and a deep neural network (DNN). For evaluation, we use a cross-validated dataset of 156 017 compounds and three independent datasets with 1734 compounds. We show that the combination of kernel method and DNN outperforms the kernel support vector machine, which is the current gold standard, as well as a DNN on tandem mass spectra on all evaluation datasets. AVAILABILITY AND IMPLEMENTATION: The deep kernel learning method for fingerprint prediction is part of the SIRIUS software, available at https://bio.informatik.uni-jena.de/software/sirius.


Subject(s)
Metabolomics, Tandem Mass Spectrometry, Chemical Databases, Machine Learning, Metabolomics/methods, Computer Neural Networks, Tandem Mass Spectrometry/methods
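The Nyström approximation mentioned in this abstract turns a kernel into an explicit feature map, so that linear models or neural networks can be trained on it. The sketch below shows the general technique with NumPy. The RBF kernel, the landmark choice, and all parameter values are assumptions made for illustration; the paper's kernels are defined on tandem mass spectra and are not reproduced here.

```python
import numpy as np


def rbf_kernel(A, B, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystroem_features(X, landmarks, gamma=0.5, eps=1e-10):
    """Map X to features Z such that Z @ Z.T approximates the kernel matrix of X."""
    K_mm = rbf_kernel(landmarks, landmarks, gamma)   # landmark vs. landmark
    K_nm = rbf_kernel(X, landmarks, gamma)           # data vs. landmark
    # Inverse square root of K_mm via its eigendecomposition;
    # tiny eigenvalues are clipped for numerical stability.
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, eps)
    K_mm_inv_sqrt = (V / np.sqrt(w)) @ V.T
    return K_nm @ K_mm_inv_sqrt


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# With every point used as a landmark, the approximation is exact up to rounding.
Z_full = nystroem_features(X, X)
# With only 5 landmarks, the feature map is cheaper but approximate.
Z_small = nystroem_features(X, X[:5])
print(Z_full.shape, Z_small.shape)
```

The resulting rows of `Z_small` can be fed to any linear learner, which avoids training directly on the full kernel matrix and the cubic training-time scaling the abstract describes.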
7.
Nat Methods ; 19(7): 865-870, 2022 07.
Article in English | MEDLINE | ID: mdl-35637304

ABSTRACT

Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder-decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds.


Subject(s)
Tandem Mass Spectrometry, Factual Databases
8.
Nat Biotechnol ; 40(3): 411-421, 2022 03.
Article in English | MEDLINE | ID: mdl-34650271

ABSTRACT

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.


Subject(s)
Metabolomics, Tandem Mass Spectrometry, Factual Databases, Humans, Metabolome, Metabolomics/methods, Molecular Structure
9.
Nat Commun ; 12(1): 3832, 2021 06 22.
Article in English | MEDLINE | ID: mdl-34158495

ABSTRACT

Molecular networking connects mass spectra of molecules based on the similarity of their fragmentation patterns. However, during ionization, molecules commonly form multiple ion species with different fragmentation behavior. As a result, the fragmentation spectra of these ion species often remain unconnected in tandem mass spectrometry-based molecular networks, leading to redundant and disconnected sub-networks of the same compound classes. To overcome this bottleneck, we develop Ion Identity Molecular Networking (IIMN) that integrates chromatographic peak shape correlation analysis into molecular networks to connect and collapse different ion species of the same molecule. The new feature relationships improve network connectivity for structurally related molecules, can be used to reveal unknown ion-ligand complexes, enhance annotation within molecular networks, and facilitate the expansion of spectral reference libraries. IIMN is integrated into various open source feature finding tools and the GNPS environment. Moreover, IIMN-based spectral libraries with a broad coverage of ion species are publicly available.


Subject(s)
Computational Biology/methods, Ions/metabolism, Mass Spectrometry/methods, Metabolic Networks and Pathways, Metabolomics/methods, Animals, Internet, Ions/chemistry, Molecular Structure, Reproducibility of Results, Software
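Chromatographic peak shape correlation, which this abstract uses to connect different ion species of the same molecule, can be illustrated with a Pearson correlation between elution profiles. This is a minimal sketch, not the IIMN implementation: the feature names, intensity values, and the 0.9 threshold are hypothetical, and any further checks the method applies are not shown.

```python
import numpy as np


def peak_shape_correlation(profile_a, profile_b):
    """Pearson correlation between two intensity profiles sampled at the same scans."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def correlated_feature_pairs(profiles, min_corr=0.9):
    """Return pairs of feature names whose elution profiles correlate strongly."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if peak_shape_correlation(profiles[a], profiles[b]) >= min_corr:
                pairs.append((a, b))
    return pairs


# Hypothetical elution profiles: two co-eluting ion species and one unrelated feature.
profiles = {
    "[M+H]+": [0, 10, 50, 100, 50, 10, 0],
    "[M+Na]+": [0, 4, 21, 40, 19, 5, 0],
    "other": [100, 50, 10, 0, 0, 0, 0],
}
pairs = correlated_feature_pairs(profiles)
print(pairs)
```

Features linked this way can then be collapsed into one node of a molecular network, even when their fragmentation spectra differ.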
10.
Nat Chem Biol ; 17(2): 146-151, 2021 02.
Article in English | MEDLINE | ID: mdl-33199911

ABSTRACT

Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply ecological tools that are designed to analyze and visualize the relatedness of DNA sequences to metabolomics data. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin, and a global natural products social molecular networking workflow.


Subject(s)
Mass Spectrometry/methods, Metabolomics, Algorithms, Cluster Analysis, DNA/chemistry, DNA Fingerprinting, Factual Databases, Ecology, Food Analysis, Microbiota, Multivariate Analysis, Software, Tandem Mass Spectrometry, Workflow
11.
J Am Soc Mass Spectrom ; 32(1): 180-186, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33186010

ABSTRACT

Interpretation of fragmentation mass spectra depends on our knowledge of collision-induced dissociation mechanisms. Computational methods for the annotation of fragmentation mechanisms operate within the boundaries of recognized fragmentation pathways. The prevalence of charge migration fragmentation (CMF) in sodiated ion fragmentation spectra, which produces nonsodiated fragment ions, is unknown. Here, we investigated the extent of CMF in the fragmentation spectra of sodiated precursors by mining the NIST17 spectral library using a diagnostic mass difference. Our results showed that a substantial amount of fragment ions in sodiated precursor spectra are derived from CMF, indicating that this fragmentation mechanism should be commonly considered by computational methods for compound annotation.

12.
Nat Biotechnol ; 39(4): 462-471, 2021 04.
Article in English | MEDLINE | ID: mdl-33230292

ABSTRACT

Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.


Subject(s)
Aquatic Organisms/chemistry, Biological Products/analysis, Computational Biology/methods, Euphorbia/chemistry, Metabolomics/methods, Animals, Liquid Chromatography, Gastrointestinal Microbiome, Mice, Computer Neural Networks, Tandem Mass Spectrometry
13.
Nat Methods ; 17(9): 905-908, 2020 09.
Article in English | MEDLINE | ID: mdl-32839597

ABSTRACT

Molecular networking has become a key method to visualize and annotate the chemical space in non-targeted mass spectrometry data. We present feature-based molecular networking (FBMN) as an analysis method in the Global Natural Products Social Molecular Networking (GNPS) infrastructure that builds on chromatographic feature detection and alignment tools. FBMN enables quantitative analysis and resolution of isomers, including from ion mobility spectrometry.


Subject(s)
Biological Products/chemistry, Mass Spectrometry, Computational Biology/methods, Factual Databases, Metabolomics/methods, Software
14.
Methods Mol Biol ; 2104: 185-207, 2020.
Article in English | MEDLINE | ID: mdl-31953819

ABSTRACT

SIRIUS 4 is the best-in-class computational tool for metabolite identification from high-resolution tandem mass spectrometry data. It offers de novo molecular formula annotation with outstanding accuracy. When searching fragmentation spectra in a structure database, it reaches over 70% correct identifications. A predicted fingerprint, which indicates the presence or absence of thousands of molecular properties, helps to deduce information about the compound of interest even if it is not contained in any structure database. Here, we present best practices and describe how to leverage the full potential of SIRIUS 4, how to incorporate it into your own workflow, and how it adds value to the analysis of mass spectrometry data beyond spectral library search.


Subject(s)
Computational Biology, Factual Databases, Metabolomics, Software, Liquid Chromatography, Computational Biology/methods, Humans, Metabolomics/methods, Molecular Structure, Electrospray Ionization Mass Spectrometry, Structure-Activity Relationship, Tandem Mass Spectrometry, User-Computer Interface, Workflow
15.
Nat Methods ; 16(4): 299-302, 2019 04.
Article in English | MEDLINE | ID: mdl-30886413

ABSTRACT

Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/), which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.


Subject(s)
Metabolomics/methods, Molecular Structure, Computer-Assisted Signal Processing, Tandem Mass Spectrometry/methods, Algorithms, Bayes Theorem, Biomarkers, Cluster Analysis, Computational Biology/methods, Computer Graphics, Factual Databases, Automated Data Processing, Internet, Isotopes, Likelihood Functions, Metabolome, Computer Neural Networks, Programming Languages, User-Computer Interface
16.
Bioinformatics ; 34(13): i333-i340, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949965

ABSTRACT

Motivation: Metabolites, small molecules that are involved in cellular reactions, provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem mass spectrometry to identify the thousands of compounds in a biological sample. Recently, we presented CSI:FingerID for searching in molecular structure databases using tandem mass spectrometry data. CSI:FingerID predicts a molecular fingerprint that encodes the structure of the query compound, then uses this to search a molecular structure database such as PubChem. Scoring of the predicted query fingerprint and deterministic target fingerprints is carried out assuming independence between the molecular properties constituting the fingerprint. Results: We present a scoring that takes into account dependencies between molecular properties. As before, we predict posterior probabilities of molecular properties using machine learning. Dependencies between molecular properties are modeled as a Bayesian tree network; the tree structure is estimated on the fly from the instance data. For each edge, we also estimate the expected covariance between the two random variables. For fixed marginal probabilities, we then estimate conditional probabilities using the known covariance. Now, the corrected posterior probability of each candidate can be computed, and candidates are ranked by this score. Modeling dependencies improves identification rates of CSI:FingerID by 2.85 percentage points. Availability and implementation: The new scoring Bayesian (fixed tree) is integrated into SIRIUS 4.0 (https://bio.informatik.uni-jena.de/software/sirius/).


Subject(s)
Chemical Databases, Metabolomics, Tandem Mass Spectrometry, Bayes Theorem, Machine Learning, Metabolomics/methods, Software
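The baseline that this abstract improves upon, scoring a candidate's deterministic fingerprint against predicted posterior probabilities while assuming the molecular properties are independent, can be sketched as a sum of log-probabilities. The function name, the three-bit fingerprints, and the clipping constant are hypothetical; the exact scoring used in CSI:FingerID is not given in the abstract, and the tree-based dependency model it introduces is not reproduced here.

```python
import math


def independent_log_score(posteriors, candidate_bits, eps=1e-6):
    """Log-probability of a candidate fingerprint under independent properties.

    posteriors[i] is the predicted probability that property i is present;
    candidate_bits[i] is 1 if the candidate structure has property i, else 0.
    """
    score = 0.0
    for p, bit in zip(posteriors, candidate_bits):
        p = min(max(p, eps), 1 - eps)           # avoid log(0)
        score += math.log(p if bit else 1 - p)
    return score


# Hypothetical predicted posteriors for three molecular properties.
posteriors = [0.9, 0.2, 0.8]
candidates = {"cand_1": [1, 0, 1], "cand_2": [0, 1, 1]}

ranked = sorted(
    candidates,
    key=lambda c: independent_log_score(posteriors, candidates[c]),
    reverse=True,
)
print(ranked)
```

Under the dependency-aware scoring described in the abstract, the per-property factors would instead be conditional probabilities along the edges of the estimated tree.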
17.
Int J Mol Sci ; 19(5)2018 May 06.
Article in English | MEDLINE | ID: mdl-29734799

ABSTRACT

The relatively new research discipline of Eco-Metabolomics is the application of metabolomics techniques to ecology with the aim to characterise biochemical interactions of organisms across different spatial and temporal scales. Metabolomics is an untargeted biochemical approach to measure many thousands of metabolites in different species, including plants and animals. Changes in metabolite concentrations can provide mechanistic evidence for biochemical processes that are relevant at ecological scales. These include physiological, phenotypic and morphological responses of plants and communities to environmental changes and also interactions with other organisms. Traditionally, research in biochemistry and ecology comes from two different directions and is performed at distinct spatiotemporal scales. Biochemical studies most often focus on intrinsic processes in individuals at physiological and cellular scales. Generally, they take a bottom-up approach scaling up cellular processes from spatiotemporally fine to coarser scales. Ecological studies usually focus on extrinsic processes acting upon organisms at population and community scales and typically study top-down and bottom-up processes in combination. Eco-Metabolomics is a transdisciplinary research discipline that links biochemistry and ecology and connects the distinct spatiotemporal scales. In this review, we focus on approaches to study chemical and biochemical interactions of plants at various ecological levels, mainly plant-organismal interactions, and discuss related examples from other domains. We present recent developments and highlight advancements in Eco-Metabolomics over the last decade from various angles. We further address the five key challenges: (1) complex experimental designs and large variation of metabolite profiles; (2) feature extraction; (3) metabolite identification; (4) statistical analyses; and (5) bioinformatics software tools and workflows.
The presented solutions to these challenges will advance connecting the distinct spatiotemporal scales and bridging biochemistry and ecology.


Subject(s)
Ecology, Metabolomics/trends, Plants/genetics, Plants/metabolism
18.
Nat Commun ; 8(1): 1494, 2017 11 14.
Article in English | MEDLINE | ID: mdl-29133785

ABSTRACT

The annotation of small molecules in untargeted mass spectrometry relies on the matching of fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rates (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the false discovery rate (FDR) for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose, on average, by +139% (ranging from -92 up to +5705%) when compared with a default parameter set available at GNPS. The FDR estimation methods presented will enable a user to assess the scoring criteria for large scale analysis of mass spectrometry based metabolomics data that has been essential in the advancement of proteomics, transcriptomics, and genomics science.


Subject(s)
Metabolomics, Tandem Mass Spectrometry/methods, Algorithms, Liquid Chromatography, Computational Biology/methods, Protein Databases
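Target-decoy FDR estimation, one of the two approaches named in this abstract, can be illustrated with a generic sketch: the number of decoy matches above a score threshold serves as an estimate of the number of false target matches. The function names and score values are hypothetical, and the sketch assumes a decoy library of the same size as the target library; the paper's decoy construction and its empirical Bayes method are not reproduced.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR among target matches scoring at or above threshold."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0
    return min(1.0, n_decoy / n_target)


def lowest_threshold_at_fdr(target_scores, decoy_scores, max_fdr):
    """Smallest observed target score whose estimated FDR is at most max_fdr."""
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical spectrum-match scores against a target and a decoy library.
targets = [0.95, 0.9, 0.85, 0.8, 0.6, 0.5, 0.4, 0.3]
decoys = [0.7, 0.45, 0.35, 0.2, 0.1, 0.05, 0.02, 0.01]

fdr_at_06 = target_decoy_fdr(targets, decoys, 0.6)
threshold = lowest_threshold_at_fdr(targets, decoys, max_fdr=0.2)
print(fdr_at_06, threshold)
```

Choosing the threshold per data set in this way mirrors the abstract's point that spectral matching settings need to be adjusted for each project.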
19.
J Cheminform ; 9(1): 22, 2017 Mar 27.
Article in English | MEDLINE | ID: mdl-29086042

ABSTRACT

BACKGROUND: The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) Contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and small molecule annotation/identification. RESULTS: The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in "Category 2: Best Automatic Structural Identification-In Silico Fragmentation Only", won by Team Brouard with 41% challenge wins. The winner of "Category 3: Best Automatic Structural Identification-Full Information" was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways. CONCLUSIONS: The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The achieved high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analysis for "known unknowns". As more high quality training data becomes available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will also improve identification success further for "real life" annotations. The true "unknown unknowns" remain to be evaluated in future CASMI contests.

20.
Bioinformatics ; 32(12): i28-i36, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307628

ABSTRACT

MOTIVATION: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. RESULTS: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. CONTACT: celine.brouard@aalto.fi SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods, Machine Learning, Metabolomics, Molecular Structure, Tandem Mass Spectrometry, Algorithms, Chemical Databases
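The two phases described in this abstract, a regression into the feature space of the output kernel followed by a preimage step, can be sketched with kernel ridge regression and a preimage search restricted to a candidate list. This is a toy illustration under stated assumptions: the linear kernels, the regularization value, the tiny feature vectors, and the function names are all hypothetical, and the output kernel is assumed to be normalized so that the preimage reduces to maximizing an inner product.

```python
import numpy as np


def normalize_rows(A):
    """Scale each row to unit length so that a linear kernel is normalized."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)


def fit_iokr(K_x, lam=1e-3):
    """Regression phase: precompute (K_x + lam * I)^-1 for kernel ridge regression."""
    n = K_x.shape[0]
    return np.linalg.inv(K_x + lam * np.eye(n))


def rank_candidates(M, k_x_query, K_y_cand_train):
    """Preimage phase over a finite candidate list.

    M: (n, n) matrix returned by fit_iokr
    k_x_query: (n,) input-kernel values between the query and the training spectra
    K_y_cand_train: (c, n) output-kernel values between candidates and training molecules
    """
    scores = K_y_cand_train @ (M @ k_x_query)
    return np.argsort(-scores), scores


# Toy training set: 3 "spectra" (input features) and their molecular fingerprints.
X_train = np.eye(3)
Y_train = normalize_rows(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))

M = fit_iokr(X_train @ X_train.T)                  # linear input kernel

query = np.array([1.0, 0.0, 0.0])                  # identical to training spectrum 0
candidates = normalize_rows(np.array([[1.0, 0.0], [0.0, 1.0]]))

order, scores = rank_candidates(M, X_train @ query, candidates @ Y_train.T)
print(order, scores)
```

Since the matrix from the regression phase is computed once, scoring a new spectrum against its candidates costs only matrix-vector products, which is consistent with the short running times the abstract reports.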