Results 1 - 20 of 90
2.
Anal Chem ; 95(32): 11901-11907, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37540774

ABSTRACT

The inability to identify the structures of most metabolites detected in environmental or biological samples limits the utility of nontargeted metabolomics. The most widely used analytical approaches combine mass spectrometry and machine learning methods to rank candidate structures contained in large chemical databases. Given the large chemical space typically searched, the use of additional orthogonal data may improve the identification rates and reliability. Here, we present results of combining experimental and computational mass and IR spectral data for high-throughput nontargeted chemical structure identification. Experimental MS/MS and gas-phase IR data for 148 test compounds were obtained from NIST. Candidate structures for each of the test compounds were obtained from PubChem (mean = 4444 candidate structures per test compound). Our workflow used CSI:FingerID to initially score and rank the candidate structures. The top 1000 ranked candidates were subsequently used for IR spectra prediction, scoring, and ranking using density functional theory (DFT-IR). Final ranking of the candidates was based on a composite score calculated as the average of the CSI:FingerID and DFT-IR rankings. This approach resulted in the correct identification of 88 of the 148 test compounds (59%). In addition, 129 of the 148 test compounds (87%) were ranked within the top 20 candidates. These identification rates are the highest yet reported when candidate structures are retrieved from PubChem. Combining experimental and computational MS/MS and IR spectral data is a potentially powerful option for prioritizing candidates for final structure verification.


Subject(s)
Databases, Chemical , Tandem Mass Spectrometry , Reproducibility of Results , Metabolomics/methods , Machine Learning
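The composite ranking in record 2 (the mean of each candidate's CSI:FingerID rank and DFT-IR rank) can be sketched as follows. This is a minimal illustration, not the authors' code: the candidate IDs and ranks are hypothetical, and the tie-breaking rule is an assumption the abstract does not specify.

```python
from typing import Dict, List


def composite_ranking(csi_rank: Dict[str, int], ir_rank: Dict[str, int]) -> List[str]:
    """Order candidates by the mean of their two 1-based ranks (lower is better).

    Only candidates ranked by both methods are kept. Ties on the mean are broken
    by the CSI:FingerID rank and then by candidate ID (an assumption of this sketch).
    """
    shared = csi_rank.keys() & ir_rank.keys()
    return sorted(
        shared,
        key=lambda c: ((csi_rank[c] + ir_rank[c]) / 2, csi_rank[c], c),
    )


# Hypothetical ranks: CSI:FingerID prefers "A", the DFT-IR comparison prefers "B".
csi = {"A": 1, "B": 2, "C": 3}
ir = {"A": 3, "B": 1, "C": 2}
print(composite_ranking(csi, ir))  # ['B', 'A', 'C']
```

Averaging ranks rather than raw scores sidesteps the fact that the two methods produce scores on unrelated scales.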
4.
Metabolites ; 13(3)2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36984753

ABSTRACT

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is performed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point and the number of words in the compound description starting with the letter 'u'. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
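Record 4 warns against metascores that add database-derived information to a spectrum-based score. The sketch below shows the mechanism only; it is not Mad Hatter's formula. The database term (a log-scaled reference count), the weight, and the scores are hypothetical. Once the weight is large enough, a heavily referenced candidate outranks the candidate the spectrum supports best.

```python
import math
from typing import Dict, List, Tuple


def metascore_ranking(
    spectral_score: Dict[str, float],
    db_references: Dict[str, int],
    weight: float,
) -> List[Tuple[str, float]]:
    """Rank by spectral score + weight * log10(1 + database reference count).

    The reference count is a hypothetical stand-in for whatever database-derived
    information a metascore might use. Higher combined scores rank first.
    """
    combined = {
        name: score + weight * math.log10(1 + db_references.get(name, 0))
        for name, score in spectral_score.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: the spectrum favors "novel"; "famous" has many database entries.
spectral = {"novel": -10.0, "famous": -12.0}
refs = {"famous": 9999}
print([name for name, _ in metascore_ranking(spectral, refs, 0.0)])  # ['novel', 'famous']
print([name for name, _ in metascore_ranking(spectral, refs, 1.0)])  # ['famous', 'novel']
```

If benchmark compounds are mostly well-documented ones, a prior like this can inflate benchmark results while misranking genuinely novel compounds, which is the kind of evaluation problem the paper discusses.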

5.
Nat Microbiol ; 7(12): 2128-2150, 2022 12.
Article in English | MEDLINE | ID: mdl-36443458

ABSTRACT

Despite advances in sequencing, lack of standardization makes comparisons across studies challenging and hampers insights into the structure and function of microbial communities across multiple habitats on a planetary scale. Here we present a multi-omics analysis of a diverse set of 880 microbial community samples collected for the Earth Microbiome Project. We include amplicon (16S, 18S, ITS) and shotgun metagenomic sequence data, and untargeted metabolomics data (liquid chromatography-tandem mass spectrometry and gas chromatography mass spectrometry). We used standardized protocols and analytical methods to characterize microbial communities, focusing on relationships and co-occurrences of microbially related metabolites and microbial taxa across environments, thus allowing us to explore diversity at extraordinary scale. In addition to a reference database for metagenomic and metabolomic data, we provide a framework for incorporating additional studies, enabling the expansion of existing knowledge in the form of an evolving community resource. We demonstrate the utility of this database by testing the hypothesis that every microbe and metabolite is everywhere but the environment selects. Our results show that metabolite diversity exhibits turnover and nestedness related to both microbial communities and the environment, whereas the relative abundances of microbially related metabolites vary and co-occur with specific microbial consortia in a habitat-specific manner. We additionally show the power of certain chemistry, in particular terpenoids, in distinguishing Earth's environments (for example, terrestrial plant surfaces and soils, freshwater and marine animal stool), as well as that of certain microbes including Conexibacter woesei (terrestrial soils), Haloquadratum walsbyi (marine deposits) and Pantoea dispersa (terrestrial plant detritus). 
This Resource provides insight into the taxa and metabolites within microbial communities from diverse habitats across Earth, informing both microbial and chemical ecology, and provides a foundation and methods for multi-omics microbiome studies of hosts and the environment.


Subject(s)
Microbiota , Animals , Microbiota/genetics , Metagenome , Metagenomics , Earth, Planet , Soil
6.
Metabolomics ; 18(12): 97, 2022 11 27.
Article in English | MEDLINE | ID: mdl-36436113

ABSTRACT

INTRODUCTION: The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standards Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or identical fragmentation patterns (e.g. isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT). RT, however, is a system property that differs for each chromatographic setup. OBJECTIVES: In contrast to MS data, sharing of RT data is not as widespread. The quality of data and its (re-)usability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata in public metabolomics repositories. METHODS: We surveyed the current reporting of chromatographic separation conditions. For this purpose, we defined the following details as essential: column name and dimensions, flow rate, temperature, eluent composition and gradient. RESULTS: We found that 70% of the descriptions of chromatographic setups are incomplete (according to our definition) and an additional 10% contain ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific descriptions of eluents, columns and gradients. CONCLUSION: Reporting of chromatographic metadata is currently not unified. Our recommendations for metadata reporting will enable more standardization and automation in future reporting.


Subject(s)
Metabolomics , Metadata , Tandem Mass Spectrometry , Chromatography, Liquid , Temperature
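The completeness criterion in record 6 (column name and dimensions, flow rate, temperature, eluent composition, gradient) lends itself to an automated check. A minimal sketch, assuming each method description has already been parsed into a dictionary; the field keys and the example record are hypothetical. A presence check like this catches incomplete descriptions only, not the ambiguous or incorrect ones the study also found.

```python
from typing import Dict, List, Optional

# Hypothetical keys for the minimum chromatographic details listed in the abstract.
REQUIRED_FIELDS = (
    "column_name",
    "column_dimensions",
    "flow_rate",
    "temperature",
    "eluent_composition",
    "gradient",
)


def missing_fields(description: Dict[str, Optional[str]]) -> List[str]:
    """Return the required fields that are absent, None or blank."""
    return [f for f in REQUIRED_FIELDS if not (description.get(f) or "").strip()]


def completeness_rate(descriptions: List[Dict[str, Optional[str]]]) -> float:
    """Fraction of descriptions that provide every required field."""
    if not descriptions:
        return 0.0
    complete = sum(1 for d in descriptions if not missing_fields(d))
    return complete / len(descriptions)


# Hypothetical description with the column temperature left blank.
record = {
    "column_name": "C18",
    "column_dimensions": "2.1 x 100 mm",
    "flow_rate": "0.3 mL/min",
    "temperature": "",
    "eluent_composition": "A: water + 0.1% formic acid; B: acetonitrile + 0.1% formic acid",
    "gradient": "5-95% B in 10 min",
}
print(missing_fields(record))  # ['temperature']
```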
7.
Environ Microbiol ; 24(11): 5408-5424, 2022 11.
Article in English | MEDLINE | ID: mdl-36222155

ABSTRACT

The exchange of metabolites mediates algal and bacterial interactions that maintain ecosystem function. Yet, while thousands of metabolites are produced, only a few molecules have been identified in these associations. Using the ubiquitous microalgal genus Pseudo-nitzschia as a model, we employed an untargeted metabolomics strategy to assign structural characteristics to the metabolites that distinguished specific diatom-microbiome associations. We cultured five species of Pseudo-nitzschia, including two species that produced the toxin domoic acid, and examined their microbiomes and metabolomes. A total of 4826 molecular features were detected by tandem mass spectrometry. Only 229 of these could be annotated using available mass spectral libraries, but by applying new in silico annotation tools, characterization was expanded to 2710 features. The metabolomes of the Pseudo-nitzschia-microbiome associations were distinct and distinguished by structurally diverse nitrogen compounds, ranging from simple amines and amides to cyclic compounds such as imidazoles, pyrrolidines and lactams. By illuminating the dark metabolomes, this study expands our capacity to discover new chemical targets that facilitate microbial partnerships and uncovers the chemical diversity that underpins algae-bacteria interactions.


Subject(s)
Diatoms , Microbiota , Diatoms/metabolism , Tandem Mass Spectrometry , Metabolome
8.
Proc Natl Acad Sci U S A ; 119(35): e2122636119, 2022 08 30.
Article in English | MEDLINE | ID: mdl-36018838

ABSTRACT

Taxonomic classification, that is, the assignment to biological clades with shared ancestry, is a common task in genetics, mainly based on a genome similarity search of large genome databases. The classification quality depends heavily on the database, since representative relatives must be present. Many genomic sequences cannot be classified at all or only with a high misclassification rate. Here we present BERTax, a deep neural network program based on natural language processing that precisely classifies DNA sequences at the superkingdom and phylum levels without the need for a known representative relative in a database. We show BERTax to be at least on par with state-of-the-art approaches when taxonomically similar species are part of the training data. For novel organisms, however, BERTax clearly outperforms any existing approach. Finally, we show that BERTax can also be combined with database approaches to further increase the prediction quality in almost all cases. Since BERTax is not based on similar entries in databases, it allows precise taxonomic classification of a broader range of genomic sequences, thus increasing the overall information gain.


Subject(s)
DNA Barcoding, Taxonomic , DNA , Deep Learning , Software , Algorithms , Base Sequence , DNA/classification , DNA/genetics , DNA Barcoding, Taxonomic/methods , Genome , Genomics
9.
Environ Sci Technol ; 56(15): 11027-11040, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35834352

ABSTRACT

Ultrahigh-resolution Fourier transform mass spectrometry (FTMS) has revealed unprecedented details of natural complex mixtures such as dissolved organic matter (DOM) on a molecular formula level, but we lack approaches to access the underlying structural complexity. We here explore the hypothesis that every DOM precursor ion is potentially linked with all emerging product ions in FTMS2 experiments. The resulting mass difference (Δm) matrix is deconvoluted to isolate individual precursor ion Δm profiles and matched with structural information, which was derived from 42 Δm features from 14 in-house reference compounds and a global set of 11 477 Δm features with assigned structure specificities, using a dataset of ∼18 000 unique structures. We show that Δm matching is highly sensitive in predicting potential precursor ion identities in terms of molecular and structural composition. Additionally, the approach identified unresolved precursor ions and missing elements in molecular formula annotation (P, Cl, F). Our study provides first results on how Δm matching refines structural annotations in van Krevelen space but simultaneously demonstrates the wide overlap between potential structural classes. We show that this effect is likely driven by chemodiversity and offers an explanation for the observed ubiquitous presence of molecules in the center of the van Krevelen space. Our promising first results suggest that Δm matching can both unfold the structural information encrypted in DOM and assess the quality of FTMS-derived molecular formulas of complex mixtures in general.


Subject(s)
Dissolved Organic Matter , Spectrometry, Mass, Electrospray Ionization , Complex Mixtures , Molecular Structure , Spectrometry, Mass, Electrospray Ionization/methods
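The first step described in record 9, linking every precursor ion with every product ion and matching the resulting mass differences (Δm) against reference Δm values, can be sketched as below. The reference table (three common neutral losses), the absolute tolerance, and the masses are illustrative assumptions, not the paper's Δm feature set, and the paper's deconvolution of the matrix into per-precursor profiles is not shown.

```python
from typing import List, Tuple

# Illustrative neutral-loss masses (monoisotopic, Da); not the paper's Δm features.
REFERENCE_DELTAS = {
    "H2O": 18.010565,
    "CO": 27.994915,
    "CO2": 43.989829,
}


def delta_m_matrix(precursors: List[float], products: List[float]) -> List[List[float]]:
    """Pair every precursor with every product ion: rows are precursors, columns products."""
    return [[p - f for f in products] for p in precursors]


def match_deltas(
    matrix: List[List[float]], tolerance: float = 0.001
) -> List[Tuple[int, int, str]]:
    """Return (precursor index, product index, label) for Δm values near a reference."""
    hits = []
    for i, row in enumerate(matrix):
        for j, dm in enumerate(row):
            if dm <= 0:
                continue  # a product ion cannot be heavier than its precursor
            for label, ref in REFERENCE_DELTAS.items():
                if abs(dm - ref) <= tolerance:
                    hits.append((i, j, label))
    return hits


# Hypothetical masses: one precursor, three product ions.
matrix = delta_m_matrix([301.0712], [283.0606, 257.0814, 150.0])
print(match_deltas(matrix))  # [(0, 0, 'H2O'), (0, 1, 'CO2')]
```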
10.
Nat Methods ; 19(7): 865-870, 2022 07.
Article in English | MEDLINE | ID: mdl-35637304

ABSTRACT

Current methods for structure elucidation of small molecules rely on finding similarity with spectra of known compounds, but do not predict structures de novo for unknown compound classes. We present MSNovelist, which combines fingerprint prediction with an encoder-decoder neural network to generate structures de novo solely from tandem mass spectrometry (MS2) spectra. In an evaluation with 3,863 MS2 spectra from the Global Natural Product Social Molecular Networking site, MSNovelist predicted 25% of structures correctly on first rank, retrieved 45% of structures overall and reproduced 61% of correct database annotations, without having ever seen the structure in the training phase. Similarly, for the CASMI 2016 challenge, MSNovelist correctly predicted 26% and retrieved 57% of structures, recovering 64% of correct database annotations. Finally, we illustrate the application of MSNovelist in a bryophyte MS2 dataset, in which de novo structure prediction substantially outscored the best database candidate for seven spectra. MSNovelist is ideally suited to complement library-based annotation in the case of poorly represented analyte classes and novel compounds.


Subject(s)
Tandem Mass Spectrometry , Databases, Factual
11.
J Proteome Res ; 21(4): 1204-1207, 2022 04 01.
Article in English | MEDLINE | ID: mdl-35119864

ABSTRACT

Machine learning is increasingly applied in proteomics and metabolomics to predict molecular structure, function, and physicochemical properties, including behavior in chromatography, ion mobility, and tandem mass spectrometry. Such models must be described in sufficient detail to apply them or to evaluate their performance. Here we interpret the recently published and general DOME (Data, Optimization, Model, Evaluation) recommendations for conducting and reporting on machine learning in the specific context of proteomics and metabolomics.


Subject(s)
Metabolomics , Proteomics , Machine Learning , Metabolomics/methods , Proteomics/methods , Tandem Mass Spectrometry
12.
Article in English | MEDLINE | ID: mdl-34879285

ABSTRACT

Metabolomics deals with the large-scale analysis of metabolites, belonging to numerous compound classes and showing an extremely high chemical diversity and complexity. Lipidomics, being a subcategory of metabolomics, analyzes the cellular lipid species. Both require state-of-the-art analytical methods capable of accessing the underlying chemical complexity. One of the major techniques used for the analysis of metabolites and lipids is Liquid Chromatography-Mass Spectrometry (LC-MS), offering both different selectivities in LC separation and high sensitivity in MS detection. Chromatography can be divided into different modes, based on the properties of the employed separation system. The most popular ones are Reversed-Phase (RP) separation for non- to mid-polar molecules and Hydrophilic Interaction Liquid Chromatography (HILIC) for polar molecules. So far, no single analysis method exists that can cover the entire range of metabolites or lipids, due to the huge chemical diversity. Consequently, different separation methods have been used for different applications and research questions. In this review, we explore the current use of LC-MS in metabolomics and lipidomics. As a proxy, we examined the use of chromatographic methods in the public repositories EBI MetaboLights and NIH Metabolomics Workbench. We extracted 1484 method descriptions, collected separation metadata and generated an overview on the current use of columns, eluents, etc. Based on this overview, we reviewed current practices and identified potential future trends as well as required improvements that may allow us to increase metabolite coverage, throughput or both simultaneously.


Subject(s)
Chromatography, Liquid , Mass Spectrometry , Metabolomics , Animals , Chromatography, Liquid/methods , Chromatography, Liquid/trends , Escherichia coli , Humans , Lipidomics/methods , Lipidomics/trends , Mass Spectrometry/methods , Mass Spectrometry/trends , Metabolomics/methods , Metabolomics/trends , Mice
13.
Nat Biotechnol ; 40(3): 411-421, 2022 03.
Article in English | MEDLINE | ID: mdl-34650271

ABSTRACT

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. Nine of these annotations were confirmed by manual evaluation and two by synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Databases, Factual , Humans , Metabolome , Metabolomics/methods , Molecular Structure
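Record 13 mentions kernel density P value estimation as one ingredient of the COSMIC confidence score. The sketch below shows the general technique only: an upper-tail P value for a hit score under a Gaussian kernel density estimate of background scores. Treating the background as decoy scores, the bandwidth rule, and the numbers are assumptions; this is not COSMIC's implementation.

```python
import math
import statistics
from typing import Optional, Sequence


def kde_p_value(
    score: float, background: Sequence[float], bandwidth: Optional[float] = None
) -> float:
    """Upper-tail P value of `score` under a Gaussian KDE of background scores.

    P(S >= score) = mean over background x of (1 - Phi((score - x) / h)).
    """
    n = len(background)
    if n < 2:
        raise ValueError("need at least two background scores")
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).
        bandwidth = 1.06 * statistics.stdev(background) * n ** (-0.2)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # 1 - Phi(z) equals 0.5 * erfc(z / sqrt(2)).
    tail = sum(
        0.5 * math.erfc((score - x) / (bandwidth * math.sqrt(2))) for x in background
    )
    return tail / n


# Hypothetical background (decoy) scores and two query scores.
decoys = [0.10, 0.15, 0.20, 0.22, 0.25, 0.30, 0.35]
print(kde_p_value(0.80, decoys))  # far above the decoys: small P value
print(kde_p_value(0.22, decoys))  # inside the decoy range: large P value
```

Smoothing with a kernel gives non-zero P values beyond the largest observed decoy, which an empirical count would report as exactly zero.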
14.
Nat Commun ; 12(1): 3832, 2021 06 22.
Article in English | MEDLINE | ID: mdl-34158495

ABSTRACT

Molecular networking connects mass spectra of molecules based on the similarity of their fragmentation patterns. However, during ionization, molecules commonly form multiple ion species with different fragmentation behavior. As a result, the fragmentation spectra of these ion species often remain unconnected in tandem mass spectrometry-based molecular networks, leading to redundant and disconnected sub-networks of the same compound classes. To overcome this bottleneck, we develop Ion Identity Molecular Networking (IIMN) that integrates chromatographic peak shape correlation analysis into molecular networks to connect and collapse different ion species of the same molecule. The new feature relationships improve network connectivity for structurally related molecules, can be used to reveal unknown ion-ligand complexes, enhance annotation within molecular networks, and facilitate the expansion of spectral reference libraries. IIMN is integrated into various open source feature finding tools and the GNPS environment. Moreover, IIMN-based spectral libraries with a broad coverage of ion species are publicly available.


Subject(s)
Computational Biology/methods , Ions/metabolism , Mass Spectrometry/methods , Metabolic Networks and Pathways , Metabolomics/methods , Animals , Internet , Ions/chemistry , Molecular Structure , Reproducibility of Results , Software
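Record 14 connects ion species of the same molecule through chromatographic peak shape correlation. A minimal sketch of that idea for one pair of features: the peak traces must correlate (Pearson) and, as an added assumption of this sketch, the m/z difference must match a known adduct offset relative to [M+H]+. The adduct list, thresholds, and example values are illustrative and not taken from the IIMN implementation.

```python
import math
from typing import Optional, Sequence

# m/z offsets of singly charged adducts relative to [M+H]+ (monoisotopic, Da).
ADDUCT_OFFSETS = {
    "[M+NH4]+": 17.026547,
    "[M+Na]+": 21.981942,
}


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two intensity traces sampled at the same scans."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traces must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat trace carries no peak-shape information
    return sxy / math.sqrt(sxx * syy)


def ion_identity_link(
    mz_protonated: float,
    mz_other: float,
    trace_protonated: Sequence[float],
    trace_other: Sequence[float],
    min_corr: float = 0.9,
    tol: float = 0.002,
) -> Optional[str]:
    """Return the adduct label if both features look like ions of one molecule."""
    if pearson(trace_protonated, trace_other) < min_corr:
        return None
    for label, offset in ADDUCT_OFFSETS.items():
        if abs((mz_other - mz_protonated) - offset) <= tol:
            return label
    return None


# Hypothetical co-eluting features: [M+H]+ at 195.0877 and a candidate at 217.0696.
a = [0, 10, 50, 100, 40, 5]
b = [0, 2, 11, 20, 8, 1]
print(ion_identity_link(195.0877, 217.0696, a, b))  # [M+Na]+
```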
15.
Nat Chem Biol ; 17(2): 146-151, 2021 02.
Article in English | MEDLINE | ID: mdl-33199911

ABSTRACT

Untargeted mass spectrometry is employed to detect small molecules in complex biospecimens, generating data that are difficult to interpret. We developed Qemistree, a data exploration strategy based on the hierarchical organization of molecular fingerprints predicted from fragmentation spectra. Qemistree allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies. By expressing molecular relationships as a tree, we can apply ecological tools that are designed to analyze and visualize the relatedness of DNA sequences to metabolomics data. Here we demonstrate the use of tree-guided data exploration tools to compare metabolomics samples across different experimental conditions such as chromatographic shifts. Additionally, we leverage a tree representation to visualize chemical diversity in a heterogeneous collection of samples. The Qemistree software pipeline is freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin, and a global natural products social molecular networking workflow.


Subject(s)
Mass Spectrometry/methods , Metabolomics , Algorithms , Cluster Analysis , DNA/chemistry , DNA Fingerprinting , Databases, Factual , Ecology , Food Analysis , Microbiota , Multivariate Analysis , Software , Tandem Mass Spectrometry , Workflow
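Record 15 describes organizing predicted molecular fingerprints hierarchically so that molecular relationships can be expressed as a tree. The sketch below illustrates that idea with average-linkage (UPGMA-style) clustering on Jaccard distances, emitting a Newick topology without branch lengths. The distance and linkage choices and the toy fingerprints (sets of "on" bit indices) are assumptions, not necessarily what Qemistree uses.

```python
from typing import AbstractSet, Dict, List, Tuple


def jaccard_distance(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """1 - |A & B| / |A | B| for two sets of 'on' fingerprint bits."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fingerprint_tree(fps: Dict[str, AbstractSet[int]]) -> str:
    """Average-linkage clustering of fingerprints, returned as a Newick topology."""
    # Each cluster is (Newick subtree, list of member names).
    clusters: List[Tuple[str, List[str]]] = [(name, [name]) for name in sorted(fps)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                # Average linkage: mean distance over all cross-cluster member pairs.
                d = sum(jaccard_distance(fps[x], fps[y]) for x in mi for y in mj)
                d /= len(mi) * len(mj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (f"({clusters[i][0]},{clusters[j][0]})", clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0] + ";"


# Hypothetical fingerprints for four features.
fps = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {7, 8, 9}, "D": {7, 8}}
print(fingerprint_tree(fps))  # ((C,D),(A,B));
```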
16.
J Am Soc Mass Spectrom ; 32(1): 180-186, 2021 Jan 06.
Article in English | MEDLINE | ID: mdl-33186010

ABSTRACT

Interpretation of fragmentation mass spectra depends on our knowledge of collision-induced dissociation mechanisms. Computational methods for the annotation of fragmentation mechanisms operate within the boundaries of recognized fragmentation pathways. The prevalence of charge migration fragmentation (CMF) in sodiated ion fragmentation spectra, which produces nonsodiated fragment ions, is unknown. Here, we investigated the extent of CMF in the fragmentation spectra of sodiated precursors by mining the NIST17 spectral library using a diagnostic mass difference. Our results showed that a substantial number of fragment ions in sodiated precursor spectra are derived from CMF, indicating that this fragmentation mechanism should be commonly considered by computational methods for compound annotation.
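Mining a spectral library for a diagnostic mass difference, as in record 16, amounts to checking each spectrum for peak pairs separated by that difference. A minimal sketch: the diagnostic value is a parameter because the abstract does not state which difference was used; the Na+/H+ offset (21.981942 Da) and the peak lists in the example are illustrative assumptions only.

```python
from itertools import combinations
from typing import List, Sequence


def has_delta(peaks: Sequence[float], delta: float, tol: float = 0.002) -> bool:
    """True if any two peaks in the spectrum differ by `delta` within `tol` (Da)."""
    return any(abs(abs(a - b) - delta) <= tol for a, b in combinations(peaks, 2))


def fraction_with_delta(
    spectra: List[Sequence[float]], delta: float, tol: float = 0.002
) -> float:
    """Fraction of spectra that contain at least one peak pair separated by `delta`."""
    if not spectra:
        return 0.0
    return sum(has_delta(s, delta, tol) for s in spectra) / len(spectra)


# Hypothetical peak lists (m/z) and an illustrative diagnostic difference.
library = [[100.0, 121.9819, 150.0], [80.0, 90.0], [200.0, 221.982]]
print(fraction_with_delta(library, 21.981942))  # 2 of 3 spectra match
```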

17.
Nat Biotechnol ; 39(4): 462-471, 2021 04.
Article in English | MEDLINE | ID: mdl-33230292

ABSTRACT

Metabolomics using nontargeted tandem mass spectrometry can detect thousands of molecules in a biological sample. However, structural molecule annotation is limited to structures present in libraries or databases, restricting analysis and interpretation of experimental data. Here we describe CANOPUS (class assignment and ontology prediction using mass spectrometry), a computational tool for systematic compound class annotation. CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes. CANOPUS explicitly targets compounds for which neither spectral nor structural reference data are available and predicts classes lacking tandem mass spectrometry training data. In evaluation using reference data, CANOPUS reached very high prediction performance (average accuracy of 99.7% in cross-validation) and outperformed four baseline methods. We demonstrate the broad utility of CANOPUS by investigating the effect of microbial colonization in the mouse digestive system, through analysis of the chemodiversity of different Euphorbia plants and regarding the discovery of a marine natural product, revealing biological insights at the compound class level.


Subject(s)
Aquatic Organisms/chemistry , Biological Products/analysis , Computational Biology/methods , Euphorbia/chemistry , Metabolomics/methods , Animals , Chromatography, Liquid , Gastrointestinal Microbiome , Mice , Neural Networks, Computer , Tandem Mass Spectrometry
18.
J Sep Sci ; 43(9-10): 1746-1754, 2020 May.
Article in English | MEDLINE | ID: mdl-32144942

ABSTRACT

Metabolite identification is a crucial step in nontargeted metabolomics, but also represents one of its current bottlenecks. Accurate identifications are required for correct biological interpretation. To date, annotation and identification are usually based on accurate mass search or tandem mass spectrometry analysis, but neglect orthogonal information such as retention times obtained by chromatographic separation. While several tools are available for the analysis and prediction of tandem mass spectrometry data, prediction of retention times for metabolite identification is not widespread. Here, we review the current state of retention time prediction in liquid chromatography-mass spectrometry-based metabolomics, with a focus on work published after 2010.
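To illustrate how predicted retention times add orthogonal information (record 18), here is a deliberately simple sketch: fit a linear model of reversed-phase RT against logP on reference compounds, then keep only candidates whose predicted RT falls within a window of the observed RT. The logP-only model, the window, and all numbers are hypothetical; published RT predictors typically use richer descriptors and machine learning.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def rt_filter(
    candidates: Dict[str, float],
    observed_rt: float,
    intercept: float,
    slope: float,
    window: float,
) -> List[str]:
    """Keep candidates (name -> logP) whose predicted RT is within `window` minutes."""
    return [
        name
        for name, logp in candidates.items()
        if abs(intercept + slope * logp - observed_rt) <= window
    ]


# Hypothetical reference compounds (logP, RT in min) and three isomeric candidates.
intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
candidates = {"polar": -1.0, "mid": 1.5, "apolar": 4.0}
print(rt_filter(candidates, observed_rt=5.2, intercept=intercept, slope=slope, window=0.5))  # ['mid']
```

Because isomers share a mass and often similar fragments, even a coarse RT window like this can remove candidates that MS/MS alone cannot separate.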

19.
Methods Mol Biol ; 2104: 185-207, 2020.
Article in English | MEDLINE | ID: mdl-31953819

ABSTRACT

SIRIUS 4 is the best-in-class computational tool for metabolite identification from high-resolution tandem mass spectrometry data. It offers de novo molecular formula annotation with outstanding accuracy. When searching fragmentation spectra in a structure database, it reaches over 70% correct identifications. A predicted fingerprint, which indicates the presence or absence of thousands of molecular properties, helps to deduce information about the compound of interest even if it is not contained in any structure database. Here, we present best practices and describe how to leverage the full potential of SIRIUS 4, how to incorporate it into your own workflow, and how it adds value to the analysis of mass spectrometry data beyond spectral library search.


Subject(s)
Computational Biology , Databases, Factual , Metabolomics , Software , Chromatography, Liquid , Computational Biology/methods , Humans , Metabolomics/methods , Molecular Structure , Spectrometry, Mass, Electrospray Ionization , Structure-Activity Relationship , Tandem Mass Spectrometry , User-Computer Interface , Workflow
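De novo molecular formula annotation (record 19) starts from candidate formulas consistent with the measured mass. The sketch below shows only that candidate-generation step: brute-force enumeration of CHNO formulas within a ppm window, filtered to a whole, non-negative ring-plus-double-bond equivalent (RDBE). The element set, count limits, and filter are assumptions. SIRIUS itself uses dedicated decomposition algorithms and ranks candidates with isotope patterns and fragmentation trees, which this sketch does not attempt.

```python
from typing import List, Sequence, Tuple

# Monoisotopic masses (Da) of the most abundant isotopes.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def format_formula(counts: Sequence[int]) -> str:
    """Write C, H, N, O counts as a formula string, omitting zero counts."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in zip("CHNO", counts) if n > 0)


def decompose(
    neutral_mass: float,
    ppm: float = 5.0,
    max_counts: Tuple[int, int, int, int] = (20, 40, 6, 8),
) -> List[Tuple[str, float]]:
    """Enumerate CcHhNnOo formulas within `ppm` of `neutral_mass`, best match first."""
    tol = neutral_mass * ppm * 1e-6
    max_c, max_h, max_n, max_o = max_counts
    hits = []
    for c in range(1, max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                partial = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                if partial > neutral_mass + tol:
                    break  # adding more oxygen only makes it heavier
                for h in range(max_h + 1):
                    mass = partial + h * MASS["H"]
                    if mass > neutral_mass + tol:
                        break
                    if abs(mass - neutral_mass) > tol:
                        continue
                    rdbe2 = 2 * c + 2 + n - h  # twice the RDBE for a CHNO formula
                    if rdbe2 < 0 or rdbe2 % 2:
                        continue  # require a whole, non-negative RDBE
                    error_ppm = (mass - neutral_mass) / neutral_mass * 1e6
                    hits.append((format_formula((c, h, n, o)), error_ppm))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Neutral monoisotopic mass of caffeine; the exact match should rank first.
print(decompose(194.080376)[0][0])  # C8H10N4O2
```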
20.
Nat Methods ; 16(4): 299-302, 2019 04.
Article in English | MEDLINE | ID: mdl-30886413

ABSTRACT

Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4 (https://bio.informatik.uni-jena.de/sirius/), which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.


Subject(s)
Metabolomics/methods , Molecular Structure , Signal Processing, Computer-Assisted , Tandem Mass Spectrometry/methods , Algorithms , Bayes Theorem , Biomarkers , Cluster Analysis , Computational Biology/methods , Computer Graphics , Databases, Factual , Electronic Data Processing , Internet , Isotopes , Likelihood Functions , Metabolome , Neural Networks, Computer , Programming Languages , User-Computer Interface