2.
Metabolites ; 13(3)2023 Feb 21.
Article in English | MEDLINE | ID: mdl-36984753

ABSTRACT

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics usually relies on mass spectrometry, a technology capable of detecting thousands of compounds in a biological sample. Metabolite annotation is executed using tandem mass spectrometry. Spectral library search is far from comprehensive, and numerous compounds remain unannotated. So-called in silico methods allow us to overcome the restrictions of spectral libraries, by searching in much larger molecular structure databases. Yet, after more than a decade of method development, in silico methods still do not reach the correct annotation rates that users would wish for. Here, we present a novel computational method called Mad Hatter for this task. Mad Hatter combines CSI:FingerID results with information from the searched structure database via a metascore. Compound information includes the melting point, and the number of words in the compound description starting with the letter 'u'. We then show that Mad Hatter reaches a stunning 97.6% correct annotations when searching PubChem, one of the largest and most comprehensive molecular structure databases. Unfortunately, Mad Hatter is not a real method. Rather, we developed Mad Hatter solely for the purpose of demonstrating common issues in computational method development and evaluation. We explain what evaluation glitches were necessary for Mad Hatter to reach this annotation level, what is wrong with similar metascores in general, and why metascores may screw up not only method evaluations but also the analysis of biological experiments. This paper may serve as an example of problems in the development and evaluation of machine learning models for metabolite annotation.
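The abstract does not give the metascore formula, so the following is only a minimal sketch of the general idea it criticizes: re-ranking structure candidates by combining a spectrum-based score with a feature taken from the structure database. The linear combination, the `weight` parameter, the field names and the two candidates are all hypothetical and are not taken from Mad Hatter or CSI:FingerID.

```python
# Hypothetical illustration of a metascore; not the method from the paper.

def metascore(spectral_score, metadata_feature, weight):
    """Combine a spectrum-based score with a database-derived feature.

    A plain weighted sum is assumed here purely for illustration.
    """
    return spectral_score + weight * metadata_feature


def rank(candidates, weight):
    """Return candidate names ordered by descending metascore."""
    ordered = sorted(
        candidates,
        key=lambda c: metascore(c["spectral_score"], c["metadata_feature"], weight),
        reverse=True,
    )
    return [c["name"] for c in ordered]


# Two made-up candidates: A fits the spectrum better,
# B scores higher on the (spectrum-independent) database feature.
candidates = [
    {"name": "A", "spectral_score": 0.9, "metadata_feature": 0.1},
    {"name": "B", "spectral_score": 0.6, "metadata_feature": 0.9},
]

ranking_spectral_only = rank(candidates, weight=0.0)
ranking_with_metadata = rank(candidates, weight=1.0)
```

With `weight=0.0` the ranking follows the spectral evidence alone; with `weight=1.0` the database feature overrides it, although that feature says nothing about the measured spectrum. This is the kind of effect the abstract warns can distort both method evaluations and the analysis of biological experiments.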

3.
Metabolomics ; 18(12): 97, 2022 11 27.
Article in English | MEDLINE | ID: mdl-36436113

ABSTRACT

INTRODUCTION: The structural identification of metabolites represents one of the current bottlenecks in non-targeted liquid chromatography-mass spectrometry (LC-MS) based metabolomics. The Metabolomics Standard Initiative has developed a multilevel system to report confidence in metabolite identification, which involves the use of MS, MS/MS and orthogonal data. Limitations due to similar or same fragmentation pattern (e.g. isomeric compounds) can be overcome by the additional orthogonal information of the retention time (RT), since it is a system property that is different for each chromatographic setup.
OBJECTIVES: In contrast to MS data, sharing of RT data is not as widespread. The quality of data and its (re-)useability depend very much on the quality of the metadata. We aimed to evaluate the coverage and quality of this metadata from public metabolomics repositories.
METHODS: We acquired an overview on the current reporting of chromatographic separation conditions. For this purpose, we defined the following information as important details that have to be provided: column name and dimension, flow rate, temperature, composition of eluents and gradient.
RESULTS: We found that 70% of descriptions of the chromatographic setups are incomplete (according to our definition) and an additional 10% of the descriptions contained ambiguous and/or incorrect information. Accordingly, only about 20% of the descriptions allow further (re-)use of the data, e.g. for RT prediction. Therefore, we have started to develop a unified and standardized notation for chromatographic metadata with detailed and specific description of eluents, columns and gradients.
CONCLUSION: Reporting of chromatographic metadata is currently not unified. Our recommended suggestions for metadata reporting will enable more standardization and automatization in future reporting.
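The completeness criterion named in the abstract (column name and dimension, flow rate, temperature, eluent composition, gradient) lends itself to a simple automated check. The sketch below is an assumption about how such a check could look, not the authors' procedure: the dictionary keys, the helper names and the example records are hypothetical.

```python
# Hypothetical completeness check for chromatographic method descriptions.
# The required items follow the abstract; the key names are assumed.

REQUIRED_FIELDS = (
    "column_name",
    "column_dimension",
    "flow_rate",
    "temperature",
    "eluents",
    "gradient",
)


def missing_fields(record):
    """Return the required fields that are absent or blank in one record.

    Values are expected to be strings; None and whitespace-only strings
    count as missing.
    """
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name) or "").strip()
    ]


def completeness_rate(records):
    """Fraction of records in which every required field is filled in."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_fields(record))
    return complete / len(records)


# Made-up example records for illustration only.
complete_record = {
    "column_name": "C18",
    "column_dimension": "100 x 2.1 mm",
    "flow_rate": "0.3 mL/min",
    "temperature": "40 C",
    "eluents": "A: water + 0.1% formic acid; B: acetonitrile",
    "gradient": "5-95% B in 10 min",
}
partial_record = {"column_name": "C18", "flow_rate": " "}

rate = completeness_rate([complete_record, partial_record])
gaps = missing_fields(partial_record)
```

A check like this only detects missing entries. The ambiguous or incorrect descriptions mentioned in the results would still need manual review or a controlled vocabulary.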


Subject(s)
Metabolomics , Metadata , Tandem Mass Spectrometry , Chromatography, Liquid , Temperature
4.
Proc Natl Acad Sci U S A ; 119(35): e2122636119, 2022 08 30.
Article in English | MEDLINE | ID: mdl-36018838

ABSTRACT

Taxonomic classification, that is, the assignment to biological clades with shared ancestry, is a common task in genetics, mainly based on a genome similarity search of large genome databases. The classification quality depends heavily on the database, since representative relatives must be present. Many genomic sequences cannot be classified at all or only with a high misclassification rate. Here we present BERTax, a deep neural network program based on natural language processing to precisely classify the superkingdom and phylum of DNA sequences taxonomically without the need for a known representative relative from a database. We show BERTax to be at least on par with the state-of-the-art approaches when taxonomically similar species are part of the training data. For novel organisms, however, BERTax clearly outperforms any existing approach. Finally, we show that BERTax can also be combined with database approaches to further increase the prediction quality in almost all cases. Since BERTax is not based on similar entries in databases, it allows precise taxonomic classification of a broader range of genomic sequences, thus increasing the overall information gain.


Subject(s)
DNA Barcoding, Taxonomic , DNA , Deep Learning , Software , Algorithms , Base Sequence , DNA/classification , DNA/genetics , DNA Barcoding, Taxonomic/methods , Genome , Genomics
5.
Article in English | MEDLINE | ID: mdl-34879285

ABSTRACT

Metabolomics deals with the large-scale analysis of metabolites, belonging to numerous compound classes and showing an extremely high chemical diversity and complexity. Lipidomics, being a subcategory of metabolomics, analyzes the cellular lipid species. Both require state-of-the-art analytical methods capable of accessing the underlying chemical complexity. One of the major techniques used for the analysis of metabolites and lipids is Liquid Chromatography-Mass Spectrometry (LC-MS), offering both different selectivities in LC separation and high sensitivity in MS detection. Chromatography can be divided into different modes, based on the properties of the employed separation system. The most popular ones are Reversed-Phase (RP) separation for non- to mid-polar molecules and Hydrophilic Interaction Liquid Chromatography (HILIC) for polar molecules. So far, no single analysis method exists that can cover the entire range of metabolites or lipids, due to the huge chemical diversity. Consequently, different separation methods have been used for different applications and research questions. In this review, we explore the current use of LC-MS in metabolomics and lipidomics. As a proxy, we examined the use of chromatographic methods in the public repositories EBI MetaboLights and NIH Metabolomics Workbench. We extracted 1484 method descriptions, collected separation metadata and generated an overview on the current use of columns, eluents, etc. Based on this overview, we reviewed current practices and identified potential future trends as well as required improvements that may allow us to increase metabolite coverage, throughput or both simultaneously.


Subject(s)
Chromatography, Liquid , Mass Spectrometry , Metabolomics , Animals , Chromatography, Liquid/methods , Chromatography, Liquid/trends , Escherichia coli , Humans , Lipidomics/methods , Lipidomics/trends , Mass Spectrometry/methods , Mass Spectrometry/trends , Metabolomics/methods , Metabolomics/trends , Mice