ABSTRACT
Open mass spectral libraries (OMSLs) are critical for metabolite annotation and machine learning, especially given the rising volume of untargeted metabolomic studies and the development of annotation pipelines. Despite their importance, the practical application of OMSLs is hampered by the lack of standardized file formats, metadata fields, and supporting ontology. Current libraries are often restricted to specific topics or matrices, such as natural products, lipids, or the human metabolome, which may limit the discovery potential of untargeted studies. The goal of FragHub is to let users integrate various OMSLs into a single unified format, thereby enhancing annotation accuracy and reliability. FragHub addresses these challenges by integrating multiple OMSLs into a single comprehensive database, supporting various data formats, and harmonizing metadata. It also offers generic mass spectrum filters through a graphical user interface. Additionally, a workflow to generate in-house libraries compatible with FragHub is proposed. FragHub dynamically segregates libraries based on ionization modes and chromatography techniques, thereby enhancing data utility in metabolomic research. The FragHub Python code is publicly available under an MIT license at the following repository: https://github.com/eMetaboHUB/FragHub. Generated data can be accessed at DOI 10.5281/zenodo.11057687.
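The segregation step described above can be pictured with a minimal sketch. This is not FragHub's code: the record fields ("name", "ionmode", "instrument"), the accepted ion mode spellings, and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize_ionmode(raw):
    """Map free-text ion mode values onto 'positive', 'negative' or 'unknown'."""
    value = raw.strip().lower()
    if value in {"positive", "pos", "p", "+"}:
        return "positive"
    if value in {"negative", "neg", "n", "-"}:
        return "negative"
    return "unknown"


def chromatography(instrument):
    """Guess the separation technique from a free-text instrument description."""
    text = instrument.upper()
    if "GC" in text:
        return "GC"
    if "LC" in text:
        return "LC"
    return "unknown"


def split_library(records):
    """Group spectrum names by (chromatography, ion mode)."""
    groups = defaultdict(list)
    for record in records:
        key = (
            chromatography(record.get("instrument", "")),
            normalize_ionmode(record.get("ionmode", "")),
        )
        groups[key].append(record["name"])
    return dict(groups)


# Hypothetical, simplified records; real libraries carry many more fields.
spectra = [
    {"name": "caffeine", "ionmode": "Positive", "instrument": "LC-ESI-QTOF"},
    {"name": "citric acid", "ionmode": "neg", "instrument": "LC-ESI-QTOF"},
    {"name": "alanine 2TMS", "ionmode": "P", "instrument": "GC-EI-TOF"},
]

groups = split_library(spectra)
for key, names in sorted(groups.items()):
    print(key, names)
```

Harmonizing the metadata values before grouping is the point of the sketch: heterogeneous spellings such as "Positive", "pos" and "P" end up in the same sub-library.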
ABSTRACT
INTRODUCTION: Lipids are key compounds in the study of metabolism and are increasingly studied in biology projects. They form a very broad family that encompasses many compounds, and the name of the same compound may vary depending on the community in which it is studied. OBJECTIVES: In addition, their structures are varied and complex, which complicates their analysis. Indeed, the structural resolution does not always allow a complete level of annotation, so the actual compound analysed will vary from study to study and should be clearly stated. For all these reasons, the identification and naming of lipids is complicated and highly variable from one study to another, and needs to be harmonized. METHODS & RESULTS: In this position paper we present and discuss the different ways to name lipids (with chemoinformatic and semantic identifiers) and their importance for sharing lipidomic results. CONCLUSION: Homogenising this identification and adopting the same rules is essential to be able to share data within the community and to map data onto functional networks.
Subjects
Lipidomics; Metabolomics; Lipids
ABSTRACT
In human health research, metabolic signatures extracted from metabolomics data have a strong added value for stratifying patients and identifying biomarkers. Nevertheless, one of the main challenges is to interpret and relate these lists of discriminant metabolites to pathological mechanisms. This task requires experts to combine their knowledge with information extracted from databases and the scientific literature. However, we show that most compounds (>99%) in the PubChem database lack annotated literature. This dearth of available information can have a direct impact on the interpretation of metabolic signatures, which is often restricted to a subset of significant metabolites. To suggest potential pathological phenotypes related to overlooked metabolites that lack annotated literature, we extend the "guilt-by-association" principle to literature information by using a Bayesian framework. The underlying assumption is that the literature associated with the metabolic neighbors of a compound can provide valuable insights, or an a priori, into its biomedical context. The metabolic neighborhood of a compound can be defined from a metabolic network and corresponds to the metabolites to which it is connected through biochemical reactions. With the proposed approach, we suggest more than 35,000 associations between 1,047 overlooked metabolites and 3,288 diseases (or disease families). All these newly inferred associations are freely available on the FORUM FTP server (see information at https://github.com/eMetaboHUB/Forum-LiteraturePropagation).
Subjects
Knowledge; Metabolomics; Humans; Bayes Theorem; Databases, Factual
ABSTRACT
MOTIVATION: Metabolomics studies aim at reporting a metabolic signature (list of metabolites) related to a particular experimental condition. These signatures are instrumental in the identification of biomarkers or classification of individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. RESULTS: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support the interpretation of results and further inquiries. AVAILABILITY AND IMPLEMENTATION: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM KG, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Pattern Recognition, Automated; Publications; Humans; Databases, Factual
ABSTRACT
Lack of reliable peak detection impedes automated analysis of large-scale gas chromatography-mass spectrometry (GC-MS) metabolomics datasets. Performance and outcome of individual peak-picking algorithms can differ widely depending on both algorithmic approach and parameters, as well as data acquisition method. Therefore, comparing algorithms is difficult. Here we present a workflow for improved peak picking (WiPP), a parameter-optimising, multi-algorithm peak detection workflow for GC-MS metabolomics. WiPP evaluates the quality of detected peaks using a machine learning-based classification scheme based on seven peak classes. The quality information returned by the classifier for each individual peak is merged with results from different peak detection algorithms to create one final high-quality peak set for immediate downstream analysis. Medium- and low-quality peaks are kept for further inspection. By applying WiPP to standard compound mixes and a complex biological dataset, we demonstrate that peak detection is improved through the novel way of assigning peak quality, automated parameter optimisation, and the integration of results across different embedded peak-picking algorithms. Furthermore, our approach can provide an impartial performance comparison of different peak-picking algorithms. WiPP is freely available on GitHub (https://github.com/bihealth/WiPP) under the MIT licence.
ABSTRACT
The life sciences are currently being transformed by an unprecedented wave of developments in molecular analysis, which include important advances in instrumental analysis as well as biocomputing. In light of the central role played by metabolism in nutrition, metabolomics is rapidly being established as a key analytical tool in human nutritional studies. Consequently, an increasing number of nutritionists integrate metabolomics into their study designs. Within this dynamic landscape, the potential of nutritional metabolomics (nutrimetabolomics) to be translated into a science that can impact health policies still needs to be realized. A key element in reaching this goal is the ability of the research community to join forces and collectively make the best use of the potential offered by nutritional metabolomics. This article, therefore, provides a methodological description of nutritional metabolomics that reflects the state-of-the-art techniques used in the laboratories of the Food Biomarker Alliance (funded by the European Joint Programming Initiative "A Healthy Diet for a Healthy Life" (JPI HDHL)), as well as points of reflection to harmonize this field. It is not intended to be exhaustive but rather to present pragmatic guidance on metabolomic methodologies, providing readers with useful "tips and tricks" along the analytical workflow.
Subjects
Biomarkers/analysis; Electronic Data Processing/methods; Metabolomics/methods; Nutritional Sciences/methods; Chromatography/methods; Data Mining; Eating; Expert Testimony; Food Analysis; Humans; Models, Statistical; Multivariate Analysis; Nutritional Status; Reproducibility of Results
ABSTRACT
Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the workshop discussed the four existing ELIXIR Use Cases from which the metabolomics community (both industry and academia) would benefit most, and which could be exhaustively mapped onto the five current ELIXIR Platforms. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.
ABSTRACT
Metabolomics is a key approach in modern functional genomics and systems biology. Due to the complexity of metabolomics data, the variety of experimental designs, and the multiplicity of bioinformatics tools, providing experimenters with a simple and efficient resource to conduct comprehensive and rigorous analysis of their data is of utmost importance. In 2014, we launched the Workflow4Metabolomics (W4M; http://workflow4metabolomics.org) online infrastructure for metabolomics built on the Galaxy environment, which offers user-friendly features to build and run data analysis workflows including preprocessing, statistical analysis, and annotation steps. Here we present the new W4M 3.0 release, which contains twice as many tools as the first version and provides two features that are, to our knowledge, unique among online resources. First, data from the four major metabolomics technologies (i.e., LC-MS, FIA-MS, GC-MS, and NMR) can be analyzed on a single platform. By using three studies in human physiology, alga evolution, and animal toxicology, we demonstrate how the 40 available tools can be easily combined to address biological issues. Second, the full analysis (including the workflow, the parameter values, the input data and output results) can be referenced with a permanent digital object identifier (DOI). Publication of data analyses is of major importance for robust and reproducible science. Furthermore, the publicly shared workflows are of high value for e-learning and training. The Workflow4Metabolomics 3.0 e-infrastructure thus not only offers a unique online environment for analysis of data from the main metabolomics technologies, but is also the first reference repository for metabolomics workflows.
Subjects
Electronic Data Processing/methods; Metabolomics/methods; Software; Workflow; Animals; Humans; Magnetic Resonance Spectroscopy/methods
ABSTRACT
This article describes a generic programmatic method for mapping chemical compound libraries onto organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to determine the coverage of the chemical libraries set up by two metabolomics facilities, MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP), on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transferred between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing coverage statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues of programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks, and automatic loading of a mapping into the genome-scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single organism or a selection of organisms of interest and is thus not limited to large facilities.
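As a rough illustration of identifier-based matching and the resulting coverage statistics, the sketch below intersects two sets of InChIKeys. It is not the pipeline described in the article: the function name, the returned fields, and the placeholder keys are assumptions. The optional `first_block_only` switch, which compares only the 14-character connectivity block of each key, is likewise an illustrative design choice rather than a documented feature.

```python
def coverage(library_keys, network_keys, first_block_only=False):
    """Match two collections of InChIKeys and report coverage on both sides."""

    def norm(key):
        key = key.strip().upper()
        # The first hyphen-separated block encodes the molecular connectivity.
        return key.split("-")[0] if first_block_only else key

    lib = {norm(k) for k in library_keys if k}
    net = {norm(k) for k in network_keys if k}
    matched = lib & net
    return {
        "matched": sorted(matched),
        "library_coverage": len(matched) / len(lib) if lib else 0.0,
        "network_coverage": len(matched) / len(net) if net else 0.0,
    }


# Made-up placeholder keys that only mimic the InChIKey layout (14-10-1 characters).
library = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
    "CCCCCCCCCCCCCC-REOHZPBCSA-N",
]
network = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
    "EEEEEEEEEEEEEE-UHFFFAOYSA-N",
]

exact = coverage(library, network)
relaxed = coverage(library, network, first_block_only=True)
print(exact["matched"], exact["library_coverage"])
print(relaxed["matched"], relaxed["network_coverage"])
```

Entries without an identifier are silently skipped here, which mirrors the requirement stated above that identifiers be present in both the library and the network.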
ABSTRACT
The metabo-ring initiative brought together five nuclear magnetic resonance (NMR) instruments and 11 different mass spectrometers with the objective of assessing the reliability of untargeted metabolomics approaches in obtaining comparable metabolomics profiles. This was estimated by measuring the proportion of common spectral information extracted from the different LCMS and NMR platforms. Biological samples obtained from two different conditions were analysed by the partners using their own in-house protocols. Test #1 examined urine samples from adult volunteers either spiked or not spiked with 32 metabolite standards. Test #2 involved a low biological contrast situation comparing the plasma of rats fed a diet either supplemented or not with vitamin D. The spectral information from each instrument was assembled into separate statistical blocks. Correlations between blocks (i.e., instruments) were examined (RV coefficients) along with the structure of the common spectral information (common components and specific weights analysis). In addition, in Test #1, an outlier individual was blindly introduced, and its identification by the various platforms was evaluated. Despite large differences in the number of spectral features produced after post-processing and the heterogeneity of the analytical conditions and the data treatment, the spectral information both within (NMR and LCMS) and across methods (NMR vs. LCMS) was highly convergent (from 64 to 91% on average). No effect of the LCMS instrumentation (TOF, QTOF, LTQ-Orbitrap) was noted. The outlier individual was best detected and characterised by LCMS instruments. In conclusion, untargeted metabolomics analyses report consistent information within and across instruments of various technologies, even without prior standardisation.
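For readers unfamiliar with the RV coefficient, the sketch below shows one standard way to compute it between two blocks measured on the same samples: RV = tr(Sx Sy) / sqrt(tr(Sx Sx) tr(Sy Sy)), where Sx and Sy are the samples-by-samples cross-product matrices of the column-centred blocks. The function names and the toy data are assumptions for illustration; this is not the study's own processing code, and it fails with a division by zero if a block has no variance.

```python
import math


def center_columns(matrix):
    """Subtract each column mean so every variable has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def gram(matrix):
    """Return the samples x samples cross-product matrix X X^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def trace_of_product(a, b):
    """Return trace(A B) for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def rv_coefficient(x, y):
    """RV coefficient between two blocks whose rows are the same samples."""
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    denominator = math.sqrt(trace_of_product(sx, sx) * trace_of_product(sy, sy))
    return trace_of_product(sx, sy) / denominator


# Toy blocks: four samples measured on two hypothetical instruments.
block_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
block_b = [[2.0 * v for v in row] for row in block_a]  # same structure, rescaled
block_c = [[0.5], [0.1], [0.9], [0.3]]

print(rv_coefficient(block_a, block_b))
print(rv_coefficient(block_a, block_c))
```

Because the blocks only need to share their rows (samples), instruments producing very different numbers of spectral features can still be compared, which is what makes this kind of coefficient suitable for a multi-platform comparison.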
ABSTRACT
Liver proteins can be altered under paracetamol (APAP) treatment. APAP-protein adducts and other protein modifications (oxidation/nitration, expression) play a role in hepatotoxicity induced by acute overdoses, but it is unknown whether liver protein modifications occur during long-term treatment with non-toxic doses of APAP. We quantified APAP-protein adducts and assessed other protein modifications in the liver from rats under chronic (17 days) treatment with two APAP doses (0.5% or 1% of APAP in the diet, w/w). A targeted metabolomic method was validated and used to quantify APAP-protein adducts as APAP-cysteine adducts following proteolytic hydrolysis. The limit of detection was found to be 7 ng APAP-cysteine/mL hydrolysate, i.e. an APAP-Cys to tyrosine ratio of 0.016. Other protein modifications were assessed on the same protein hydrolysate by untargeted metabolomics, including a new strategy to process the data and identify discriminant molecules. These two complementary mass spectrometry (MS)-based metabolic approaches enabled the assessment of a wide range of protein modifications induced by chronic treatment with APAP. BIOLOGICAL SIGNIFICANCE: APAP-protein adducts were detected even in the absence of glutathione depletion and hepatotoxicity, i.e. in the 0.5% APAP group, and increased by 218% in the 1% APAP group compared to the 0.5% APAP group. At the same time, the untargeted metabolomic method revealed a decrease in the binding of cysteine, cysteinyl-glycine and GSH to thiol groups of protein cysteine residues, an increase in the oxidation of tryptophan and proline residues, and a modification in protein expression. This wide range of modifications in liver proteins occurred in rats under chronic treatment with APAP that did not induce hepatotoxicity.
Subjects
Acetaminophen/administration & dosage; Liver/drug effects; Liver/metabolism; Mass Spectrometry/methods; Metabolome/physiology; Proteome/metabolism; Analgesics, Non-Narcotic/administration & dosage; Animals; Dose-Response Relationship, Drug; Gene Expression Profiling/methods; Male; Metabolome/drug effects; Rats; Rats, Wistar; Reproducibility of Results; Sensitivity and Specificity
ABSTRACT
SUMMARY: The complex, rapidly evolving field of computational metabolomics calls for collaborative infrastructures where the large volume of new algorithms for data pre-processing, statistical analysis and annotation can be readily integrated whatever the language, evaluated on reference datasets and chained to build ad hoc workflows for users. We have developed Workflow4Metabolomics (W4M), the first fully open-source and collaborative online platform for computational metabolomics. W4M is a virtual research environment built upon the Galaxy web-based platform technology. It enables ergonomic integration, exchange and running of individual modules and workflows. Alternatively, the whole W4M framework and computational tools can be downloaded as a virtual machine for local installation. AVAILABILITY AND IMPLEMENTATION: The http://workflow4metabolomics.org homepage enables users to open a private account and access the infrastructure. W4M is developed and maintained by the French Bioinformatics Institute (IFB) and the French Metabolomics and Fluxomics Infrastructure (MetaboHUB). CONTACT: contact@workflow4metabolomics.org.
Subjects
Metabolomics/methods; Software; Algorithms; Computational Biology; Workflow
ABSTRACT
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with an evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712-CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small-scale analyses or through a server for large-scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not optimized for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.