ABSTRACT
OBJECTIVE: Colorectal cancer (CRC) is conventionally classified as right sided, left sided, and rectal cancer. Clinicopathological and molecular features and risk factors do not change abruptly along the colorectum, and variations exist even within the refined subsites, which may contribute to inconsistencies in the identification of clinically relevant CRC biomarkers. We generated a CRC metabolome map to describe the association between metabolites, diagnostic and survival heterogeneity in cancers of different subsites of the colorectum. DESIGN: Utilizing 372 patient-matched tumor and normal mucosa tissues, liquid chromatography-mass spectrometry was applied to examine metabolomic profiles along seven subsites of the colorectum: cecum (n = 63), ascending colon (n = 44), transverse colon (n = 32), descending colon (n = 28), sigmoid colon (n = 75), rectosigmoid colon (n = 38), and rectum (n = 92). RESULTS: 39 and 70 significantly altered metabolites (including bile acids, lysophosphatidylcholines and lysophosphatidylethanolamines) among tumors and normal mucosa, respectively, showed inter-subsite metabolic heterogeneity between CRC subsites. Gradual changes in metabolite abundances with significantly linear trends from cecum to rectum were observed: 23 tumor-specific metabolites, 30 normal mucosa-specific metabolites, and 15 metabolites in both tumor and normal mucosa had concentration gradients across the colorectum that were disease-status dependent. The metabolites that showed a linear trend included bile acids, amino acids, lysophosphatidylcholines, and lysophosphatidylethanolamines. Comparison of tumors to patient-matched normal mucosa revealed metabolite changes exclusive to each subsite, thereby further highlighting differences in cancer metabolism across the 7 subsites of the colorectum. Furthermore, metabolites associated with survival were different and unique to each subsite.
Finally, an interactive and publicly accessible CRC metabolome database was designed to enable access and utilization of this rich data resource ( https://colorectal-cancer-metabolome.com/yale-university ). CONCLUSIONS: Gradual changes exist in metabolite abundances from the cecum to the rectum. The association between patient survival and distinct metabolites with anatomic subsite of the colorectum reveals differences between cancers across the colorectum. These inter-subsite metabolic heterogeneities enrich the current understanding and substantiate previous studies that have challenged the conventional classification of right-sided, left-sided, and rectal cancers, by identifying specific metabolites that offer new biological insights into CRC subsite heterogeneity. The database designed in this study will enable researchers to delve into granular information on the CRC metabolome, which until now has not been available.
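As an illustration of what a "linear trend from cecum to rectum" means in practice, the sketch below encodes the seven subsites as ordinal positions and computes a least-squares slope and Pearson correlation for one metabolite. The abstract does not state which statistical test the authors used, so the function name `linear_trend`, the ordinal encoding, and the example abundances are all assumptions made for illustration.

```python
from math import sqrt

# Subsite order taken from the abstract (cecum -> rectum).
SUBSITES = [
    "cecum", "ascending colon", "transverse colon", "descending colon",
    "sigmoid colon", "rectosigmoid colon", "rectum",
]

def linear_trend(positions, abundances):
    """Return (slope, pearson_r) of abundance against ordinal subsite position.

    Hypothetical helper: plain least squares, not the authors' actual test.
    """
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(abundances) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    syy = sum((y - mean_y) ** 2 for y in abundances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, abundances))
    slope = sxy / sxx
    # A flat profile has no defined correlation; report 0.0 in that case.
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r

# Made-up per-subsite mean log-abundances for a single metabolite.
example_abundances = [5.1, 5.4, 5.9, 6.3, 6.6, 7.2, 7.4]
slope, r = linear_trend(list(range(len(SUBSITES))), example_abundances)
print(f"slope per subsite step: {slope:.3f}, Pearson r: {r:.3f}")
```

A positive slope with a correlation near 1 would correspond to a metabolite whose abundance rises steadily toward the rectum; a real analysis would work with per-patient values and a significance test rather than subsite means.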
Subject(s)
Colorectal Neoplasms , Metabolome , Humans , Colorectal Neoplasms/metabolism , Colorectal Neoplasms/pathology , Colorectal Neoplasms/genetics , Male , Female , Middle Aged , Aged , Metabolomics/methods , Biomarkers, Tumor/metabolism , Rectum/pathology , Rectum/metabolism
ABSTRACT
The application of machine learning (ML) to -omics research is growing at an exponential rate owing to the increasing availability of large amounts of data for model training. Specifically, in metabolomics, ML has enabled the prediction of tandem mass spectrometry and retention time data. More recently, due to the advent of ion mobility, new ML models have been introduced for collision cross-section (CCS) prediction, but those have been trained with different and relatively small data sets covering a few thousand small molecules, which hampers their systematic comparison. Here, we compared four existing ML-based CCS prediction models and their capacity to predict CCS values using the recently introduced METLIN-CCS data set. We also compared them with simple linear models and with ML models that used fingerprints as regressors. We analyzed the role of structural diversity of the data on which the ML models are trained and explored the practical application of these models for metabolite annotation using CCS values. Results showed a limited capability of the existing models to achieve the necessary accuracy to be adopted for routine metabolomics analysis. We showed that for a particular molecule, this accuracy could only be improved when models were trained with a large number of structurally similar counterparts. Therefore, we suggest that current annotation capabilities will only be significantly altered with models trained with heterogeneous data sets composed of large homogeneous hubs of structurally similar molecules to those being predicted.
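To make the "simple linear models" baseline concrete, here is a minimal sketch that fits CCS as a straight-line function of m/z and reports the relative prediction error. The abstract does not say which predictors the linear models used, so regressing on m/z alone is an assumption, and the training values below are invented placeholders rather than METLIN-CCS data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def relative_errors_pct(predicted, observed):
    """Absolute relative error of each prediction, in percent."""
    return [abs(p - o) / o * 100 for p, o in zip(predicted, observed)]

# Invented (m/z, CCS in square angstroms) pairs, for illustration only.
train_mz = [150.0, 250.0, 350.0, 450.0]
train_ccs = [130.0, 155.0, 178.0, 205.0]
intercept, slope = fit_line(train_mz, train_ccs)

# Predict a held-out (also invented) molecule and score the prediction.
test_mz, test_ccs = 300.0, 170.0
predicted = intercept + slope * test_mz
print(relative_errors_pct([predicted], [test_ccs]))
```

A baseline of this kind gives the more elaborate models something to be measured against: if a fingerprint-based model cannot beat a one-variable fit on structurally novel molecules, its added complexity is not buying annotation accuracy.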
Subject(s)
Machine Learning , Metabolomics , Metabolomics/methods , Tandem Mass Spectrometry/methods
ABSTRACT
Commonly, in MS-based untargeted metabolomics, some metabolites cannot be confidently identified due to ambiguities in resolving isobars and structurally similar species. To address this, analytical techniques beyond traditional MS2 analysis, such as MSn fragmentation, can be applied to probe metabolites for additional structural information. In MSn fragmentation, recursive cycles of activation are applied to fragment ions originating from the same precursor ion detected on an MS1 spectrum. This resonant-type collision-activated dissociation (CAD) can yield information that cannot be ascertained from MS2 spectra alone, which helps improve the performance of metabolite identification workflows. However, most approaches for metabolite identification require mass-to-charge (m/z) values measured with high resolution, as this enables the determination of accurate mass values. Unfortunately, high-resolution-MSn spectra are relatively rare in spectral libraries. Here, we describe a computational approach to generate a database of high-resolution-MSn spectra by converting existing low-resolution-MSn spectra using complementary high-resolution-MS2 spectra generated by beam-type CAD. Using this method, we have generated a database, derived from the NIST20 MS/MS database, of MSn spectral trees representing 9637 compounds and 19386 precursor ions where at least 90% of signal intensity was converted from low-to-high resolution.
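The low-to-high resolution conversion can be pictured roughly as follows. This is only a sketch under stated assumptions: each low-resolution MSn peak is reassigned the m/z of the nearest high-resolution MS2 peak within a tolerance, and the spectrum is kept when at least 90% of its signal intensity was converted (the 90% figure comes from the abstract). The 0.5 Da tolerance, the nearest-peak rule, and the function name `convert_spectrum` are hypothetical and may differ from the published method.

```python
def convert_spectrum(low_res_peaks, high_res_mzs, tol=0.5):
    """Map low-resolution peaks onto high-resolution m/z values.

    low_res_peaks: list of (mz, intensity) from a low-resolution MSn spectrum.
    high_res_mzs:  accurate m/z values from a complementary MS2 spectrum.
    tol:           matching window in Da (assumed value).
    Returns (converted_peaks, fraction_of_intensity_converted).
    """
    total = sum(intensity for _, intensity in low_res_peaks)
    converted = []
    converted_intensity = 0.0
    for mz, intensity in low_res_peaks:
        candidates = [h for h in high_res_mzs if abs(h - mz) <= tol]
        if candidates:
            # Take the closest accurate mass as the replacement m/z.
            best = min(candidates, key=lambda h: abs(h - mz))
            converted.append((best, intensity))
            converted_intensity += intensity
    fraction = converted_intensity / total if total else 0.0
    return converted, fraction

# Invented peaks for illustration.
low_res = [(85.1, 50.0), (127.0, 45.0), (150.3, 5.0)]
high_res = [85.0284, 127.0390]
peaks, fraction = convert_spectrum(low_res, high_res)
keep = fraction >= 0.90  # threshold stated in the abstract
print(peaks, fraction, keep)
```

Tracking the converted fraction by intensity rather than by peak count means that a few unmatched low-abundance peaks do not disqualify an otherwise well-explained spectrum.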
Subject(s)
Metabolomics , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Metabolomics/methods , Databases, Factual , Ions/chemistry , Workflow
ABSTRACT
We report XCMS-MRM and METLIN-MRM ( http://xcmsonline-mrm.scripps.edu/ and http://metlin.scripps.edu/ ), a cloud-based data-analysis platform and a public multiple-reaction monitoring (MRM) transition repository for small-molecule quantitative tandem mass spectrometry. This platform provides MRM transitions for more than 15,500 molecules and facilitates data sharing across different instruments and laboratories.
Subject(s)
Cloud Computing , Small Molecule Libraries/chemistry , Chromatography, Liquid/methods , Computational Biology , Metabolomics , Tandem Mass Spectrometry
ABSTRACT
Electrospray ionization (ESI) in-source fragmentation (ISF) has traditionally been minimized to promote precursor molecular ion formation, and therefore its value in molecular identification is underappreciated. In-source annotation algorithms have been shown to increase confidence in putative identifications by using ubiquitous in-source fragments. However, these in-source annotation algorithms are limited by ESI sources that are generally designed to minimize ISF. In this study, enhanced in-source fragmentation annotation (eISA) was created by tuning the ISF conditions to generate in-source fragmentation patterns comparable with higher energy fragments generated at higher collision energies as deposited in the METLIN MS/MS library, without compromising the intensity of precursor ions (median loss ≤10% in both positive and negative ionization modes). The analysis of 50 molecules was used to validate the approach in comparison to MS/MS spectra produced via data dependent acquisition (DDA) and data independent acquisition (DIA) mode with quadrupole time-of-flight mass spectrometry (QTOF-MS). Enhanced ISF as compared to QTOF DDA enabled higher peak intensities for the precursor ions (median: 18 times in negative mode and 210 times in positive mode), with the eISA fragmentation patterns consistent with METLIN for over 90% of the molecules with respect to fragment relative intensity and m/z. eISA also provides higher peak intensity as opposed to QTOF DIA for over 60% of the precursor ions in negative mode (median increase: 20%) and for 88% of the precursor ions in positive mode (median increase: 80%). Molecular identification with eISA was also successfully validated from the analysis of a metabolic extract from macrophages. An interesting side benefit of enhanced ISF is that it significantly improved molecular identification confidence with low resolution single quadrupole mass-spectrometry-based untargeted LC/MS experiments. 
Overall, enhanced ISF allowed for eISA to be used as a more sensitive alternative to other QTOF DIA and DDA approaches, and further, it enabled the acquisition of ESI TOF and ESI single quadrupole mass spectrometry instrumentation spectra with improved molecular identification confidence.
Subject(s)
Organic Chemicals/analysis , Spectrometry, Mass, Electrospray Ionization , Tandem Mass Spectrometry
ABSTRACT
Computational metabolite annotation in untargeted profiling aims to uncover the neutral molecular masses of underlying metabolites and assign them putative identities. Existing annotation strategies rely on the observation and annotation of adducts to determine metabolite neutral masses. However, a significant fraction of features usually detected in untargeted experiments remains unannotated, which limits our ability to determine neutral molecular masses. Despite the availability of annotation tools, relatively few of them benefit from the inherent presence of in-source fragments in liquid chromatography-electrospray ionization-mass spectrometry. In this study, we introduce a strategy to annotate in-source fragments in untargeted data using low-energy tandem mass spectrometry (MS/MS) spectra from the METLIN library. Our algorithm, MISA (METLIN-guided in-source annotation), compares detected features against low-energy fragments from MS/MS spectra, enabling robust annotation and putative identification of metabolic features based on low-energy spectral matching. The algorithm was evaluated through an annotation analysis of a total of 140 metabolites across three different sets of biological samples analyzed with liquid chromatography-mass spectrometry. Results showed that, in cases where adducts were not formed or detected, MISA was able to uncover neutral molecular masses by in-source fragment matching. MISA was also able to provide putative metabolite identities via two annotation scores. These scores take into account the number of in-source fragments matched and the relative intensity similarity between the experimental data and the reference low-energy MS/MS spectra. Overall, results showed that in-source fragmentation is a highly frequent phenomenon that should be considered for comprehensive feature annotation.
Thus, combined with adduct annotation, this strategy adds a complementary annotation layer, enabling in-source fragments to be annotated and increasing putative identification confidence. The algorithm is integrated into the XCMS Online platform and is freely available at http://xcmsonline.scripps.edu .
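The two kinds of evidence described above (how many reference fragments are matched, and how similar the relative intensities are) can be sketched as below. This is not the MISA implementation: the exact score definitions are not given in the abstract, so the matched fraction, the cosine similarity, the 10 ppm tolerance, and the name `annotation_scores` are assumptions chosen for illustration.

```python
from math import sqrt

def ppm_error(observed_mz, reference_mz):
    """Mass error in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6

def annotation_scores(features, reference, ppm_tol=10.0):
    """Score co-eluting features against a reference low-energy MS/MS spectrum.

    features / reference: lists of (mz, intensity).
    Returns (fraction_of_reference_fragments_matched, cosine_similarity).
    Both scores are hypothetical stand-ins for the published ones.
    """
    if not reference:
        return 0.0, 0.0
    pairs = []
    for ref_mz, ref_intensity in reference:
        hits = [f for f in features if ppm_error(f[0], ref_mz) <= ppm_tol]
        if hits:
            # If several features fall in the window, keep the most intense.
            _, intensity = max(hits, key=lambda f: f[1])
            pairs.append((intensity, ref_intensity))
    if not pairs:
        return 0.0, 0.0
    dot = sum(a * b for a, b in pairs)
    norm = sqrt(sum(a * a for a, _ in pairs)) * sqrt(sum(b * b for _, b in pairs))
    return len(pairs) / len(reference), dot / norm

# Invented reference spectrum and detected features.
reference = [(100.0, 100.0), (82.0, 50.0), (56.0, 10.0)]
features = [(100.0005, 2000.0), (82.0004, 1000.0), (300.0, 500.0)]
print(annotation_scores(features, reference))
```

Cosine similarity is scale invariant, so absolute intensity differences between the experiment and the library do not affect it; only the relative pattern of fragment intensities does.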
Subject(s)
Metabolome , Metabolomics/methods , Algorithms , Amino Acids/chemistry , Amino Acids/metabolism , Animals , Brain/metabolism , Chromatography, High Pressure Liquid , Creatine/analysis , Creatine/metabolism , Databases, Factual , Mice , Tandem Mass Spectrometry
ABSTRACT
Comprehensive metabolomic data can be achieved using multiple orthogonal separation and mass spectrometry (MS) analytical techniques. However, drawing biologically relevant conclusions from this data and combining it with additional layers of information collected by other omic technologies present a significant bioinformatic challenge. To address this, a data processing approach was designed to automate the comprehensive prediction of dysregulated metabolic pathways/networks from multiple data sources. The platform autonomously integrates multiple MS-based metabolomics data types without constraints due to different sample preparation/extraction, chromatographic separation, or MS detection method. This multimodal analysis streamlines the extraction of biological information from the metabolomics data as well as the contextualization within proteomics and transcriptomics data sets. As a proof of concept, this multimodal analysis approach was applied to a colorectal cancer (CRC) study, in which complementary liquid chromatography-mass spectrometry (LC-MS) data were combined with proteomic and transcriptomic data. Our approach provided a highly resolved overview of colon cancer metabolic dysregulation, with an average 17% increase of detected dysregulated metabolites per pathway and an increase in metabolic pathway prediction confidence. Moreover, 95% of the altered metabolic pathways matched with the dysregulated genes and proteins, providing additional validation at a systems level. The analysis platform is currently available via the XCMS Online ( XCMSOnline.scripps.edu ).
Subject(s)
Colorectal Neoplasms/metabolism , Metabolic Networks and Pathways , Metabolomics/methods , Systems Biology/methods , Chromatography, Liquid/methods , Colorectal Neoplasms/genetics , Computational Biology/methods , Genomics/methods , Humans , Tandem Mass Spectrometry/methods , Transcriptome
ABSTRACT
METLIN originated as a database to characterize known metabolites and has since expanded into a technology platform for the identification of known and unknown metabolites and other chemical entities. Through this effort it has become a comprehensive resource containing over 1 million molecules including lipids, amino acids, carbohydrates, toxins, small peptides, and natural products, among other classes. METLIN's high-resolution tandem mass spectrometry (MS/MS) database, which plays a key role in the identification process, has data generated from both reference standards and their labeled stable isotope analogues, facilitated by METLIN-guided analysis of isotope-labeled microorganisms. The MS/MS data, coupled with the fragment similarity search function, expand the tool's capabilities into the identification of unknowns. Fragment similarity search is performed independent of the precursor mass, relying solely on the fragment ions to identify similar structures within the database. Stable isotope data also facilitate characterization by coupling the similarity search output with the isotopic m/z shifts. Examples of both are demonstrated here with the characterization of four previously unknown metabolites. METLIN also now features in silico MS/MS data, which has been made possible through the creation of algorithms trained on METLIN's MS/MS data from both standards and their isotope analogues. With these informatic and experimental data features, METLIN is being designed to address the characterization of known and unknown molecules.
Subject(s)
Cell Extracts/analysis , Databases, Chemical/statistics & numerical data , Datasets as Topic/statistics & numerical data , Metabolomics/methods , Metabolomics/statistics & numerical data , Pichia/chemistry , Pichia/metabolism , Tandem Mass Spectrometry/statistics & numerical data
ABSTRACT
Concurrent exposure to a wide variety of xenobiotics and their combined toxic effects can play a pivotal role in health and disease, yet are largely unexplored. Investigating the totality of these exposures, i.e., the "exposome", and their specific biological effects constitutes a new paradigm for environmental health but still lacks high-throughput, user-friendly technology. We demonstrate the utility of mass spectrometry-based global exposure metabolomics combined with tailored database queries and cognitive computing for comprehensive exposure assessment and the straightforward elucidation of biological effects. The METLIN Exposome database has been redesigned to help identify environmental toxicants, food contaminants and supplements, drugs, and antibiotics as well as their biotransformation products, through its expansion with over 700 000 chemical structures to now include more than 950 000 unique small molecules. More importantly, we demonstrate how the XCMS/METLIN platform now allows for the readout of the biological effect of a toxicant through metabolomic-derived pathway analysis, and further, artificial intelligence provides a means of assessing the role of a potential toxicant. The presented workflow addresses many of the methodological challenges current exposomics research is facing and will serve to gain a deeper understanding of the impact of environmental exposures and combinatory toxic effects on human health.
Subject(s)
Artificial Intelligence , Metabolomics/methods , Databases, Genetic , Genomics , Humans , Male
ABSTRACT
Gas chromatography coupled to mass spectrometry (GC/MS) has been a long-standing approach used for identifying small molecules due to the highly reproducible ionization process of electron impact ionization (EI). However, the use of GC-EI MS in untargeted metabolomics produces large and complex data sets characterized by coeluting compounds and extensive fragmentation of molecular ions caused by the hard electron ionization. In order to identify and extract quantitative information on metabolites across multiple biological samples, integrated computational workflows for data processing are needed. Here we introduce eRah, a free computational tool written in the open language R composed of five core functions: (i) noise filtering and baseline removal of GC/MS chromatograms, (ii) an innovative compound deconvolution process using multivariate analysis techniques based on compound match by local covariance (CMLC) and orthogonal signal deconvolution (OSD), (iii) alignment of mass spectra across samples, (iv) missing compound recovery, and (v) identification of metabolites by spectral library matching using publicly available mass spectra. eRah outputs a table with compound names, matching scores and the integrated area of compounds for each sample. The automated capabilities of eRah are demonstrated by the analysis of GC-time-of-flight (TOF) MS data from plasma samples of adolescents with hyperinsulinaemic androgen excess and healthy controls. The quantitative results of eRah are compared to centWave, the peak-picking algorithm implemented in the widely used XCMS package, MetAlign, and ChromaTOF software. Significantly dysregulated metabolites are further validated using pure standards and targeted analysis by GC-triple quadrupole (QqQ) MS, LC-QqQ, and NMR. eRah is freely available at http://CRAN.R-project.org/package=erah .
Subject(s)
Androgens/blood , Hyperinsulinism/blood , Metabolomics , Software , Adolescent , Algorithms , Gas Chromatography-Mass Spectrometry , Humans , Multivariate Analysis
ABSTRACT
Enterohemorrhagic Escherichia coli (EHEC) is a major food-borne pathogen that causes human disease ranging from diarrhea to life-threatening complications. Accumulating evidence demonstrates that the Western diet enhances the susceptibility to enteric infection in mice, but the effect of diet on EHEC colonization and the role of human gut microbiota remain unknown. Our research aimed to investigate the effects of a Standard versus a Western diet on EHEC colonization in the human in vitro Mucosal ARtificial COLon (M-ARCOL) and the associated changes in the gut microbiota composition and activities. After donor selection using simplified fecal batch experiments, two M-ARCOL bioreactors were inoculated with a human fecal sample (n = 4) and were run in parallel, one receiving a Standard diet, the other a Western diet and infected with EHEC O157:H7 strain EDL933. EHEC colonization was dependent on the donor and diet in the luminal samples, but was maintained in the mucosal compartment without elimination, suggesting that the mucosa provides a favorable niche for the pathogen and may act as a reservoir. The Western diet also impacted the bacterial short-chain fatty acid and bile acid profiles, with a possible link between high butyrate concentrations and prolonged EHEC colonization. The work demonstrates the application of a complex in vitro model to provide insights into diet, microbiota, and pathogen interactions in the human gut.
Subject(s)
Colon , Diet, Western , Enterohemorrhagic Escherichia coli , Feces , Gastrointestinal Microbiome , Humans , Gastrointestinal Microbiome/physiology , Diet, Western/adverse effects , Colon/microbiology , Feces/microbiology , Escherichia coli Infections/microbiology , Intestinal Mucosa/microbiology , Intestinal Mucosa/metabolism , Fatty Acids, Volatile/metabolism , Bile Acids and Salts/metabolism , Escherichia coli O157
ABSTRACT
In gas chromatography-mass spectrometry-based untargeted metabolomics, metabolites are identified by comparing mass spectra and chromatographic retention time with reference databases or standard materials. In that sense, machine learning has been used to predict the retention time of metabolites lacking reference data. However, the retention time prediction of trimethylsilyl derivatives of metabolites, typically analyzed in untargeted metabolomics using gas chromatography, has been poorly explored. Here, we provide a rationalized framework for machine learning-based retention time prediction of trimethylsilyl derivatives of metabolites in gas chromatography. We compared different machine learning paradigms, in addition to exploring the influence of the computational molecular structure representation to train the prediction models: fingerprint class and fingerprint calculation software. Our study challenged predicted retention time when using chemical ionization and electron impact ionization sources in simulated and real cases, demonstrating a good correct identity ranking capability by machine learning, despite observing a limited false identity filtering power in cases where a spectrum or a monoisotopic mass matches multiple candidates. Specifically, machine learning prediction yielded median absolute and relative retention index (relative retention time) errors of 37.1 retention index units and 2%, respectively. In addition, fingerprint class and fingerprint calculation software, as well as the molecular structural similarity between the training and test or real case sets, proved to be critical modulators of the prediction performance. Finally, we leveraged the structural similarity between the training and test or real case set to determine the probability that the prediction error is below a specific threshold.
Overall, our study demonstrates that predicted retention time can provide insights into the true structure of unknown metabolites by ranking from the most to the least plausible molecular identity, and sets the guidelines to assess the confidence in metabolite identification using predicted retention time data.
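For readers unfamiliar with the error metrics quoted above, the following sketch computes a median absolute error (in retention index units) and a median relative error (in percent) from paired predicted and observed retention indices. The function name `ri_errors` and the example values are illustrative assumptions; they are not the study's data.

```python
from statistics import median

def ri_errors(predicted, observed):
    """Return (median absolute error, median relative error in percent)."""
    abs_errors = [abs(p - o) for p, o in zip(predicted, observed)]
    rel_errors = [abs(p - o) / o * 100 for p, o in zip(predicted, observed)]
    return median(abs_errors), median(rel_errors)

# Invented retention indices for three trimethylsilyl derivatives.
predicted_ri = [1010.0, 1980.0, 3060.0]
observed_ri = [1000.0, 2000.0, 3000.0]
print(ri_errors(predicted_ri, observed_ri))
```

Medians are used rather than means so that a handful of badly predicted compounds do not dominate the summary, which matters when prediction quality depends strongly on structural similarity to the training set.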
ABSTRACT
The microbial-derived metabolite, 3-indolepropionic acid (3-IPA), has been intensely studied since its origins were discovered in 2009; however, 3-IPA's role in immunosuppression has had limited attention. Untargeted metabolomic analyses of T-cell exhaustion and immunosuppression, represented by dysfunctional under-responsive CD8+ T cells, reveal a potential role of 3-IPA in these responses. T-cell exhaustion was examined via infection of two genetically related mouse strains, DBA/1J and DBA/2J, with lymphocytic choriomeningitis virus (LCMV) Clone 13 (Cl13). The different mouse strains produced disparate outcomes driven by their T-cell responses. Infected DBA/2J mice presented with exhausted T cells and persistent infection, and DBA/1J mice died one week after infection from cytotoxic T lymphocyte (CTL)-mediated pulmonary failure. Metabolomics revealed that over 70 metabolites were altered between the DBA/1J and DBA/2J models over the course of the infection, most of them in mice with a fatal outcome. Cognitive-driven prioritization combined with statistical significance and fold change was used to prioritize the metabolites. 3-IPA, a tryptophan-derived metabolite, was identified as a high-priority candidate for testing. To test its activity, 3-IPA was added to the drinking water of the mouse models during LCMV Cl13 infection, with the results showing that 3-IPA allowed the mice to survive longer. This negative immune-modulation effect might be of interest for the modulation of CTL responses in events such as autoimmune diseases, type 1 diabetes or even COVID-19. Moreover, 3-IPA's bacterial origin raises the possibility of targeting the microbiome to enhance CTL responses in diseases such as cancer and chronic infection.
ABSTRACT
Worldwide, obesity rates have doubled since the 1980s and in the USA alone, almost 40% of adults are obese, a condition closely associated with a myriad of metabolic diseases such as type 2 diabetes and arteriosclerosis. Obesity is derived from an imbalance between energy intake and consumption; therefore, balancing energy homeostasis is an attractive target for metabolic diseases. One therapeutic approach consists of increasing the number of brown-like adipocytes in the white adipose tissue (WAT). Whereas WAT stores excess energy, brown adipose tissue (BAT) can dissipate this energy overload in the form of heat, increasing energy expenditure and thus inhibiting metabolic diseases. To facilitate BAT production, a high-throughput screening approach was developed on previously known drugs using human Simpson-Golabi-Behmel Syndrome (SGBS) preadipocytes. The screening allowed us to discover that zafirlukast, an FDA-approved small molecule drug commonly used to treat asthma, was able to differentiate adipocyte precursors and white-biased adipocytes into functional brown adipocytes. However, zafirlukast is toxic to human cells at higher dosages. Drug-Initiated Activity Metabolomics (DIAM) was used to investigate zafirlukast as a BAT inducer, and the endogenous metabolite myristoylglycine was then discovered to mimic the browning properties of zafirlukast without impacting cell viability. Myristoylglycine was found to be bio-synthesized upon zafirlukast treatment and was unique in inducing brown adipocyte differentiation, raising the possibility of using endogenous metabolites and bypassing the exogenous drugs to potentially alleviate disease, in this case, obesity and other related metabolic diseases.
ABSTRACT
Hypertension and kidney disease have been repeatedly associated with genomic variants and alterations of lysine metabolism. Here, we combined stable isotope labeling with untargeted metabolomics to investigate lysine's metabolic fate in vivo. Dietary 13C6 labeled lysine was tracked to lysine metabolites across various organs. Globally, lysine reacts rapidly with molecules of the central carbon metabolism, but incorporates slowly into proteins and acylcarnitines. Lysine metabolism is accelerated in a rat model of hypertension and kidney damage, chiefly through N-alpha-mediated degradation. Lysine administration diminished development of hypertension and kidney injury. Protective mechanisms include diuresis, further acceleration of lysine conjugate formation, and inhibition of tubular albumin uptake. Lysine also conjugates with malonyl-CoA to form a novel metabolite, Nε-malonyl-lysine, to deplete malonyl-CoA from fatty acid synthesis. Through conjugate formation and excretion as fructoselysine, saccharopine, and Nε-acetyllysine, lysine leads to depletion of central carbon metabolites from the organism and kidney. Consistently, lysine administration to patients at risk for hypertension and kidney disease inhibited tubular albumin uptake, increased lysine conjugate formation, and reduced tricarboxylic acid (TCA) cycle metabolites, compared to kidney-healthy volunteers. In conclusion, lysine isotope tracing mapped an accelerated metabolism in hypertension, and lysine administration could protect kidneys in hypertensive kidney disease.
Subject(s)
Hypertension , Kidney , Lysine , Albumins/metabolism , Animals , Carbon/metabolism , Disease Models, Animal , Hypertension/metabolism , Kidney/metabolism , Lysine/metabolism , Malonyl Coenzyme A/metabolism , Rats
ABSTRACT
Untargeted metabolomics of disease-associated intestinal microbiota can detect quantitative changes in metabolite profiles and complement other methodologies to reveal the full effect of intestinal dysbiosis. Here, we used the T cell transfer mouse model of colitis to identify small-molecule metabolites with altered abundance due to intestinal inflammation. We applied untargeted metabolomics to detect metabolite signatures in cecal, colonic, and fecal samples from healthy and colitic mice and to uncover differences that would aid in the identification of colitis-associated metabolic processes. We provided an unbiased spatial survey of the GI tract for small molecules, and we identified the likely source of metabolites and biotransformations. Several prioritized metabolites that we detected as being altered in colitis were evaluated for their ability to induce inflammatory signaling in cultured macrophages, such as NF-κB signaling and the expression of cytokines and chemokines upon LPS stimulation. Multiple previously uncharacterized anti-inflammatory and inflammation-augmenting metabolites were thus identified, with phytosphingosine showing the most effective anti-inflammatory activity in vitro. We further demonstrated that oral administration of phytosphingosine decreased inflammation in a mouse model of colitis induced by the compound TNBS. The collection of distinct metabolites we identified and characterized, many of which have not been previously associated with colitis, may offer new biological insight into IBD-associated inflammation and disease pathogenesis.
Subject(s)
Colitis , T-Lymphocytes , Anti-Inflammatory Agents , Humans , Metabolomics
ABSTRACT
Cognitive computing is revolutionizing the way big data are processed and integrated, with artificial intelligence (AI) natural language processing (NLP) platforms helping researchers to efficiently search and digest the vast scientific literature. Most available platforms have been developed for biomedical researchers, but new NLP tools are emerging for biologists in other fields and an important example is metabolomics. NLP provides literature-based contextualization of metabolic features that decreases the time and expert-level subject knowledge required during the prioritization, identification and interpretation steps in the metabolomics data analysis pipeline. Here, we describe and demonstrate four workflows that combine metabolomics data with NLP-based literature searches of scientific databases to aid in the analysis of metabolomics data and their biological interpretation. The four procedures can be used in isolation or consecutively, depending on the research questions. The first, used for initial metabolite annotation and prioritization, creates a list of metabolites that would be interesting for follow-up. The second workflow finds literature evidence of the activity of metabolites and metabolic pathways in governing the biological condition on a systems biology level. The third is used to identify candidate biomarkers, and the fourth looks for metabolic conditions or drug-repurposing targets that two diseases have in common. The protocol can take 1-4 h or more to complete, depending on the processing time of the various software used.
Subject(s)
Metabolomics/methods , Natural Language Processing , Systems Biology/methods , Animals , Artificial Intelligence , Big Data , Data Analysis , Databases, Factual , Humans , Mass Spectrometry , Metabolic Networks and Pathways , Software , Workflow
ABSTRACT
XCMS is one of the most widely used software tools for liquid chromatography-mass spectrometry (LC-MS) data processing, and it exists both as an R package and as a cloud-based platform known as XCMS Online. In this chapter, we first give an overview of the nature of LC-MS data to contextualize the need for data processing software. Next, we describe the algorithms used by XCMS and the role that the different user-defined parameters play in the data processing. Finally, we describe the extended capabilities of XCMS Online.
Subject(s)
Data Interpretation, Statistical , Metabolomics , Software , Algorithms , Chromatography, Liquid , Computational Biology/methods , Mass Spectrometry , Metabolomics/methods , Online Systems , User-Computer Interface , Workflow
ABSTRACT
Calorie restriction (CR) enhances health span (the length of time that an organism remains healthy) and increases longevity across species. In mice, these beneficial effects are partly mediated by the lowering of core body temperature that occurs during CR. Conversely, the favorable effects of CR on health span are mitigated by elevating ambient temperature to thermoneutrality (30°C), a condition in which hypothermia is blunted. In this study, we compared the global metabolic response to CR of mice housed at 22°C (the standard housing temperature) or at 30°C and found that thermoneutrality reverted 39 and 78% of total systemic or hypothalamic metabolic variations caused by CR, respectively. Systemic changes included pathways that control fuel use and energy expenditure during CR. Cognitive computing-assisted analysis of these metabolomics results helped to prioritize potential active metabolites that modulated the hypothermic response to CR. Last, we demonstrated with pharmacological approaches that nitric oxide (NO) produced through the citrulline-NO pathway promotes CR-triggered hypothermia and that leucine enkephalin directly controls core body temperature when exogenously injected into the hypothalamus. Because thermoneutrality counteracts CR-enhanced health span, the multiple metabolites and pathways altered by thermoneutrality may represent targets for mimicking CR-associated effects.