Results 1 - 20 of 24
1.
Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.


Subject(s)
Bile Acids and Salts , Gastrointestinal Microbiome , Metabolomics , Tandem Mass Spectrometry , Animals , Humans , Bile Acids and Salts/chemistry , Metabolomics/methods , Polyamines , Tandem Mass Spectrometry/methods , Databases, Chemical
2.
Nat Methods ; 20(6): 881-890, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37055660

ABSTRACT

A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation. Compared with the mathematically exhaustive formula enumeration, our approach shrinks the formula candidate space by 42.8% on average. Method benchmarking on annotation accuracy was systematically carried out on reference MS/MS libraries and real metabolomics datasets. Applied on 155,321 recurrent unidentified spectra, our approach confidently annotated >5,000 novel molecular formulae absent from chemical databases. Beyond the level of individual metabolic features, we combined bottom-up MS/MS interrogation with global optimization to refine formula annotations while revealing peak interrelationships. This approach allowed the systematic annotation of 37 fatty acid amide molecules in human fecal data. All bioinformatics pipelines are available in a standalone software, BUDDY ( https://github.com/HuanLab/BUDDY ).


Subject(s)
Software , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Metabolomics/methods , Computational Biology , Databases, Chemical
3.
Anal Chem ; 96(6): 2590-2598, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38294426

ABSTRACT

High-resolution mass spectrometry (HRMS) is a prominent analytical tool that characterizes chlorinated disinfection byproducts (Cl-DBPs) in an unbiased manner. Due to the diversity of chemicals, complex background signals, and the inherent analytical fluctuations of HRMS, conventional isotopic pattern (37Cl/35Cl), mass defect, and direct molecular formula (MF) prediction are insufficient for accurate recognition of the diverse Cl-DBPs in real environmental samples. This work proposes a novel strategy to recognize Cl-containing chemicals based on machine learning. Our hierarchical machine learning framework has two random forest-based models: the first layer is a binary classifier to recognize Cl-containing chemicals, and the second layer is a multiclass classifier to annotate the number of Cl present. This model was trained using ∼1.4 million distinctive MFs from PubChem. Evaluated on over 14,000 unique MFs from NIST20, this machine learning model achieved 93.3% accuracy in recognizing Cl-containing MFs (Cl-MFs) and 92.9% accuracy in annotating the number of Cl for Cl-MFs. Furthermore, the trained model was integrated into ChloroDBPFinder, a standalone R package for the streamlined processing of LC-HRMS data and annotating both known and unknown Cl-containing compounds. Tested on existing Cl-DBP data sets related to aspartame chlorination in tap water, our ChloroDBPFinder efficiently extracted 159 Cl-containing DBP features and tentatively annotated the structures of 10 Cl-DBPs via molecular networking. In another application of a chlorinated humic substance, ChloroDBPFinder extracted 79 high-quality Cl-DBPs and tentatively annotated six compounds. In summary, our proposed machine learning strategy and the developed ChloroDBPFinder provide an advanced solution to identifying Cl-containing compounds in nontargeted analysis of water samples. It is freely available on GitHub (https://github.com/HuanLab/ChloroDBPFinder).

4.
Anal Chem ; 95(35): 13018-13028, 2023 Sep 05.
Article in English | MEDLINE | ID: mdl-37603462

ABSTRACT

The purity of tandem mass spectrometry (MS/MS) spectra is essential to MS/MS-based metabolite annotation and unknown exploration. This work presents a de novo approach to cleaning chimeric MS/MS spectra generated in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics. The assumption is that true fragments and their precursors are well correlated across the samples in a study, while false or contamination fragments are rather independent. Using data simulation, this work starts with an investigation of the negative effects of chimeric MS/MS spectra on spectral similarity analysis and molecular networking. Next, the characteristics of true and false fragments in chimeric MS/MS spectra were investigated using MS/MS of chemical standards. We recognized three fragment peak attributes indicative of whether a peak is a false fragment, including (1) intensity ratio fluctuation, (2) appearance rate, and (3) relative intensity. Using these attributes, we tested three machine learning models and identified XGBoost as the best model to achieve an area under the precision-recall curve of 0.98 for a clear separation between true and false fragments. Based on the trained model, we constructed an automated bioinformatic platform, DNMS2Purifier (short for de novo MS2Purifier), for metabolic features from metabolomics studies. DNMS2Purifier recognizes and processes chimeric MS/MS spectra without additional sample analysis or library confirmation. DNMS2Purifier was evaluated on a metabolomics data set generated with different MS/MS precursor isolation windows. It successfully captured the increase in the number of false fragments from the increased isolation window. DNMS2Purifier was also compared to MS2Purifier, an existing MS/MS spectral cleaning tool based on the addition of data-independent acquisition (DIA) analysis. Results indicated that DNMS2Purifier uniquely recognizes false fragments, which complements the previous DIA-based approach. Finally, DNMS2Purifier was demonstrated using a real experimental metabolomics study, showing improved MS/MS spectral quality and leading to an improved spectral match ratio and molecular networking outcome.
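The three peak attributes named in this abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the formulas below are assumptions: the coefficient of variation of the fragment-to-precursor ratio, the fraction of samples in which the fragment appears, and the mean intensity relative to the base peak. `fragment_attributes` is a hypothetical helper and is not part of DNMS2Purifier.

```python
from statistics import mean, pstdev


def fragment_attributes(precursor, fragment, base_peak):
    """Illustrative attributes for one fragment tracked across samples.

    Each argument is a list with one intensity per sample. A fragment
    intensity of 0 means the fragment was not observed in that sample.
    All three definitions are assumptions made for this sketch.
    """
    observed = [
        (p, f, b)
        for p, f, b in zip(precursor, fragment, base_peak)
        if f > 0 and p > 0 and b > 0
    ]
    appearance_rate = len(observed) / len(fragment)
    if not observed:
        return {
            "ratio_fluctuation": float("inf"),
            "appearance_rate": 0.0,
            "relative_intensity": 0.0,
        }
    ratios = [f / p for p, f, _ in observed]
    # Coefficient of variation of the fragment/precursor intensity ratio.
    ratio_fluctuation = pstdev(ratios) / mean(ratios)
    # Mean fragment intensity relative to the base peak of each spectrum.
    relative_intensity = mean(f / b for _, f, b in observed)
    return {
        "ratio_fluctuation": ratio_fluctuation,
        "appearance_rate": appearance_rate,
        "relative_intensity": relative_intensity,
    }


# Made-up intensities: one fragment that tracks its precursor, one that does not.
precursor = [100, 200, 400, 800]
base_peak = [50, 100, 200, 400]
tracking = fragment_attributes(precursor, [50, 100, 200, 400], base_peak)
sporadic = fragment_attributes(precursor, [30, 0, 5, 0], base_peak)
```

In this toy data the tracking fragment has a constant ratio to its precursor and appears in every sample, while the sporadic one appears in half of the samples with a strongly fluctuating ratio. Attributes of this kind could then be passed to a classifier such as the XGBoost model the abstract mentions.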


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Chromatography, Liquid , Spectrum Analysis , Computational Biology
5.
Anal Chem ; 93(4): 2669-2677, 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33465307

ABSTRACT

Existing data acquisition modes such as full-scan, data-dependent (DDA), and data-independent acquisition (DIA) often present limited capabilities in capturing metabolic information in liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. In this work, we proposed a novel metabolomic data acquisition workflow that combines DDA and DIA analyses to achieve better metabolomic data quality, including enhanced metabolome coverage, tandem mass spectrometry (MS2) coverage, and MS2 quality. This workflow, named data-dependent-assisted data-independent acquisition (DaDIA), performs untargeted metabolomic analysis of individual biological samples using DIA mode and the pooled quality control (QC) samples using DDA mode. This combination takes advantage of the high-feature number and MS2 spectral coverage of the DIA data and the high MS2 spectral quality of the DDA data. To analyze the heterogeneous DDA and DIA data, we further developed a computational program, DaDIA.R, to automatically extract metabolic features and perform streamlined metabolite annotation of DaDIA data set. Using human urine samples, we demonstrated that the DaDIA workflow delivers remarkably improved data quality when compared to conventional DDA or DIA metabolomics. In particular, both the number of detected features and annotated metabolites were greatly increased. Further biological demonstration using a leukemia metabolomics study also proved that the DaDIA workflow can efficiently detect and annotate around 4 times more significant metabolites than DDA workflow with broad MS2 coverage and high MS2 spectral quality for downstream statistical analysis and biological interpretation. Overall, this work represents a critical development of data acquisition mode in untargeted metabolomics, which can greatly benefit untargeted metabolomics for a wide range of biological applications.


Subject(s)
Data Accuracy , Metabolomics/methods , Software , Humans , Leukemia/metabolism , Metabolome , Urinalysis , Workflow
6.
Anal Chem ; 93(29): 10243-10250, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34270210

ABSTRACT

In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography-mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on the annotation of adducts and isotopes, and the recognition of ISF features is mainly based on common neutral losses and the LC coelution pattern. In this work, we recognized three increasingly important patterns of ISF features, including (1) coeluting with their precursor ions, (2) being in the tandem MS (MS2) spectra of their precursor ions, and (3) sharing similar MS2 fragmentation patterns with their precursor ions. Based on these patterns, we developed an R package, ISFrag, to comprehensively recognize all possible ISF features from LC-MS data generated from full-scan, data-dependent acquisition, and data-independent acquisition modes without the assistance of common neutral loss information or MS2 spectral library. Tested using metabolite standards, we achieved a 100% correct recognition of level 1 ISF features and over 80% correct recognition for level 2 ISF features. Further application of ISFrag on untargeted metabolomics data allows us to identify ISF features that can potentially cause false metabolite annotation at an omics-scale. With the help of ISFrag, we performed a systematic investigation of how ISF features are influenced by different MS parameters, including capillary voltage, end plate offset, ion energy, and "collision energy". Our results show that while increasing energies can increase the number of real metabolic features and ISF features, the percentage of ISF features might not necessarily increase. Finally, using ISFrag, we created an ISF pathway to visualize the relationships between multiple ISF features that belong to the same precursor ion. ISFrag is freely available on GitHub (https://github.com/HuanLab/ISFrag).


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Chromatography, Liquid , Gene Library , Ions
7.
Anal Chem ; 93(14): 5735-5743, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33784068

ABSTRACT

Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature. The ability to rapidly detect and quantify both known and unknown steroid molecules in a biological sample can greatly accelerate a broad range of steroid-focused life science research. This work describes the development and application of SteroidXtract, a convolutional neural network (CNN)-based bioinformatics tool that can recognize steroid molecules in mass spectrometry (MS)-based untargeted metabolomics using their unique tandem MS (MS2) spectral patterns. SteroidXtract was trained using a comprehensive set of standard MS2 spectra from MassBank of North America (MoNA) and an in-house steroid library. Data augmentation strategies, including intensity thresholding and Gaussian noise addition, were created and applied to minimize data overfitting caused by the limited number of standard steroid MS2 spectra. The CNN model embedded in SteroidXtract was further compared with random forest and XGBoost using nested cross-validations to demonstrate its performance. Finally, SteroidXtract was applied in several metabolomics studies to demonstrate its sensitivity, specificity, and robustness. Compared to conventional statistics-driven metabolomics data interpretation, our work offers a novel automated biology-driven approach to interpreting untargeted metabolomics data, prioritizing biologically important molecules with high throughput and sensitivity.
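The two augmentation strategies named in this abstract, intensity thresholding and Gaussian noise addition, can be sketched in a few lines of Python. The threshold, the noise level, and the function name `augment_spectrum` are assumptions chosen for illustration. They are not taken from SteroidXtract.

```python
import random


def augment_spectrum(peaks, threshold=0.01, noise_sd=0.02, rng=None):
    """Return an augmented copy of an MS2 spectrum.

    peaks: list of (mz, relative_intensity) with intensities scaled to 0..1.
    threshold and noise_sd are illustrative values, not published settings.
    """
    rng = rng or random.Random()
    augmented = []
    for mz, intensity in peaks:
        if intensity < threshold:
            continue  # intensity thresholding: drop very weak peaks
        noisy = intensity + rng.gauss(0.0, noise_sd)  # Gaussian noise addition
        augmented.append((mz, min(1.0, max(0.0, noisy))))  # keep within 0..1
    return augmented


# Made-up spectrum; a seeded generator keeps the example reproducible.
spectrum = [(85.03, 0.005), (97.06, 0.40), (109.06, 1.00)]
variant = augment_spectrum(spectrum, rng=random.Random(0))
```

Calling the function repeatedly with different seeds yields several slightly different copies of each reference spectrum, which is one common way to enlarge a small training set.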


Subject(s)
Deep Learning , Computational Biology , Metabolomics , Steroids , Tandem Mass Spectrometry
8.
Anal Chem ; 93(36): 12181-12186, 2021 Sep 14.
Article in English | MEDLINE | ID: mdl-34455775

ABSTRACT

Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data relies on the recognition of extracted ion chromatogram (EIC) peak shapes using peak picking algorithms. Unfortunately, all peak picking algorithms present a significant drawback of generating a problematic number of false positives. In this work, we take advantage of deep learning technology to develop a convolutional neural network (CNN)-based program that can automatically recognize metabolic features with poor EIC shapes, which are of low feature fidelity and more likely to be false. Our CNN model was trained using 25095 EIC plots collected from 22 LC-MS-based metabolomics projects of various sample types, LC and MS conditions. Notably, we manually inspected all the EIC plots to assign good or poor EIC quality for accurate model training. The trained CNN model is embedded into a C#-based program, named EVA (short for evaluation). The EVA Windows Application is a versatile platform that can process metabolic features generated by LC-MS systems of various vendors and processed using various data processing software. Our comprehensive evaluation of EVA indicates that it achieves over 90% classification accuracy. EVA can be readily used in LC-MS-based metabolomics projects and is freely available on the Microsoft Store by searching "EVA Metabolomics".


Subject(s)
Deep Learning , Algorithms , Chromatography, Liquid , Mass Spectrometry , Metabolomics
9.
Appl Environ Microbiol ; 87(5), 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33355101

ABSTRACT

Endospore formation is used by members of the phylum Firmicutes to withstand extreme environmental conditions. Several recent studies have proposed endospore formation in species outside of Firmicutes, particularly in Rhodobacter johrii and Serratia marcescens, members of the phylum Proteobacteria. Here, we aimed to investigate endospore formation in these two species by using advanced imaging and analytical approaches. Examination of the phase-bright structures observed in R. johrii and S. marcescens using cryo-electron tomography failed to identify endospores or stages of endospore formation. We determined that the phase-bright objects in R. johrii cells were triacylglycerol storage granules and those in S. marcescens were aggregates of cellular debris. In addition, R. johrii and S. marcescens containing phase-bright objects do not possess phenotypic and genetic features of endospores, including enhanced resistance to heat, presence of dipicolinic acid, or the presence of many of the genes associated with endospore formation. Our results support the hypothesis that endospore formation is restricted to the phylum Firmicutes. IMPORTANCE: Bacterial endospore formation is an important process that allows the formation of dormant life forms called spores. As such, organisms able to sporulate can survive harsh environmental conditions for hundreds of years. Here, we follow up on previous claims that two members of Proteobacteria, Serratia marcescens and Rhodobacter johrii, are able to form spores. We conclude that those claims were incorrect and show that the putative spores in R. johrii and S. marcescens are storage granules and cellular debris, respectively. This study concludes that endospore formation is still unique to the phylum Firmicutes.

10.
Anal Chem ; 92(10): 7011-7019, 2020 May 19.
Article in English | MEDLINE | ID: mdl-32319750

ABSTRACT

The nonlinear signal response of electrospray ionization (ESI) presents a critical limitation for mass spectrometry (MS)-based quantitative analysis. In the field of metabolomics research, this issue has largely remained unaddressed; MS signal intensities are usually directly used to calculate fold changes for quantitative comparison. In this work, we demonstrate that, due to the nonlinear ESI response, signal intensity ratios of a metabolic feature calculated between two samples may not reflect their real metabolic concentration ratios (i.e., fold-change compression), implying that conventional fold-change calculations directly using MS signal intensities can be misleading. In this regard, we developed a quality control (QC) sample-based signal calibration workflow to overcome the quantitative bias caused by the nonlinear ESI response. In this workflow, calibration curves for every metabolic feature are first established using a QC sample injected in serial injection volumes. The MS signals of each metabolic feature are then calibrated to their equivalent QC injection volumes for comparative analysis. We demonstrated this novel workflow in a targeted metabolite analysis, showing that the accuracy of fold-change calculations can be significantly improved. Furthermore, in a metabolomic comparison of the bone marrow interstitial fluid samples from leukemia patients before and after chemotherapy, an additional 59 significant metabolic features were found with fold changes larger than 1.5, and an additional 97 significant metabolic features had fold changes corrected by more than 0.1. This work enables high-quality quantitative analysis in untargeted metabolomics, thus providing more confident biological hypotheses generation.
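The calibration idea in this abstract can be illustrated with a small Python sketch: a QC sample injected at several volumes gives a signal-versus-volume curve, and each sample signal is mapped back to its equivalent QC injection volume before fold changes are computed. Piecewise-linear interpolation and clamping to the calibrated range are assumptions made here, since the abstract does not state how the curves are fitted. The function name and all numbers are hypothetical.

```python
from bisect import bisect_left


def equivalent_volume(signal, volumes, qc_signals):
    """Map an MS signal to its equivalent QC injection volume.

    volumes and qc_signals are ascending lists of equal length that
    describe the QC calibration curve for one metabolic feature.
    Signals outside the calibrated range are clamped (an assumption).
    """
    if signal <= qc_signals[0]:
        return volumes[0]
    if signal >= qc_signals[-1]:
        return volumes[-1]
    i = bisect_left(qc_signals, signal)
    x0, x1 = qc_signals[i - 1], qc_signals[i]
    y0, y1 = volumes[i - 1], volumes[i]
    # Linear interpolation between the two neighbouring calibration points.
    return y0 + (y1 - y0) * (signal - x0) / (x1 - x0)


# Made-up calibration curve with a saturating (nonlinear) response.
volumes = [1.0, 2.0, 4.0, 8.0]              # injected QC volume, in µL
qc_signals = [1000.0, 1800.0, 3000.0, 4600.0]

signal_a, signal_b = 1800.0, 4600.0
raw_fold_change = signal_b / signal_a
calibrated_fold_change = (
    equivalent_volume(signal_b, volumes, qc_signals)
    / equivalent_volume(signal_a, volumes, qc_signals)
)
```

With these numbers the raw intensity ratio is about 2.6, whereas the calibrated ratio is 4.0. The gap is the fold-change compression that the abstract describes.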


Subject(s)
Leukemia/diagnosis , Leukemia/metabolism , Metabolomics , Calibration , Humans , Leukemia/blood , Quality Control , Spectrometry, Mass, Electrospray Ionization
11.
Anal Chem ; 92(21): 14476-14483, 2020 Nov 03.
Article in English | MEDLINE | ID: mdl-33076659

ABSTRACT

Spectral similarity comparison through tandem mass spectrometry (MS2) is a powerful approach to annotate known and unknown metabolic features in mass spectrometry (MS)-based untargeted metabolomics. In this work, we proposed the concept of hypothetical neutral loss (HNL), which is the mass difference between a pair of fragment ions in a MS2 spectrum. We demonstrated that HNL values contain core structural information that can be used to accurately assess the structural similarity between two MS2 spectra. We then developed the Core Structure-based Search (CSS) algorithm based on HNL values. CSS was validated with sets of hundreds of randomly selected metabolites and their reference MS2 spectra, showing significantly improved correlation between spectral and structural similarities. Compared to state-of-the-art spectral similarity algorithms, CSS generates better ranking of structurally relevant chemicals among false positives. Combining CSS, HNL library, and biotransformation database, we further developed Metabolite core structure-based Search (McSearch), a novel computational solution to facilitate the annotation of unknown metabolites using the reference MS2 spectra of their structural analogs. McSearch generates better results in the Critical Assessment of Small Molecule Identification (CASMI) 2017 data set than conventional unknown feature annotation programs. McSearch was also tested in experimental MS2 data of xenobiotic metabolite derivatives belonging to three different metabolic pathways. Our results confirmed that McSearch can better capture the underlying structural similarity between MS2 spectra. Overall, this work provides a novel direction for metabolite annotation via HNL values, paving the way for annotating metabolites using their structurally similar compounds.
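The hypothetical neutral loss (HNL) defined in this abstract, the mass difference between a pair of fragment ions in one MS2 spectrum, is simple to compute. The Python sketch below enumerates HNL values and counts how many are shared between two spectra. The matching tolerance, the function names, and the example m/z values are assumptions for illustration. This is not the CSS scoring function.

```python
from itertools import combinations


def hnl_values(fragment_mzs):
    """All pairwise mass differences between fragment ions of one spectrum."""
    return sorted(abs(a - b) for a, b in combinations(fragment_mzs, 2))


def shared_hnl_count(hnl_a, hnl_b, tol=0.005):
    """Count HNL values in hnl_a that match some value in hnl_b within tol."""
    return sum(any(abs(x - y) <= tol for y in hnl_b) for x in hnl_a)


# Made-up fragments: spectrum_b is spectrum_a shifted by a CH2 unit (14.0157),
# so no fragment m/z values coincide but every mass difference is preserved.
spectrum_a = [91.0542, 119.0491, 137.0597]
spectrum_b = [105.0699, 133.0648, 151.0754]

hnl_a = hnl_values(spectrum_a)
hnl_b = hnl_values(spectrum_b)
n_shared = shared_hnl_count(hnl_a, hnl_b)
```

The two example spectra share no fragment m/z values, yet all three mass differences match. This illustrates why HNL values can reveal a shared core structure between structural analogs.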


Subject(s)
Metabolomics/methods , Tandem Mass Spectrometry/methods , Algorithms , False Positive Reactions
12.
ACS Infect Dis ; 10(1): 107-119, 2024 Jan 12.
Article in English | MEDLINE | ID: mdl-38054469

ABSTRACT

Cholesterol is a critical growth substrate for Mycobacterium tuberculosis (Mtb) during infection, and the cholesterol catabolic pathway has been targeted for the development of new antimycobacterial agents. A key metabolite in cholesterol catabolism is 3aα-H-4α(3'-propanoate)-7aβ-methylhexahydro-1,5-indanedione (HIP). Many of the HIP metabolites are acyl-coenzyme A (CoA) thioesters, whose accumulation in deletion mutants can cause cholesterol-mediated toxicity. We used LC-MS/MS analysis to demonstrate that deletion of genes involved in HIP catabolism leads to acyl-CoA accumulation with concomitant depletion of free CoASH, leading to dysregulation of central metabolic pathways. CoASH and acyl-CoAs inhibited PanK, the enzyme that catalyzes the first step in the transformation of pantothenate to CoASH. Inhibition was competitive with respect to ATP with Kic values ranging from 9 µM for CoASH to 57 µM for small acyl-CoAs and 180 ± 30 µM for cholesterol-derived acyl-CoA. These findings link two critical metabolic pathways and suggest that therapeutics targeting cholesterol catabolic enzymes could both prevent the utilization of an important growth substrate and simultaneously sequester CoA from essential cellular processes, leading to bacterial toxicity.


Subject(s)
Mycobacterium tuberculosis , Tandem Mass Spectrometry , Chromatography, Liquid , Cholesterol/metabolism , Coenzyme A/metabolism
13.
bioRxiv ; 2024 May 14.
Article in English | MEDLINE | ID: mdl-38798440

ABSTRACT

Understanding the distribution of hundreds of thousands of plant metabolites across the plant kingdom presents a challenge. To address this, we curated publicly available LC-MS/MS data from 19,075 plant extracts and developed the plantMASST reference database encompassing 246 botanical families, 1,469 genera, and 2,793 species. This taxonomically focused database facilitates the exploration of plant-derived molecules using tandem mass spectrometry (MS/MS) spectra. This tool will aid in drug discovery, biosynthesis, (chemo)taxonomy, and the evolutionary ecology of herbivore interactions.

14.
Environ Health Perspect ; 131(3): 37009, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36913238

ABSTRACT

BACKGROUND: Due to many substances in the human exposome, there is a dearth of exposure and toxicity information available to assess potential health risks. Quantification of all trace organics in the biological fluids seems impossible and costly, regardless of the high individual exposure variability. We hypothesized that the blood concentration (CB) of organic pollutants could be predicted via their exposure and chemical properties. Developing a prediction model on the annotation of chemicals in human blood can provide new insight into the distribution and extent of exposures to a wide range of chemicals in humans. OBJECTIVES: Our objective was to develop a machine learning (ML) model to predict blood concentrations (CBs) of chemicals and prioritize chemicals of health concern. METHODS: We curated the CBs of compounds mostly measured at population levels and developed an ML model for chemical CB predictions by considering chemical daily exposure (DE) and exposure pathway indicators (δij), half-lives (t1/2), and volume of distribution (Vd). Three ML models, including random forest (RF), artificial neural network (ANN) and support vector regression (SVR) were compared. The toxicity potential or prioritization of each chemical was represented as a bioanalytical equivalency (BEQ) and its percentage (BEQ%) estimated based on the predicted CB and ToxCast bioactivity data. We also retrieved the top 25 most active chemicals in each assay to further observe changes in the BEQ% after the exclusion of the drugs and endogenous substances. RESULTS: We curated the CBs of 216 compounds primarily measured at population levels. RF outperformed the ANN and SVR models with the root mean square error (RMSE) of 1.66 and 2.07 µM, the mean absolute error (MAE) values of 1.28 and 1.56 µM, the mean absolute percentage error (MAPE) of 0.29 and 0.23, and R² of 0.80 and 0.72 across test and testing sets. Subsequently, the human CBs of 7,858 ToxCast chemicals were successfully predicted, ranging from 1.29×10⁻⁶ to 1.79×10⁻² µM. The predicted CBs were then combined with ToxCast in vitro bioassays to prioritize the ToxCast chemicals across 12 in vitro assays with important toxicological end points. It is interesting that we found the most active compounds to be food additives and pesticides rather than widely monitored environmental pollutants. DISCUSSION: We have shown that the accurate prediction of "internal exposure" from "external exposure" is possible, and this result can be quite useful in the risk prioritization. https://doi.org/10.1289/EHP11305.
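For readers unfamiliar with the four error metrics reported in the RESULTS, the following Python sketch computes them from their standard textbook definitions. MAPE is returned as a fraction, which is consistent with the reported values of 0.29 and 0.23. The function name and the toy numbers are illustrative, and nothing here reproduces the study's data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (as a fraction) and R^2 for paired observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE is undefined when any true value is zero.
        "mape": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy values, unrelated to the blood concentrations in the study.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.0, 3.0, 4.0])
```

RMSE and MAE carry the units of the predicted quantity (µM in the abstract), while MAPE and R² are dimensionless.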


Subject(s)
Environmental Pollutants , Exposome , Pesticides , Humans , Random Forest , Environmental Pollutants/toxicity , Pesticides/analysis
15.
Cell Rep ; 42(8): 112997, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37611587

ABSTRACT

Colorectal cancer (CRC) is driven by genomic alterations in concert with dietary influences, with the gut microbiome implicated as an effector in disease development and progression. While meta-analyses have provided mechanistic insight into patients with CRC, study heterogeneity has limited causal associations. Using multi-omics studies on genetically controlled cohorts of mice, we identify diet as the major driver of microbial and metabolomic differences, with reductions in α diversity and widespread changes in cecal metabolites seen in high-fat diet (HFD)-fed mice. In addition, non-classic amino acid conjugation of the bile acid cholic acid (AA-CA) increased with HFD. We show that AA-CAs impact intestinal stem cell growth and demonstrate that Ileibacterium valens and Ruminococcus gnavus are able to synthesize these AA-CAs. This multi-omics dataset implicates diet-induced shifts in the microbiome and the metabolome in disease progression and has potential utility in future diagnostic and therapeutic developments.


Subject(s)
Colorectal Neoplasms , Gastrointestinal Microbiome , Microbiota , Animals , Mice , Bile Acids and Salts , Metabolome
16.
Nat Commun ; 14(1): 8488, 2023 Dec 20.
Article in English | MEDLINE | ID: mdl-38123557

ABSTRACT

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.


Subject(s)
Access to Information , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , Metabolomics/methods , Gene Library , Cluster Analysis
17.
Anal Chim Acta ; 1200: 339613, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35256147

ABSTRACT

Collision-induced dissociation (CID) is a common fragmentation strategy in tandem mass spectrometry (MS2) analysis. A conventional understanding is that fragment ions generated in low-energy CID should follow the even-electron rule. As such, (de)protonated ([M+H]+/[M-H]-) or even-electron precursor ions should follow heterolytic cleavages and predominately generate even-electron fragment ions with very few radical fragment ions (RFIs). However, the extent to which RFIs present in MS2 spectra has not been comprehensively investigated. This work uses the annotated high-resolution MS2 spectra from the latest NIST 20 tandem mass spectral library to investigate the occurrence of RFIs in CID MS2 experiments. In particular, RFIs were recognized using integer double bond equivalent (DBE) values calculated from their annotated molecular formulas. Our study shows that 65.4% and 68.8% of MS2 spectra of even-electron precursors contain at least 10% RFIs by ion-count (total number of ions) in positive and negative electrospray ionization modes, respectively. Furthermore, we classified chemicals based on their compound classes and chemical substructures, and calculated the percentages of RFIs in each class. As expected, compounds that can stabilize the radical site via resonance, such as aromatic and conjugated double bond-containing chemicals, are more likely to form RFIs. We also found four possible patterns of change in RFI percentages as a function of CID collision energy. Finally, we demonstrate that the inadequate consideration of RFIs in most conventional bioinformatic tools might be problematic during in silico fragmentation and de novo annotation of MS2 spectra. This work provides a further understanding of CID MS2 mechanisms, and the unexpectedly large percentage of RFIs suggests that the even-electron rule seems to be challenged in numerous cases where it is disobeyed.
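The integer-DBE criterion mentioned in this abstract can be sketched in Python. The sketch relies on the standard textbook observation that the DBE computed from an ion's elemental formula (charge ignored) is a half-integer for even-electron ions and an integer for odd-electron (radical) ions. It covers only a handful of elements and assumes trivalent N and P and divalent O and S. The function names are hypothetical and this is not the authors' implementation.

```python
import re

# Contribution of each atom to the DBE, assuming lowest common valences.
DBE_TERM = {
    "C": 1.0,
    "N": 0.5, "P": 0.5,
    "H": -0.5, "F": -0.5, "Cl": -0.5, "Br": -0.5, "I": -0.5,
    "O": 0.0, "S": 0.0,
}


def parse_formula(formula):
    """Parse a simple formula such as 'C6H7' into {'C': 6, 'H': 7}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def dbe(formula):
    """Double bond equivalent: 1 + C - (H + halogens)/2 + (N + P)/2."""
    counts = parse_formula(formula)
    # Elements missing from DBE_TERM raise KeyError on purpose.
    return 1.0 + sum(DBE_TERM[el] * n for el, n in counts.items())


def is_radical_ion(ion_formula):
    """True when the ion formula gives an integer DBE (odd-electron ion)."""
    return dbe(ion_formula).is_integer()
```

For example, the ion formula C6H6 (the benzene radical cation) gives a DBE of 4.0 and is flagged as a radical ion, whereas C7H7 (the even-electron tropylium ion) gives 4.5 and is not.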


Subject(s)
Spectrometry, Mass, Electrospray Ionization , Tandem Mass Spectrometry , Electrons , Ions , Spectrometry, Mass, Electrospray Ionization/methods , Tandem Mass Spectrometry/methods
18.
Chem Commun (Camb) ; 58(72): 9979-9990, 2022 Sep 08.
Article in English | MEDLINE | ID: mdl-35997016

ABSTRACT

Advancements in computer science and software engineering have greatly facilitated mass spectrometry (MS)-based untargeted metabolomics. Nowadays, gigabytes of metabolomics data are routinely generated from MS platforms, containing condensed structural and quantitative information from thousands of metabolites. Manual data processing is almost impossible due to the large data size. Therefore, in the "omics" era, we are faced with new challenges, the big data challenges of how to accurately and efficiently process the raw data, extract the biological information, and visualize the results from the gigantic amount of collected data. Although important, proposing solutions to address these big data challenges requires broad interdisciplinary knowledge, which can be challenging for many metabolomics practitioners. Our laboratory in the Department of Chemistry at the University of British Columbia is committed to combining analytical chemistry, computer science, and statistics to develop bioinformatics tools that address these big data challenges. In this Feature Article, we elaborate on the major big data challenges in metabolomics, including data acquisition, feature extraction, quantitative measurements, statistical analysis, and metabolite annotation. We also introduce our recently developed bioinformatics solutions for these challenges. Notably, all of the bioinformatics tools and source codes are freely available on GitHub (https://www.github.com/HuanLab), along with revised and regularly updated content.


Subject(s)
Big Data , Tandem Mass Spectrometry , Computational Biology , Metabolomics/methods , Software , Tandem Mass Spectrometry/methods
19.
Nat Commun ; 13(1): 2510, 2022 05 06.
Article in English | MEDLINE | ID: mdl-35523965

ABSTRACT

Interrelating small molecules according to their aligned fragmentation spectra is central to tandem mass spectrometry-based untargeted metabolomics. Current alignment algorithms do not provide statistical significance, and compounds that have multiple delocalized structural differences often fail to have their fragment ions aligned. Here we align fragmentation spectra with both statistical significance and allowance for multiple chemical differences using Significant Interrelation of MS/MS Ions via Laplacian Embedding (SIMILE). SIMILE yields structural connections in molecular networks, inferred from spectral alignment, that are not found with cosine-based scoring algorithms. In addition, it is now possible to rank spectral alignments based on p-values in the exploration of structural relationships between compounds and to enhance the chemical connectivity that can be obtained with molecular networking.
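For context, the cosine-based scoring that the abstract uses as a point of comparison can be sketched in a few lines of Python. This is a simplified baseline, not SIMILE and not any specific tool's scoring function: the fixed-width binning, the `bin_width` default, and the function names are assumptions chosen for illustration. The sketch shows why a plain cosine score gives no credit when every fragment of a related compound is shifted by a structural modification.

```python
import math

def binned_vector(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    vec = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        vec[key] = vec.get(key, 0.0) + intensity
    return vec

def cosine_score(peaks_a, peaks_b, bin_width=0.01):
    """Plain cosine similarity between two binned fragmentation spectra."""
    a = binned_vector(peaks_a, bin_width)
    b = binned_vector(peaks_b, bin_width)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical spectra: the second has every fragment shifted by 14 m/z units.
spectrum = [(100.0, 1.0), (150.0, 0.5)]
shifted = [(114.0, 1.0), (164.0, 0.5)]
print(cosine_score(spectrum, spectrum))  # identical spectra
print(cosine_score(spectrum, shifted))   # no shared bins
```

Because the two hypothetical spectra share no m/z bins, the plain cosine score finds no overlap even though their fragmentation patterns are otherwise parallel.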


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Algorithms , Ions
20.
Metabolites ; 12(3)2022 Feb 26.
Article in English | MEDLINE | ID: mdl-35323655

ABSTRACT

Extracting metabolic features from liquid chromatography-mass spectrometry (LC-MS) data has been a long-standing bioinformatic challenge in untargeted metabolomics. Conventional feature extraction algorithms fail to recognize features with low signal intensities, poor chromatographic peak shapes, or those that do not fit the parameter settings. This problem also poses a challenge for MS-based exposome studies, as low-abundance metabolic or exposomic features cannot be automatically recognized from raw data. To address this data processing challenge, we developed an R package, JPA (short for Joint Metabolomic Data Processing and Annotation), to comprehensively extract metabolic features from raw LC-MS data. JPA performs feature extraction by combining a conventional peak picking algorithm with strategies for (1) recognizing features that have poor peak shapes but do have tandem mass spectra (MS2) and (2) picking up features from a user-defined targeted list. The performance of JPA in global metabolomics was demonstrated using serially diluted urine samples, in which JPA was able to rescue an average of 25% of the metabolic features that were missed by the conventional peak picking algorithm due to dilution. More importantly, the chromatographic peak shapes, analytical accuracy, and precision of the rescued metabolic features were all evaluated. Furthermore, owing to its sensitive feature extraction, JPA was able to achieve a limit of detection (LOD) that was up to thousands of times lower when automatically processing metabolomics data of a serially diluted metabolite standard mixture analyzed in HILIC(-) and RP(+) modes. Finally, the performance of JPA in exposome research was validated using a mixture of 250 drugs and 255 pesticides at environmentally relevant levels. JPA detected an average of 2.3-fold more exposure compounds than conventional peak picking alone.
