1.
Cell ; 187(7): 1801-1818.e20, 2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38471500

ABSTRACT

The repertoire of modifications to bile acids and related steroidal lipids by host and microbial metabolism remains incompletely characterized. To address this knowledge gap, we created a reusable resource of tandem mass spectrometry (MS/MS) spectra by filtering 1.2 billion publicly available MS/MS spectra for bile-acid-selective ion patterns. Thousands of modifications are distributed throughout animal and human bodies as well as microbial cultures. We employed this MS/MS library to identify polyamine bile amidates, prevalent in carnivores. They are present in humans, and their levels alter with a diet change from a Mediterranean to a typical American diet. This work highlights the existence of many more bile acid modifications than previously recognized and the value of leveraging public large-scale untargeted metabolomics data to discover metabolites. The availability of a modification-centric bile acid MS/MS library will inform future studies investigating bile acid roles in health and disease.


Subjects
Bile Acids and Salts , Gastrointestinal Microbiome , Metabolomics , Tandem Mass Spectrometry , Animals , Humans , Bile Acids and Salts/chemistry , Metabolomics/methods , Polyamines , Tandem Mass Spectrometry/methods , Databases, Chemical
2.
Nat Methods ; 20(6): 881-890, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37055660

ABSTRACT

A substantial fraction of metabolic features remains undetermined in mass spectrometry (MS)-based metabolomics, and molecular formula annotation is the starting point for unraveling their chemical identities. Here we present bottom-up tandem MS (MS/MS) interrogation, a method for de novo formula annotation. Our approach prioritizes MS/MS-explainable formula candidates, implements machine-learned ranking and offers false discovery rate estimation. Compared with the mathematically exhaustive formula enumeration, our approach shrinks the formula candidate space by 42.8% on average. Method benchmarking on annotation accuracy was systematically carried out on reference MS/MS libraries and real metabolomics datasets. Applied on 155,321 recurrent unidentified spectra, our approach confidently annotated >5,000 novel molecular formulae absent from chemical databases. Beyond the level of individual metabolic features, we combined bottom-up MS/MS interrogation with global optimization to refine formula annotations while revealing peak interrelationships. This approach allowed the systematic annotation of 37 fatty acid amide molecules in human fecal data. All bioinformatics pipelines are available in a standalone software, BUDDY ( https://github.com/HuanLab/BUDDY ).


Subjects
Software , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Metabolomics/methods , Computational Biology , Databases, Chemical
3.
Proc Natl Acad Sci U S A ; 120(18): e2303275120, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37094164

ABSTRACT

The presence of a cell membrane is one of the major structural components defining life. Recent phylogenomic analyses have supported the hypothesis that the last universal common ancestor (LUCA) was likely a diderm. Yet, the mechanisms that guided outer membrane (OM) biogenesis remain unknown. Thermotogae is an early-branching phylum with a unique OM, the toga. Here, we use cryo-electron tomography to characterize the in situ cell envelope architecture of Thermotoga maritima and show that the toga is made of extended sheaths of β-barrel trimers supporting small (~200 nm) membrane patches. Lipidomic analyses identified the same major lipid species in the inner membrane (IM) and toga, including membrane-spanning ether-bound diabolic acids (DAs), which are rare in bacteria. Proteomic analyses revealed that the toga was composed of multiple SLH-domain-containing Ompα and novel β-barrel proteins, and homology searches detected variable conservation of these proteins across the phylum. These results highlight that, in contrast to the SlpA/OmpM superfamily of proteins, Thermotoga possess a highly diverse bipartite OM-tethering system. We discuss the implications of our findings with respect to other early-branching phyla and propose that a toga-like intermediate may have facilitated monoderm-to-diderm cell envelope transitions.


Subjects
Bacteria , Proteomics , Cell Membrane , Cell Wall , Phylogeny , Bacterial Outer Membrane Proteins/genetics
4.
Mol Cell Proteomics ; 22(6): 100559, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37105363

ABSTRACT

The 2nd CASMS conference was held virtually on the Gather.Town platform from October 17 to 21, 2022, with a total of 363 registrants including an outstanding and diverse group of scientists at the forefront of their research fields from both academia and industry worldwide, especially in the United States and China. The conference offered a 5-day agenda with an exciting scientific program consisting of two plenary lectures, 14 parallel symposia, and 4 special sessions in which a total of 97 invited speakers presented technological innovations and their applications in proteomics & biological mass spectrometry and metabo-lipidomics & pharmaceutical mass spectrometry. In addition, 18 invited speakers/panelists presented at 3 research-focused and 2 career development workshops. Moreover, 144 posters, 54 lightning talks, 5 sponsored workshops, and 14 exhibitions were presented, from which 20 posters and 8 lightning talks received presentation awards. Furthermore, the conference featured 1 MCP lectureship and 5 young investigator awardees for the first time to highlight outstanding mid-career and early-career rising stars in mass spectrometry from our society. The conference provided a unique scientific platform for young scientists (i.e., graduate students, postdocs and junior faculty/investigators) to present their research, meet with prominent scientists, and learn about career development and job opportunities (http://casms.org).


Subjects
Mass Spectrometry , Societies, Scientific , Humans , China , Pharmaceutical Preparations , Proteomics , United States
5.
Anal Chem ; 96(9): 3727-3732, 2024 03 05.
Article in English | MEDLINE | ID: mdl-38395621

ABSTRACT

Processing liquid chromatography-mass spectrometry-based metabolomics data using computational programs often introduces additional quantitative uncertainty, termed computational variation in a previous work. This work develops a computational solution to automatically recognize metabolic features with computational variation in a metabolomics data set. This tool, AVIR (short for "Accurate eValuation of alIgnment and integRation"), is a support vector machine-based machine learning strategy (https://github.com/HuanLab/AVIR). The rationale is that metabolic features with computational variation have a poor correlation between chromatographic peak area and peak height-based quantifications across the samples in a study. AVIR was trained on a set of 696 manually curated metabolic features and achieved an accuracy of 94% in a 10-fold cross-validation. When tested on various external data sets from public metabolomics repositories, AVIR demonstrated an accuracy range of 84%-97%. Finally, tested on a large-scale metabolomics study, AVIR clearly indicated features with computational variation and thus guided us to manually correct them. Our results show that 75.3% of the samples with computational variation had a relative intensity difference of over 20% after correction. This demonstrates the critical role of AVIR in reducing computational variation to improve quantitative certainty in untargeted metabolomics analysis.
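
The rationale stated in this abstract (poor agreement between peak-area and peak-height quantification across samples) can be illustrated with a minimal Python sketch. It is not the AVIR implementation: AVIR uses a trained support vector machine, whereas this sketch applies a fixed correlation cutoff. The function names and the `threshold` value are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / sqrt(sxx * syy)

def flag_feature(areas, heights, threshold=0.9):
    """Return True when area- and height-based quantifications disagree.

    `threshold` is a hypothetical cutoff chosen for illustration only;
    the published tool relies on a trained classifier instead.
    """
    return pearson(areas, heights) < threshold

# One feature measured in four samples (made-up intensities).
consistent_areas = [100.0, 200.0, 300.0, 400.0]
consistent_heights = [10.0, 20.5, 29.0, 41.0]
suspect_areas = [100.0, 200.0, 300.0, 400.0]
suspect_heights = [10.0, 35.0, 12.0, 20.0]

print(flag_feature(consistent_areas, consistent_heights))
print(flag_feature(suspect_areas, suspect_heights))
```

A feature flagged this way would be a candidate for manual inspection of its integration, which is the kind of follow-up the abstract describes.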


Subjects
Metabolomics , Software , Uncertainty , Metabolomics/methods , Chromatography, Liquid/methods , Liquid Chromatography-Mass Spectrometry
6.
Anal Chem ; 96(6): 2590-2598, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38294426

ABSTRACT

High-resolution mass spectrometry (HRMS) is a prominent analytical tool that characterizes chlorinated disinfection byproducts (Cl-DBPs) in an unbiased manner. Due to the diversity of chemicals, complex background signals, and the inherent analytical fluctuations of HRMS, conventional isotopic pattern (37Cl/35Cl), mass defect, and direct molecular formula (MF) prediction are insufficient for accurate recognition of the diverse Cl-DBPs in real environmental samples. This work proposes a novel strategy to recognize Cl-containing chemicals based on machine learning. Our hierarchical machine learning framework has two random forest-based models: the first layer is a binary classifier to recognize Cl-containing chemicals, and the second layer is a multiclass classifier to annotate the number of Cl present. This model was trained using ∼1.4 million distinctive MFs from PubChem. Evaluated on over 14,000 unique MFs from NIST20, this machine learning model achieved 93.3% accuracy in recognizing Cl-containing MFs (Cl-MFs) and 92.9% accuracy in annotating the number of Cl for Cl-MFs. Furthermore, the trained model was integrated into ChloroDBPFinder, a standalone R package for the streamlined processing of LC-HRMS data and annotating both known and unknown Cl-containing compounds. Tested on existing Cl-DBP data sets related to aspartame chlorination in tap water, our ChloroDBPFinder efficiently extracted 159 Cl-containing DBP features and tentatively annotated the structures of 10 Cl-DBPs via molecular networking. In another application of a chlorinated humic substance, ChloroDBPFinder extracted 79 high-quality Cl-DBPs and tentatively annotated six compounds. In summary, our proposed machine learning strategy and the developed ChloroDBPFinder provide an advanced solution to identifying Cl-containing compounds in nontargeted analysis of water samples. It is freely available on GitHub (https://github.com/HuanLab/ChloroDBPFinder).
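
For context on the conventional 37Cl/35Cl isotopic pattern that this abstract calls insufficient on its own, the following Python sketch computes the theoretical chlorine-only isotopologue pattern for a given number of Cl atoms. It is not part of ChloroDBPFinder. It assumes approximate natural abundances of 75.76% for 35Cl and 24.24% for 37Cl and ignores every other element.

```python
from math import comb

# Approximate natural isotopic abundances of chlorine.
P_35CL = 0.7576
P_37CL = 0.2424

def chlorine_pattern(n_cl):
    """Relative intensities of the M, M+2, M+4, ... peaks for n_cl chlorines.

    Only chlorine's binomial contribution is modeled; carbon, sulfur and
    other elements are ignored. The list is scaled so the most intense
    peak equals 1.0.
    """
    probs = [comb(n_cl, k) * P_37CL ** k * P_35CL ** (n_cl - k)
             for k in range(n_cl + 1)]
    top = max(probs)
    return [p / top for p in probs]

for n in (1, 2, 3):
    print(n, [round(x, 3) for x in chlorine_pattern(n)])
```

In real spectra, background signals and intensity fluctuations distort these ideal ratios, which is the motivation the abstract gives for a learned classifier.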

7.
Environ Sci Technol ; 58(35): 15807-15815, 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39163399

ABSTRACT

Concerns over toxic nitrogenous disinfection byproducts (N-DBPs) necessitate identifying their precursors in source water. Natural organic amino compounds are known precursors to N-DBPs. Three Suwannee River (SR) standard reference materials (SRMs), humic acids (HA), fulvic acids (FA), and natural organic matter (NOM), are commonly used to study DBP formation, but the chemical makeup of amino compounds in SRSRMs remains largely unknown. To address this, we combined stable hydrogen/deuterium isotope labeling, HDPairFinder bioinformatics, and nontargeted high-performance liquid chromatography-high-resolution mass spectrometry (HPLC-HRMS) to characterize these compounds in SRSRMs. This method classifies reactive amines, provides accurate masses and MS/MS spectra, and quantifies intensities. We identified 2707 high-quality features with primary and/or secondary amines in SRSRMs and 75% of them having an m/z < 300. Across all three SRSRMs, 327 amino features were detected, while 856, 794, and 200 unique features were found in SRNOM, SRHA, and SRFA, respectively. In North Saskatchewan River (NSR) samples, a total of 6449 amino features were detected, 818 of them matched those in SRSRMs, and 87% of them were different between the two rivers. Using chemical standards, we confirmed 10 compounds and tentatively identified 5 more. This study highlights similarities and differences in reactive N-precursors in SRSRMs and local river water, enhancing the understanding of geo-differences in reactive N-precursors in different source waters.


Subjects
Rivers , Rivers/chemistry , Water Pollutants, Chemical/analysis , Nitrogen Compounds/analysis , Disinfection , Benzopyrans
8.
Biomed Chromatogr ; 38(3): e5795, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38071756

ABSTRACT

Following the highly successful Chinese American Society for Mass Spectrometry (CASMS) conferences in the previous 2 years, the 3rd CASMS Conference was held virtually on August 28-31, 2023, using the Gather.Town platform to bring together scientists in the MS field. The conference offered a 4-day agenda with a scientific program consisting of two plenary lectures and 14 parallel symposia in which a total of 70 speakers presented technological innovations and their applications in proteomics and biological MS and metabo-lipidomics and pharmaceutical MS. In addition, 16 invited speakers/panelists presented at two research-focused and three career development workshops. Moreover, 86 posters, 12 lightning talks, 3 sponsored workshops, and 11 exhibitions were presented, from which 9 poster awards and 2 lightning talk awards were selected. Furthermore, the conference featured four young investigator awardees to highlight early-career achievements in MS from our society. The conference provided a unique scientific platform for young scientists (i.e., graduate students, postdocs, and junior faculty/investigators) to present their research, meet with prominent scientists, and learn about career development and job opportunities (http://casms.org).


Subjects
Mass Spectrometry , Lipidomics , Pharmaceutical Preparations , Proteomics , Congresses as Topic
9.
Anal Chem ; 95(14): 5894-5902, 2023 04 11.
Article in English | MEDLINE | ID: mdl-36972195

ABSTRACT

Inconsistent peak picking outcomes are a critical concern in processing liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics data. This work systematically studied the mechanisms behind the discrepancies among five commonly used peak picking algorithms, including CentWave in XCMS, linear-weighted moving average in MS-DIAL, automated data analysis pipeline (ADAP) in MZmine 2, Savitzky-Golay in El-MAVEN, and FeatureFinderMetabo in OpenMS. We first collected 10 public metabolomics datasets representing various LC-MS analytical conditions. We then incorporated several novel strategies to (i) acquire the optimal peak picking parameters of each algorithm for a fair comparison, (ii) automatically recognize false metabolic features with poor chromatographic peak shapes, and (iii) evaluate the real metabolic features that are missed by the algorithms. By applying these strategies, we compared the true, false, and undetected metabolic features in each data processing outcome. Our results show that linear-weighted moving average consistently outperforms the other peak picking algorithms. To facilitate a mechanistic understanding of the differences, we proposed six peak attributes: ideal slope, sharpness, peak height, mass deviation, peak width, and scan number. We also developed an R program to automatically measure these attributes for detected and undetected true metabolic features. From the results of the 10 datasets, we concluded that four peak attributes, including ideal slope, scan number, peak width, and mass deviation, are critical for the detectability of a peak. For instance, the focus on ideal slope critically hinders the extraction of true metabolic features with low ideal slope scores in linear-weighted moving average, Savitzky-Golay, and ADAP. The relationships between peak picking algorithms and peak attributes were also visualized in a principal component analysis biplot. 
Overall, the clear comparison and explanation of the differences between peak picking algorithms can lead to the design of better peak picking strategies in the future.


Subjects
Algorithms , Software , Mass Spectrometry/methods , Chromatography, Liquid/methods , Metabolomics/methods
10.
Anal Chem ; 95(35): 13018-13028, 2023 09 05.
Article in English | MEDLINE | ID: mdl-37603462

ABSTRACT

The purity of tandem mass spectrometry (MS/MS) is essential to MS/MS-based metabolite annotation and unknown exploration. This work presents a de novo approach to cleaning chimeric MS/MS spectra generated in liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based metabolomics. The assumption is that true fragments and their precursors are well correlated across the samples in a study, while false or contamination fragments are rather independent. Using data simulation, this work starts with an investigation of the negative effects of chimeric MS/MS spectra on spectral similarity analysis and molecular networking. Next, the characteristics of true and false fragments in chimeric MS/MS spectra were investigated using MS/MS of chemical standards. We recognized three fragment peak attributes indicative of whether a peak is a false fragment, including (1) intensity ratio fluctuation, (2) appearance rate, and (3) relative intensity. Using these attributes, we tested three machine learning models and identified XGBoost as the best model to achieve an area under the precision-recall curve of 0.98 for a clear separation between true and false fragments. Based on the trained model, we constructed an automated bioinformatic platform, DNMS2Purifier (short for de novo MS2Purifier), for metabolic features from metabolomics studies. DNMS2Purifier recognizes and processes chimeric MS/MS spectra without additional sample analysis or library confirmation. DNMS2Purifier was evaluated on a metabolomics data set generated with different MS/MS precursor isolation windows. It successfully captured the increase in the number of false fragments from the increased isolation window. DNMS2Purifier was also compared to MS2Purifier, an existing MS/MS spectral cleaning tool based on the addition of data-independent acquisition (DIA) analysis. Results indicated that DNMS2Purifier uniquely recognizes false fragments, which complements the previous DIA-based approach.
Finally, DNMS2Purifier was demonstrated using a real experimental metabolomics study, showing improved MS/MS spectral quality and leading to an improved spectral match ratio and molecular networking outcome.
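
The three fragment attributes named in this abstract can be sketched in Python as follows. The abstract does not give their formulas, so the definitions below (coefficient of variation of the fragment-to-precursor ratio, fraction of samples containing the fragment, and mean intensity relative to the base peak) are assumptions for illustration, and `fragment_attributes` is a hypothetical name rather than a DNMS2Purifier function.

```python
from statistics import mean, pstdev

def fragment_attributes(fragment, precursor, base_peak):
    """Compute three illustrative attributes for one fragment ion.

    Each argument holds one intensity per sample; a fragment intensity
    of 0 means the fragment was not observed in that sample. The
    formulas are assumed definitions, not the published ones.
    """
    present = [i for i, f in enumerate(fragment) if f > 0]
    appearance_rate = len(present) / len(fragment)
    ratios = [fragment[i] / precursor[i] for i in present]
    # Fluctuation: coefficient of variation of fragment/precursor ratios.
    if len(ratios) > 1 and mean(ratios) > 0:
        ratio_fluctuation = pstdev(ratios) / mean(ratios)
    else:
        ratio_fluctuation = float("nan")
    if present:
        relative_intensity = mean([fragment[i] / base_peak[i] for i in present])
    else:
        relative_intensity = 0.0
    return appearance_rate, ratio_fluctuation, relative_intensity

# Made-up intensities for four samples.
precursor = [1000.0, 2000.0, 4000.0, 8000.0]
base_peak = [500.0, 1000.0, 2000.0, 4000.0]
true_fragment = [500.0, 1000.0, 2000.0, 4000.0]  # tracks the precursor
contaminant = [300.0, 0.0, 250.0, 0.0]           # independent of it

print(fragment_attributes(true_fragment, precursor, base_peak))
print(fragment_attributes(contaminant, precursor, base_peak))
```

In this toy data the true fragment appears in every sample with a stable ratio, while the contaminant appears in half of the samples with a fluctuating ratio. Attributes like these would then be passed to a classifier such as the XGBoost model the abstract mentions.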


Subjects
Metabolomics , Tandem Mass Spectrometry , Chromatography, Liquid , Spectrum Analysis , Computational Biology
11.
Bioinformatics ; 38(13): 3429-3437, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35639662

ABSTRACT

MOTIVATION: Post-acquisition sample normalization is a critical step in comparative metabolomics to remove the variation introduced by sample amount or concentration difference. Previously reported approaches are either specific to one sample type or built on strong assumptions on data structure, which are limited to certain levels. This encouraged us to develop MAFFIN, an accurate and robust post-acquisition sample normalization workflow that works universally for metabolomics data collected on mass spectrometry (MS) platforms. RESULTS: MAFFIN calculates normalization factors using maximal density fold change (MDFC) computed by a kernel density-based approach. Using both simulated data and 20 metabolomics datasets, we showcased that MDFC outperforms four commonly used normalization methods in terms of reducing the intragroup variation among samples. Two essential steps, overlooked in conventional methods, were also examined and incorporated into MAFFIN. (i) MAFFIN uses multiple orthogonal criteria to select high-quality features for normalization factor calculation, which minimizes the bias caused by abiotic features or metabolites with poor quantitative performance. (ii) MAFFIN corrects the MS signal intensities of high-quality features using serial quality control samples, which guarantees the accuracy of fold change calculations. MAFFIN was applied to a human saliva metabolomics study and led to better data separation in principal component analysis and more confirmed significantly altered metabolites. AVAILABILITY AND IMPLEMENTATION: The MAFFIN algorithm was implemented in an R package named MAFFIN. Package installation, user instruction and demo data are available at https://github.com/HuanLab/MAFFIN. Other data in this work are available on request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
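
The idea of a maximal density fold change can be sketched in Python: take the log2 fold changes of all features between a sample and a reference, estimate their density with a Gaussian kernel, and use the fold change at the density peak as the normalization factor. This is a simplified illustration and not the MAFFIN package. The bandwidth rule (1.06 times the standard deviation times n to the power -1/5), the grid search, and the function name are assumptions made here.

```python
from math import exp, log2
from statistics import stdev

def max_density_fold_change(sample, reference, grid_points=512):
    """Fold change at the peak of the kernel density of log2 ratios.

    The normal-reference bandwidth rule and the grid search are
    assumptions of this sketch, not details taken from MAFFIN.
    """
    logs = [log2(s / r) for s, r in zip(sample, reference) if s > 0 and r > 0]
    n = len(logs)
    bandwidth = 1.06 * stdev(logs) * n ** (-1 / 5)
    if bandwidth == 0:
        return 2 ** logs[0]  # every feature shares the same fold change
    lo, hi = min(logs), max(logs)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + (hi - lo) * i / (grid_points - 1)
        # Unnormalized Gaussian kernel density; enough to locate the peak.
        density = sum(exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in logs)
        if density > best_density:
            best_x, best_density = x, density
    return 2 ** best_x

# Made-up data: the sample is twice as concentrated as the reference,
# and three features are additionally changed for biological reasons.
reference = [100.0 * (i + 1) for i in range(20)]
sample = [2.0 * r for r in reference]
sample[0] *= 8
sample[1] *= 0.1
sample[2] *= 5

factor = max_density_fold_change(sample, reference)
normalized = [s / factor for s in sample]
print(round(factor, 2))
```

Using the density peak rather than the mean or sum keeps the few strongly altered features from pulling the normalization factor away from the dilution effect shared by most features.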


Subjects
Algorithms , Metabolomics , Humans , Metabolomics/methods , Mass Spectrometry/methods , Workflow , Software
12.
Molecules ; 28(8)2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37110719

ABSTRACT

The unambiguous identification of lipids is a critical component of lipidomics studies and greatly impacts the interpretation and significance of analyses as well as the ultimate biological understandings derived from measurements. The level of structural detail that is available for lipid identifications is largely determined by the analytical platform being used. Mass spectrometry (MS) coupled with liquid chromatography (LC) is the predominant combination of analytical techniques used for lipidomics studies, and these methods can provide fairly detailed lipid identification. More recently, ion mobility spectrometry (IMS) has begun to see greater adoption in lipidomics studies thanks to the additional dimension of separation that it provides and the added structural information that can support lipid identification. At present, relatively few software tools are available for IMS-MS lipidomics data analysis, which reflects the still limited adoption of IMS as well as the limited software support. This fact is even more pronounced for isomer identifications, such as the determination of double bond positions or integration with MS-based imaging. In this review, we survey the landscape of software tools that are available for the analysis of IMS-MS-based lipidomics data and we evaluate lipid identifications produced by these tools using open-access data sourced from the peer-reviewed lipidomics literature.


Subjects
Ion Mobility Spectrometry , Lipidomics , Lipidomics/methods , Lipids/analysis , Mass Spectrometry/methods , Software
13.
Anal Chem ; 94(23): 8267-8276, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35657711

ABSTRACT

Metabolomic data normality is vital for many statistical analyses to identify significantly different metabolic features. However, despite the thousands of metabolomic publications every year, the study of metabolomic data distribution is rare. Using large-scale metabolomic data sets, we performed a comprehensive study of metabolomic data distributions. We showcased that metabolic features have diverse data distribution types, and the majority of them cannot be normalized correctly using conventional data transformation algorithms, including log and square root transformations. To understand the various non-normal data distributions, we proposed fitting metabolomic data into nine beta distributions, each representing a unique data distribution. The results of three large-scale data sets consistently show that two low normality types are very common. Next, we created the adaptive Box-Cox (ABC) transformation, a novel feature-specific data transformation approach for improving data normality. By tuning a power parameter based on a normality test result, ABC transformation was made to work for various data distribution types, and it showed great performance in normalizing skewed metabolomic data. Tested on a series of simulated data in Monte Carlo simulations, ABC transformation outperformed conventional data transformation approaches for both positively and negatively skewed data distributions. ABC transformation was further demonstrated in a real metabolomic study composed of three pairwise comparisons. Additional 84, 44, and 57 significant metabolites were newly confirmed after ABC transformation, corresponding to respective increases of 70.6, 13.4, and 22.9% in significant metabolites compared to the conventional metabolomic workflow. Some of these newly discovered metabolites showed promising biological meanings. ABC transformation was implemented in the R package ABCstats and is freely available on GitHub (https://github.com/HuanLab/ABCstats).
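
A feature-specific Box-Cox transformation with a tuned power parameter can be sketched in Python as below. This is not the ABCstats implementation. The abstract says the power parameter is tuned from a normality test result; this sketch substitutes a simpler criterion, the absolute sample skewness, and the candidate grid of powers is likewise an assumption.

```python
from math import exp, log

def box_cox(values, lam):
    """Box-Cox transform of positive values with power parameter lam."""
    if lam == 0:
        return [log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def skewness(values):
    """Moment-based sample skewness (0 for symmetric data)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def best_lambda(values, grid=None):
    """Pick the power that minimizes |skewness| after transformation.

    Minimizing skewness stands in for the normality test used by the
    published method; the grid from -2 to 2 is an illustrative choice.
    """
    grid = grid or [i / 10 for i in range(-20, 21)]
    return min(grid, key=lambda lam: abs(skewness(box_cox(values, lam))))

# Made-up positively skewed intensities for one metabolic feature.
intensities = [exp(z) for z in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0)]
lam = best_lambda(intensities)
print(lam, round(skewness(intensities), 3),
      round(skewness(box_cox(intensities, lam)), 3))
```

Because the power is chosen per feature, a positively skewed feature and a negatively skewed one in the same data set can receive different transformations.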


Subjects
Algorithms , Research Design , Monte Carlo Method
14.
Anal Chem ; 94(10): 4260-4268, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35245044

ABSTRACT

Choosing appropriate data processing parameters is critical in processing liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics data. The conventional design of experiments (DOE) approach is time-consuming and provides no intuitive explanation why the selected parameters generate the best results. After studying commonly used metabolomics data processing software, this work summarized a set of universal parameters, including mass tolerance, peak height, peak width, and instrumental shift. These universal parameters are shared among different feature extraction programs and are critical to metabolic feature extraction. We then developed Paramounter, an R program that automatically measures these universal parameters from raw LC-MS-based metabolomics data prior to metabolic feature extraction. This is made possible through novel concepts of rank-based intensity sorting, zone of interest, and many others. Paramounter also translates universal parameters to software-specific parameters for data processing in different programs. Applying Paramounter is demonstrated to provide a threefold increase in the extracted metabolites compared to using default parameters in MS-DIAL-based feature extraction. Furthermore, the comparison between Paramounter, AutoTuner, and IPO showed that Paramounter generates 3.7- and 1.6-fold more true positive features than AutoTuner and IPO, respectively. Further validation of Paramounter on 11 datasets covering different sample types, data acquisition modes, and MS vendors proved that Paramounter is a convenient and robust program. Overall, the proposed universal parameters and the development of Paramounter address a critical need in metabolomics data processing, transforming metabolomics feature extraction from a "black box" to a "white box." Paramounter is freely available on GitHub (https://github.com/HuanLab/Paramounter).


Subjects
Metabolomics , Software , Chromatography, Liquid/methods , Mass Spectrometry/methods , Metabolomics/methods
15.
Nat Chem Biol ; 16(1): 42-49, 2020 01.
Article in English | MEDLINE | ID: mdl-31636431

ABSTRACT

Modular nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) enzymatic assembly lines are large and dynamic protein machines that generally effect a linear sequence of catalytic cycles. Here, we report the heterologous reconstitution and comprehensive characterization of two hybrid NRPS-PKS assembly lines that defy many standard rules of assembly line biosynthesis to generate a large combinatorial library of cyclic lipodepsipeptide protease inhibitors called thalassospiramides. We generate a series of precise domain-inactivating mutations in thalassospiramide assembly lines, and present evidence for an unprecedented biosynthetic model that invokes intermodule substrate activation and tailoring, module skipping and pass-back chain extension, whereby the ability to pass the growing chain back to a preceding module is flexible and substrate driven. Expanding bidirectional intermodule domain interactions could represent a viable mechanism for generating chemical diversity without increasing the size of biosynthetic assembly lines and challenges our understanding of the potential elasticity of multimodular megaenzymes.


Subjects
Multigene Family , Peptide Synthases/metabolism , Peptides, Cyclic/biosynthesis , Catalysis , Chromatography, Liquid , Cloning, Molecular , Elasticity , Gene Deletion , Genetic Complementation Test , Mass Spectrometry , Mutation , Polyketide Synthases/metabolism , Protein Domains , Proteobacteria/enzymology , Substrate Specificity
16.
Anal Chem ; 93(4): 2254-2262, 2021 02 02.
Article in English | MEDLINE | ID: mdl-33400486

ABSTRACT

Despite the well-known nonlinear response of electrospray ionization (ESI) in mass spectrometry (MS)-based analysis, its complicated response patterns and negative impact on quantitative comparison are still understudied. We showcase in this work that the patterns of nonlinear ESI response are feature-dependent and can cause significant compression or inflation of signal ratios. In particular, our metabolomics study of serially diluted human urine samples showed that over 72% and 16% of metabolic features suffered ratio compression and inflation, respectively, whereas only 12% of the signal ratios represent real metabolic concentration ratios. More importantly, ratio compression and inflation largely occur within the linear response ranges, suggesting that they cannot be resolved by simply diluting the sample solutions to the linear ESI response ranges. Furthermore, we demonstrated that a polynomial regression model that converts MS signals to sample injection amounts can correct the biased ratios and, surprisingly, outperform the linear regression model in both data fitting and data prediction. Therefore, we proposed a metabolic ratio correction (MRC) strategy to minimize signal ratio bias in untargeted metabolomics for accurate quantitative comparison. In brief, by using the data of serially diluted quality control (QC) samples, we applied a cross-validation strategy to determine the best regression model, between linear and polynomial, for each metabolic feature and to convert the measured MS intensities to QC injection amounts for accurate metabolic ratio calculation. Both the studies of human urine samples and a metabolomics application supported that our MRC approach is very efficient in correcting the biased signal ratios. This novel insight into patterned ESI nonlinear response and the MRC workflow can significantly benefit the downstream statistical comparison and biological interpretation for untargeted metabolomics.
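
The model-selection step described in this abstract (choosing between linear and polynomial regression per feature by cross-validation, then converting intensities to QC injection amounts) can be sketched in Python. This is a simplified illustration and not the authors' code: it assumes a second-degree polynomial, leave-one-out cross-validation, and a plain normal-equations solver, and all function names are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    return coef

def predict(coef, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coef))

def loo_error(xs, ys, degree):
    """Mean squared leave-one-out prediction error."""
    total = 0.0
    for i in range(len(xs)):
        coef = polyfit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        total += (predict(coef, xs[i]) - ys[i]) ** 2
    return total / len(xs)

def choose_degree(intensities, amounts):
    """Pick degree 1 or 2, whichever cross-validates better."""
    return min((1, 2), key=lambda d: loo_error(intensities, amounts, d))

# Made-up QC dilution series in which amount grows as intensity squared.
qc_intensity = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
qc_amount = [x ** 2 for x in qc_intensity]

degree = choose_degree(qc_intensity, qc_amount)
model = polyfit(qc_intensity, qc_amount, degree)
raw_ratio = 3.0 / 1.5
corrected_ratio = predict(model, 3.0) / predict(model, 1.5)
print(degree, raw_ratio, round(corrected_ratio, 3))
```

In this toy series the raw signal ratio understates the underlying amount ratio, which is the ratio compression the abstract describes. Taking the ratio of predicted amounts instead of raw intensities is what removes the bias.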


Subjects
Metabolomics/methods , Spectrometry, Mass, Electrospray Ionization/methods , Urine/chemistry , Urinalysis
17.
Anal Chem ; 2021 Jun 16.
Article in English | MEDLINE | ID: mdl-34132520

ABSTRACT

Computational tools are commonly used in untargeted metabolomics to automatically extract metabolic features from liquid chromatography-mass spectrometry (LC-MS) raw data. However, due to the incapability of software to accurately determine chromatographic peak heights/areas for features with poor chromatographic peak shape, automated data processing in untargeted metabolomics faces additional quantitative variation (i.e., computational variation) besides the well-recognized analytical and biological variations. In this work, using multiple biological samples, we investigated how experimental factors, including sample concentrations, LC separation columns, and data processing programs, contribute to computational variation. For example, we found that the peak height (PH)-based quantification is more precise when MS-DIAL was used for data processing. We further systematically compared the different patterns of computational variation between PH- and peak area (PA)-based quantitative measurements. Our results suggest that the magnitude of computational variation is highly consistent at a given concentration. Hence, we proposed a quality control (QC) sample-based correction workflow to minimize computational variation by automatically selecting PH or PA-based measurement for each intensity value. This bioinformatic solution was demonstrated in a metabolomic comparison of leukemia patients before and after chemotherapy. Our novel workflow can be effectively applied on 652 out of 915 metabolic features, and over 31% (206 out of 652) of corrected features showed distinctly changed statistical significance. Overall, this work highlights computational variation, a considerable but underinvestigated quantitative variability in omics-scale quantitative analyses. In addition, the proposed bioinformatic solution can minimize computational variation, thus providing a more confident statistical comparison among biological groups in quantitative metabolomics.
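
One simple way to picture a QC-based choice between peak height (PH) and peak area (PA) is to compare their relative standard deviations across replicate QC injections and keep the more precise one. The Python sketch below does this per feature. It is an assumption-laden simplification: the abstract describes selection for each intensity value based on concentration-dependent variation, which this sketch does not reproduce, and the function names are hypothetical.

```python
from statistics import mean, stdev

def rsd(values):
    """Relative standard deviation (coefficient of variation)."""
    return stdev(values) / mean(values)

def choose_measurement(qc_heights, qc_areas):
    """Return "PH" or "PA", whichever varies less across QC replicates.

    Selecting by QC precision per feature is an assumed criterion used
    only to illustrate the idea of the correction workflow.
    """
    return "PH" if rsd(qc_heights) <= rsd(qc_areas) else "PA"

# Made-up replicate QC injections for one feature.
qc_heights = [1000.0, 1020.0, 990.0, 1010.0]
qc_areas = [5000.0, 5600.0, 4700.0, 5300.0]

print(choose_measurement(qc_heights, qc_areas))
```

Because QC replicates share the same true concentration, any spread among them reflects analytical plus computational variation, so the lower-RSD measurement is the one less affected by inconsistent peak integration.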

18.
Anal Chem ; 93(4): 2669-2677, 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33465307

ABSTRACT

Existing data acquisition modes such as full-scan, data-dependent (DDA), and data-independent acquisition (DIA) often present limited capabilities in capturing metabolic information in liquid chromatography-mass spectrometry (LC-MS)-based metabolomics. In this work, we proposed a novel metabolomic data acquisition workflow that combines DDA and DIA analyses to achieve better metabolomic data quality, including enhanced metabolome coverage, tandem mass spectrometry (MS2) coverage, and MS2 quality. This workflow, named data-dependent-assisted data-independent acquisition (DaDIA), performs untargeted metabolomic analysis of individual biological samples using DIA mode and of the pooled quality control (QC) samples using DDA mode. This combination takes advantage of the high feature number and MS2 spectral coverage of the DIA data and the high MS2 spectral quality of the DDA data. To analyze the heterogeneous DDA and DIA data, we further developed a computational program, DaDIA.R, to automatically extract metabolic features and perform streamlined metabolite annotation of DaDIA data sets. Using human urine samples, we demonstrated that the DaDIA workflow delivers remarkably improved data quality when compared to conventional DDA or DIA metabolomics. In particular, both the number of detected features and the number of annotated metabolites were greatly increased. Further biological demonstration using a leukemia metabolomics study also showed that the DaDIA workflow can efficiently detect and annotate around 4 times more significant metabolites than the DDA workflow, with broad MS2 coverage and high MS2 spectral quality for downstream statistical analysis and biological interpretation. Overall, this work represents a critical development of data acquisition mode in untargeted metabolomics, which can greatly benefit untargeted metabolomics for a wide range of biological applications.


Subjects
Data Accuracy , Metabolomics/methods , Software , Humans , Leukemia/metabolism , Metabolome , Urinalysis , Workflow
19.
Anal Chem ; 93(14): 5735-5743, 2021 Apr 13.
Article in English | MEDLINE | ID: mdl-33784068

ABSTRACT

Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature. The ability to rapidly detect and quantify both known and unknown steroid molecules in a biological sample can greatly accelerate a broad range of steroid-focused life science research. This work describes the development and application of SteroidXtract, a convolutional neural network (CNN)-based bioinformatics tool that can recognize steroid molecules in mass spectrometry (MS)-based untargeted metabolomics using their unique tandem MS (MS2) spectral patterns. SteroidXtract was trained using a comprehensive set of standard MS2 spectra from MassBank of North America (MoNA) and an in-house steroid library. Data augmentation strategies, including intensity thresholding and Gaussian noise addition, were created and applied to minimize data overfitting caused by the limited number of standard steroid MS2 spectra. The CNN model embedded in SteroidXtract was further compared with random forest and XGBoost using nested cross-validations to demonstrate its performance. Finally, SteroidXtract was applied in several metabolomics studies to demonstrate its sensitivity, specificity, and robustness. Compared to conventional statistics-driven metabolomics data interpretation, our work offers a novel automated biology-driven approach to interpreting untargeted metabolomics data, prioritizing biologically important molecules with high throughput and sensitivity.
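
The two augmentation strategies named above, intensity thresholding and Gaussian noise addition, can be sketched roughly as follows. This is a minimal illustration, not SteroidXtract's code: the function names, the peak list, the 5% relative-intensity cutoff, and the noise level are all assumptions chosen for the example.

```python
import random


def intensity_threshold(spectrum, min_rel_intensity):
    """Drop peaks whose intensity is below a fraction of the base peak."""
    base = max(intensity for _, intensity in spectrum)
    return [(mz, i) for mz, i in spectrum if i / base >= min_rel_intensity]


def add_gaussian_noise(spectrum, sigma, rng):
    """Perturb each intensity with zero-mean Gaussian noise, clipped at 0."""
    return [(mz, max(0.0, i + rng.gauss(0.0, sigma))) for mz, i in spectrum]


# Hypothetical MS2 spectrum as (m/z, intensity) pairs.
spectrum = [(97.07, 100.0), (109.06, 45.0), (123.08, 3.0), (289.22, 60.0)]

rng = random.Random(0)  # seeded so the augmentation is reproducible
filtered = intensity_threshold(spectrum, min_rel_intensity=0.05)
augmented = add_gaussian_noise(filtered, sigma=2.0, rng=rng)
print(augmented)
```

Applying different cutoffs and noise draws to each standard spectrum yields several slightly different training examples per compound, which is the general idea behind using augmentation to counter overfitting on a small reference set.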


Subjects
Deep Learning , Computational Biology , Metabolomics , Steroids , Tandem Mass Spectrometry
20.
Anal Chem ; 93(29): 10243-10250, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-34270210

ABSTRACT

In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography-mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, prompting misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on the annotation of adducts and isotopes, and the recognition of ISF features is mainly based on common neutral losses and the LC coelution pattern. In this work, we recognized three increasingly important patterns of ISF features, including (1) coeluting with their precursor ions, (2) being in the tandem MS (MS2) spectra of their precursor ions, and (3) sharing similar MS2 fragmentation patterns with their precursor ions. Based on these patterns, we developed an R package, ISFrag, to comprehensively recognize all possible ISF features from LC-MS data generated from full-scan, data-dependent acquisition, and data-independent acquisition modes without the assistance of common neutral loss information or MS2 spectral library. Tested using metabolite standards, we achieved a 100% correct recognition of level 1 ISF features and over 80% correct recognition for level 2 ISF features. Further application of ISFrag on untargeted metabolomics data allows us to identify ISF features that can potentially cause false metabolite annotation at an omics-scale. With the help of ISFrag, we performed a systematic investigation of how ISF features are influenced by different MS parameters, including capillary voltage, end plate offset, ion energy, and "collision energy". Our results show that while increasing energies can increase the number of real metabolic features and ISF features, the percentage of ISF features might not necessarily increase. Finally, using ISFrag, we created an ISF pathway to visualize the relationships between multiple ISF features that belong to the same precursor ion. 
ISFrag is freely available on GitHub (https://github.com/HuanLab/ISFrag).
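
The three ISF patterns listed in the abstract can be expressed as three simple checks, sketched below in Python. This is not ISFrag's API (ISFrag is an R package); the function names, tolerances, cosine cutoff, and example values are hypothetical, and the greedy peak matching used for the cosine score is a simplification.

```python
import math


def coelutes(rt_a, rt_b, rt_tol):
    """Pattern 1: the two features elute at (nearly) the same retention time."""
    return abs(rt_a - rt_b) <= rt_tol


def in_ms2(mz, ms2, mz_tol):
    """Pattern 2: the candidate m/z appears as a fragment in the MS2 spectrum."""
    return any(abs(mz - frag_mz) <= mz_tol for frag_mz, _ in ms2)


def cosine_similarity(ms2_a, ms2_b, mz_tol):
    """Pattern 3: cosine score with greedy one-to-one peak matching."""
    used = set()
    dot = 0.0
    for mz_a, int_a in ms2_a:
        for j, (mz_b, int_b) in enumerate(ms2_b):
            if j not in used and abs(mz_a - mz_b) <= mz_tol:
                dot += int_a * int_b
                used.add(j)
                break
    norm_a = math.sqrt(sum(i * i for _, i in ms2_a))
    norm_b = math.sqrt(sum(i * i for _, i in ms2_b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def isf_evidence(candidate, precursor, rt_tol=0.05, mz_tol=0.01, min_cosine=0.5):
    """Report which of the three ISF patterns a candidate feature satisfies."""
    score = cosine_similarity(candidate["ms2"], precursor["ms2"], mz_tol)
    return {
        "coelutes": coelutes(candidate["rt"], precursor["rt"], rt_tol),
        "in_precursor_ms2": in_ms2(candidate["mz"], precursor["ms2"], mz_tol),
        "similar_ms2": score >= min_cosine,
    }


# Hypothetical features: retention time in minutes, MS2 as (m/z, intensity).
precursor = {
    "mz": 300.10,
    "rt": 5.00,
    "ms2": [(282.09, 50.0), (150.05, 100.0), (120.04, 45.0)],
}
candidate = {"mz": 282.09, "rt": 5.02, "ms2": [(150.05, 100.0), (120.04, 50.0)]}

evidence = isf_evidence(candidate, precursor)
print(evidence)
```

The sketch only reports which patterns hold. How the patterns map to the level 1 and level 2 ISF features mentioned in the abstract is not stated there, so it is left out.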


Subjects
Metabolomics , Tandem Mass Spectrometry , Liquid Chromatography , Gene Library , Ions