Results 1 - 20 of 26
1.
J Cheminform ; 16(1): 8, 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38238779

ABSTRACT

The majority of tandem mass spectrometry (MS/MS) spectra in untargeted metabolomics and exposomics studies lack any annotation. Our deep learning framework, Integrated Data Science Laboratory for Metabolomics and Exposomics-Mass INTerpreter (IDSL_MINT), can translate MS/MS spectra into molecular fingerprint descriptors. IDSL_MINT allows users to leverage the power of the transformer model for mass spectrometry data, similar to the large language models. Models are trained on user-provided reference MS/MS libraries via any customizable molecular fingerprint descriptors. IDSL_MINT was benchmarked using the LipidMaps database and improved the annotation rate of a test study for MS/MS spectra that were not originally annotated using existing mass spectral libraries. IDSL_MINT may improve the overall annotation rates in untargeted metabolomics and exposomics studies. The IDSL_MINT framework and tutorials are available in the GitHub repository at https://github.com/idslme/IDSL_MINT. Scientific contribution statement: Structural annotation of MS/MS spectra from untargeted metabolomics and exposomics datasets is a major bottleneck in gaining new biological insights. Machine learning models to convert spectra into molecular fingerprints can help in the annotation process. Here, we present IDSL_MINT, a new, easy-to-use and customizable deep-learning framework to train and utilize new models to predict molecular fingerprints from spectra for the compound annotation workflows.

2.
Anal Chem ; 95(25): 9480-9487, 2023 06 27.
Article in English | MEDLINE | ID: mdl-37311059

ABSTRACT

Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics-Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of high-resolution mass spectrometry coupled with liquid chromatography peaks regardless of the availability of MS2 fragmentation spectra. We demonstrate comparable annotation rates for commonly detected endogenous metabolites in human blood samples using IDSL.CSA libraries versus MS/MS libraries in validation tests. IDSL.CSA can create and search composite spectra libraries from any untargeted metabolomics dataset generated using high-resolution mass spectrometry coupled to liquid or gas chromatography instruments. The cross-applicability of these libraries across independent studies may provide access to new biological insights that may be missed due to the lack of MS2 fragmentation data. The IDSL.CSA package is available in the R-CRAN repository at https://cran.r-project.org/package=IDSL.CSA. Detailed documentation and tutorials are provided at https://github.com/idslme/IDSL.CSA.


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Gas Chromatography-Mass Spectrometry/methods , Metabolomics/methods , Software , Chromatography, Liquid
3.
bioRxiv ; 2023 May 31.
Article in English | MEDLINE | ID: mdl-36798308

ABSTRACT

Poor chemical annotation of high-resolution mass spectrometry data limits applications of untargeted metabolomics datasets. Our new software, the Integrated Data Science Laboratory for Metabolomics and Exposomics - Composite Spectra Analysis (IDSL.CSA) R package, generates composite mass spectra libraries from MS1-only data, enabling the chemical annotation of LC/HRMS peaks regardless of the availability of MS2 fragmentation spectra. We demonstrate comparable annotation rates for commonly detected endogenous metabolites in human blood samples using IDSL.CSA libraries versus MS/MS libraries in validation tests. IDSL.CSA can create and search composite spectra libraries from any untargeted metabolomics dataset generated using high-resolution mass spectrometry coupled to liquid or gas chromatography instruments. The cross-applicability of these libraries across independent studies may provide access to new biological insights that may be missed due to the lack of MS2 fragmentation data. The IDSL.CSA package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.CSA. Detailed documentation and tutorials are provided at https://github.com/idslme/IDSL.CSA.

4.
Anal Chem ; 94(39): 13315-13322, 2022 10 04.
Article in English | MEDLINE | ID: mdl-36137231

ABSTRACT

Untargeted liquid chromatography/high-resolution mass spectrometry (LC/HRMS) assays in metabolomics and exposomics aim to characterize the small molecule chemical space in a biospecimen. To gain maximum biological insights from these data sets, LC/HRMS peaks should be annotated with chemical and functional information including molecular formula, structure, chemical class, and metabolic pathways. Among these, molecular formulas may be assigned to LC/HRMS peaks through matching theoretical and observed isotopic profiles (MS1) of the underlying ionized compound. For this, we have developed the Integrated Data Science Laboratory for Metabolomics and Exposomics-United Formula Annotation (IDSL.UFA) R package. In the untargeted metabolomics validation tests, IDSL.UFA assigned 54.31-85.51% molecular formula for true positive annotations as the top hit and 90.58-100% within the top five hits. Molecular formula annotations were also supported by tandem mass spectrometry data. We have implemented new strategies to (1) generate formula sources and their theoretical isotopic profiles, (2) optimize the formula hits ranking for the individual and aligned peak lists, and (3) scale IDSL.UFA-based workflows for studies with larger sample sizes. Annotating the raw data for a publicly available pregnancy metabolome study using IDSL.UFA highlighted hundreds of new pregnancy-related compounds and also suggested the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens. IDSL.UFA is useful for human metabolomics and exposomics studies where we need to minimize the loss of biological insights in untargeted LC/HRMS data sets. The IDSL.UFA package is available in the R CRAN repository https://cran.r-project.org/package=IDSL.UFA. Detailed documentation and tutorials are also provided at www.ufa.idsl.me.
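
The idea of matching theoretical and observed isotopic profiles can be illustrated with a minimal Python sketch. It is not the IDSL.UFA algorithm (which is an R package and handles full molecular formulas); it considers carbon isotopes only, and the function names, the 13C abundance value of about 1.07%, and the squared-error ranking are assumptions made for illustration.

```python
from math import comb

P13C = 0.0107  # approximate natural abundance of 13C (assumed value)

def carbon_isotope_profile(n_carbons, n_peaks=3):
    """Relative intensities of M, M+1, ... from carbon isotopes only,
    normalised so that the monoisotopic peak equals 1."""
    probs = [
        comb(n_carbons, k) * P13C**k * (1 - P13C) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    return [p / probs[0] for p in probs]

def rank_carbon_counts(observed, candidates):
    """Order candidate carbon counts by squared error between the
    theoretical and the observed isotopic profile (hypothetical scoring)."""
    def error(n):
        theory = carbon_isotope_profile(n, len(observed))
        return sum((o - t) ** 2 for o, t in zip(observed, theory))
    return sorted(candidates, key=error)

# Illustrative observed profile (M, M+1, M+2), not taken from the study.
observed_profile = [1.0, 0.065, 0.002]
ranking = rank_carbon_counts(observed_profile, [3, 6, 12])
print(ranking)
```

A real formula annotation would also need the isotopes of the other elements and the accurate mass, which this sketch deliberately leaves out.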


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Alcohols , Chromatography, Liquid/methods , Humans , Metabolome , Metabolomics/methods , Tandem Mass Spectrometry/methods
5.
J Proteome Res ; 21(6): 1485-1494, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35579321

ABSTRACT

Generating comprehensive and high-fidelity metabolomics data matrices from LC/HRMS data remains extremely challenging for large population-scale studies (n > 200). Here, we present a new data processing pipeline, the Intrinsic Peak Analysis (IDSL.IPA) R package (https://ipa.idsl.me), to generate such data matrices specifically for organic compounds. The IDSL.IPA pipeline incorporates (1) identifying potential 12C and 13C ion pairs in individual mass spectra; (2) detecting and characterizing chromatographic peaks using a new sensitive and versatile approach to perform mass correction, peak smoothing, baseline development for local noise measurement, and peak quality determination; (3) correcting retention time and cross-referencing peaks from multiple samples by a dynamic retention index marker approach; (4) annotating peaks using a reference database of m/z and retention time; and (5) accelerating data processing using a parallel computation of the peak detection and alignment steps for larger studies. This pipeline has been successfully evaluated for studies ranging from 200 to 1600 samples. By specifically isolating high quality and reliable signals pertaining to carbon-containing compounds in untargeted LC/HRMS data sets from larger studies, IDSL.IPA opens new opportunities for discovering new biological insights in the population-scale metabolomics and exposomics projects. The package is available in the R CRAN repository at https://cran.r-project.org/package=IDSL.IPA.
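
Step (1) of the pipeline, pairing 12C and 13C ions within one mass spectrum, can be sketched in Python as follows. This is not the IDSL.IPA implementation (an R package); the function name, the 0.005 Da tolerance, the intensity-ratio cap, and the assumption of singly charged ions are hypothetical choices for illustration. Only the 13C minus 12C mass difference of about 1.003355 Da is a physical constant.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da

def find_isotope_pairs(peaks, mass_tol=0.005, max_ratio=1.0):
    """Return (m/z of 12C ion, m/z of 13C ion) pairs from one spectrum.

    peaks: list of (mz, intensity) tuples for singly charged ions.
    mass_tol and max_ratio are illustrative defaults, not IDSL.IPA settings.
    """
    peaks = sorted(peaks)
    pairs = []
    for i, (mz_a, int_a) in enumerate(peaks):
        for mz_b, int_b in peaks[i + 1:]:
            gap = mz_b - mz_a
            if gap > C13_C12_DELTA + mass_tol:
                break  # peaks are sorted, so later peaks are even farther away
            # Accept the pair when the spacing matches one 13C substitution
            # and the heavier ion is not more intense than the lighter one.
            if abs(gap - C13_C12_DELTA) <= mass_tol and int_b <= max_ratio * int_a:
                pairs.append((mz_a, mz_b))
    return pairs

# Illustrative spectrum: one true pair plus two unrelated peaks.
spectrum = [(180.0634, 1000.0), (181.0668, 66.0), (200.1000, 500.0), (200.6000, 40.0)]
isotope_pairs = find_isotope_pairs(spectrum)
print(isotope_pairs)
```

Requiring a 13C partner is what restricts the output to carbon-containing compounds; the intensity cap is a simplification that holds for small molecules but would need revisiting for very carbon-rich ions.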


Subject(s)
Metabolomics , Software , Chromatography, Liquid/methods , Mass Spectrometry , Metabolomics/methods , Organic Chemicals
6.
Environ Int ; 164: 107240, 2022 06.
Article in English | MEDLINE | ID: mdl-35461097

ABSTRACT

Inter-chemical correlations in metabolomics and exposomics datasets provide valuable information for studying relationships among chemicals reported for human specimens. As the number of compounds in these datasets grows, network graph analysis and visualization of the correlation structure become difficult to interpret. We have developed the Chemical Correlation Database (CCDB) as a systematic catalogue of inter-chemical correlations in publicly available metabolomics and exposomics studies. The database has been provided via an online interface to create single compound-centric views. We have demonstrated various applications of the database to explore: 1) the chemicals from a chemical class such as Per- and Polyfluoroalkyl Substances (PFAS), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), phthalates and tobacco smoke related metabolites; 2) xenobiotic metabolites such as caffeine and acetaminophen; 3) endogenous metabolites (acyl-carnitines); and 4) unannotated peaks for PFAS. The database has a rich collection of 35 human studies, including the National Health and Nutrition Examination Survey (NHANES) and high-quality untargeted metabolomics datasets. CCDB is supported by a simple, interactive and user-friendly web interface to retrieve and visualize the inter-chemical correlation data. The CCDB has the potential to be a key computational resource in metabolomics and exposomics, facilitating the expansion of our understanding about biological and chemical relationships among metabolites and chemical exposures in the human body. The database is available at www.ccdb.idsl.me.


Subject(s)
Fluorocarbons , Polychlorinated Biphenyls , Data Management , Humans , Metabolomics , Nutrition Surveys
7.
Commun Biol ; 5(1): 334, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35393526

ABSTRACT

Identifying the genetic determinants of inter-individual variation in lipid species (lipidome) may provide deeper understanding and additional insight into the mechanistic effect of complex lipidomic pathways in CVD risk and progression beyond simple traditional lipids. Previous studies have been largely population based and thus only powered to discover associations with common genetic variants. Founder populations represent a powerful resource to accelerate discovery of previously unknown biology associated with rare population alleles that have risen to higher frequency due to genetic drift. We performed a genome-wide association scan of 355 lipid species in 650 individuals from the Amish founder population, including 127 lipid species not previously tested. To the best of our knowledge, we report for the first time the lipid species associated with two rare-population but Amish-enriched lipid variants: APOB_rs5742904 and APOC3_rs76353203. We also identified novel associations for 3 rare-population Amish-enriched loci with several sphingolipids, with a proposed potential functional/causal variant in each locus: GLTPD2_rs536055318, CERS5_rs771033566, and AKNA_rs531892793. We replicated 7 previously known common loci, including novel associations with two sterols: androstenediol with the UGT locus and estriol with the SLC22A8/A24 locus. Our results show the double power of founder populations and detailed lipidome to discover novel trait-associated variants.


Subject(s)
Amish , Founder Effect , Genetics, Population , Lipidomics , Amish/genetics , DNA-Binding Proteins/genetics , Genome-Wide Association Study , Humans , Lipids , Nuclear Proteins/genetics , Transcription Factors/genetics
8.
Front Public Health ; 9: 653599, 2021.
Article in English | MEDLINE | ID: mdl-34178917

ABSTRACT

Background: An untargeted chemical analysis of bio-fluids provides semi-quantitative data for thousands of chemicals for expanding our understanding about relationships among metabolic pathways, diseases, phenotypes and exposures. During the processing of mass spectral and chromatography data, various signal thresholds are used to control the number of peaks in the final data matrix that is used for statistical analyses. However, commonly used stringent thresholds generate constrained data matrices which may under-represent the detected chemical space, leading to missed biological insights in the exposome research. Methods: We have re-analyzed a liquid chromatography high resolution mass spectrometry data set for a publicly available epidemiology study (n = 499) of human cord blood samples using the MS-DIAL software with the minimum possible thresholds during the data processing steps. Peak lists for individual files and the data matrix after alignment and gap-filling steps were summarized for different peak height and detection frequency thresholds. Correlations between birth weight and LC/MS peaks in the newly generated data matrix were computed using the Spearman correlation coefficient. Results: MS-DIAL software detected on average 23,156 peaks per individual LC/MS file and 63,393 peaks in the aligned peak table. A combination of peak height and detection frequency thresholds that was used in the original publication at the individual file and the peak alignment levels can reject 90% of peaks from the untargeted chemical analysis dataset that was generated by MS-DIAL. Correlation analysis for birth weight data suggested that up to 80% of the significantly associated peaks were rejected by the data processing thresholds that were used in the original publication. The re-analysis with minimum possible thresholds recovered metabolic insights about C19 steroids and hydroxy-acyl-carnitines and their relationships with birth weight. Conclusions: Data processing thresholds for peak height and detection frequencies at the individual data file and alignment levels should be set to the minimum possible level or avoided completely when mining untargeted chemical analysis data in exposome research to discover new biomarkers and mechanisms.
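
How a detection frequency threshold can discard a peak that correlates with the outcome can be shown with a small, self-contained Python sketch. The sample values and the 100-count height cutoff are invented for illustration and are not data from the study; the Spearman coefficient is computed as the Pearson correlation of average ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def detection_frequency(heights, min_height):
    """Fraction of samples in which the peak reaches the height threshold."""
    return sum(h >= min_height for h in heights) / len(heights)

# Invented example: five samples, one peak missing (height 0) in two of them.
birth_weight = [2900, 3100, 3300, 3500, 3700]
peak_height = [0, 120, 0, 300, 450]

rho = spearman(birth_weight, peak_height)
freq = detection_frequency(peak_height, min_height=100)
print(round(rho, 3), freq)
```

In this toy case the peak is detected in only 60% of samples, so a stricter detection frequency cutoff (80%, say) would drop it before the correlation with birth weight is ever tested.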


Subject(s)
Metabolomics , Software , Chromatography, Liquid , Gas Chromatography-Mass Spectrometry , Humans , Mass Spectrometry
9.
Environ Int ; 156: 106624, 2021 11.
Article in English | MEDLINE | ID: mdl-33984576

ABSTRACT

BACKGROUND: Systematic evaluation of literature data on the cancer hazards of human exposures is an essential process underlying cancer prevention strategies. The scope and volume of evidence for suspected carcinogens can range from very few to thousands of publications, requiring a complex, systematically planned, and critical procedure to nominate, prioritize and evaluate carcinogenic agents. To aid in this process, database fusion, cheminformatics and text mining techniques can be combined into an integrated approach to inform agent prioritization, selection, and grouping. RESULTS: We have applied these techniques to agents recommended for the IARC Monographs evaluations during 2020-2024. An integration of PubMed filters to cover cancer epidemiology, key characteristics of carcinogens, chemical lists from 34 databases relevant for cancer research, chemical structure grouping and a literature data-based clustering was applied in an innovative approach to 119 agents recommended by an advisory group for future IARC Monographs evaluations. The approach also facilitated a rational grouping of these agents and aided in understanding the volume and complexity of relevant information, as well as important gaps in coverage of the available studies on cancer etiology and carcinogenesis. CONCLUSION: A new data-science approach has been applied to diverse agents recommended for cancer hazard assessments, and its applications for the IARC Monographs are demonstrated. The prioritization approach has been made available at www.cancer.idsl.me for ranking cancer agents.


Subject(s)
Neoplasms , Carcinogenesis , Carcinogens/toxicity , Data Mining , Databases, Factual , Humans , Neoplasms/epidemiology
10.
Lipids Health Dis ; 19(1): 153, 2020 Jun 25.
Article in English | MEDLINE | ID: mdl-32586392

ABSTRACT

BACKGROUND: The lipoprotein insulin resistance (LPIR) score was shown to predict insulin resistance (IR) and type 2 diabetes (T2D) in healthy adults. However, the molecular basis underlying the LPIR utility for classification remains unclear. OBJECTIVE: To identify small molecule lipids associated with variation in the LPIR score, a weighted index of lipoproteins measured by nuclear magnetic resonance, in the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study (n = 980). METHODS: Linear mixed effects models were used to test the association between the LPIR score and 413 lipid species and their principal component analysis-derived groups. Significant associations were tested for replication with homeostatic model assessment-IR (HOMA-IR), a phenotype correlated with the LPIR score (r = 0.48, p < 0.001), in the Heredity and Phenotype Intervention (HAPI) Heart Study (n = 590). RESULTS: In GOLDN, 319 lipids were associated with the LPIR score (false discovery rate-adjusted p-values ranging from 4.59 × 10^-161 to 49.50 × 10^-3). Factors 1 (triglycerides and diglycerides/storage lipids) and 3 (mixed lipids) were positively (β = 0.025, p = 4.52 × 10^-71 and β = 0.021, p = 5.84 × 10^-41, respectively) and factor 2 (phospholipids/non-storage lipids) was inversely (β = -0.013, p = 2.28 × 10^-18) associated with the LPIR score. These findings were replicated for HOMA-IR in the HAPI Heart Study (β = 0.10, p = 1.21 × 10^-02 for storage, β = -0.13, p = 3.14 × 10^-04 for non-storage, and β = 0.19, p = 8.40 × 10^-07 for mixed lipids). CONCLUSIONS: Non-storage lipidomics species show a significant inverse association with the LPIR metabolic dysfunction score and present a promising focus for future therapeutic and prevention studies.


Subject(s)
Insulin Resistance/physiology , Lipids/blood , Adult , Aged , Body Mass Index , Diabetes Mellitus, Type 2/blood , Female , Humans , Lipidomics , Lipoproteins/blood , Male , Middle Aged , Triglycerides/blood , Waist Circumference
11.
Anal Chem ; 92(11): 7515-7522, 2020 06 02.
Article in English | MEDLINE | ID: mdl-32390414

ABSTRACT

Unidentified peaks remain a major problem in untargeted metabolomics by LC-MS/MS. Confidence in peak annotations increases by combining MS/MS matching and retention time. We here show how retention times can be predicted from molecular structures. Two large, publicly available data sets were used for model training in machine learning: the Fiehn hydrophilic interaction liquid chromatography data set (HILIC) of 981 primary metabolites and biogenic amines, and the RIKEN plant specialized metabolome annotation (PlaSMA) database of 852 secondary metabolites that uses reversed-phase liquid chromatography (RPLC). Five different machine learning algorithms have been integrated into the Retip R package: the random forest, Bayesian-regularized neural network, XGBoost, light gradient-boosting machine (LightGBM), and Keras algorithms for building the retention time prediction models. A complete workflow for retention time prediction was developed in R. It can be freely downloaded from the GitHub repository (https://www.retip.app). Keras outperformed other machine learning algorithms in the test set with minimum overfitting, verified by small error differences between training, test, and validation sets. Keras yielded a mean absolute error of 0.78 min for HILIC and 0.57 min for RPLC. Retip is integrated into the mass spectrometry software tools MS-DIAL and MS-FINDER, allowing a complete compound annotation workflow. In a test application on mouse blood plasma samples, we found a 68% reduction in the number of candidate structures when searching all isomers in MS-FINDER compound identification software. Retention time prediction increases the identification rate in liquid chromatography and subsequently leads to an improved biological interpretation of metabolomics data.
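
Two quantities in this abstract, the mean absolute error of a retention time model and the reduction in candidate structures, can be made concrete with a short Python sketch. It does not use or reproduce the Retip API (Retip is an R package); the function names, the 0.5 min window, and all numbers are hypothetical and serve only to illustrate the idea.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted retention times."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def filter_candidates(predicted_rt, observed_rt, window):
    """Keep candidate structures whose predicted retention time lies
    within +/- window minutes of the observed retention time."""
    return [
        name
        for name, rt in predicted_rt.items()
        if abs(rt - observed_rt) <= window
    ]

# Invented retention times in minutes.
observed = [1.0, 2.5, 4.0]
predicted = [1.5, 2.0, 4.5]
mae = mean_absolute_error(observed, predicted)

# Three hypothetical isomers with model-predicted retention times.
candidates = {"isomer_A": 3.1, "isomer_B": 5.9, "isomer_C": 3.6}
kept = filter_candidates(candidates, observed_rt=3.4, window=0.5)
print(mae, kept)
```

Comparing the same error metric across training, test, and validation sets is one simple way to check for overfitting: a model that memorised its training data would show a much smaller training error than test error.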


Subject(s)
Machine Learning , Metabolomics , Organic Chemicals/blood , Chromatography, Liquid , Humans , Tandem Mass Spectrometry , Time Factors
12.
Neurology ; 94(20): e2088-e2098, 2020 05 19.
Article in English | MEDLINE | ID: mdl-32358220

ABSTRACT

OBJECTIVE: To investigate the association of triglyceride (TG) principal component scores with Alzheimer disease (AD) and the amyloid, tau, neurodegeneration, and cerebrovascular disease (A/T/N/V) biomarkers for AD. METHODS: Serum levels of 84 TG species were measured with untargeted lipid profiling of 689 participants from the Alzheimer's Disease Neuroimaging Initiative cohort, including 190 cognitively normal older adults (CN), 339 with mild cognitive impairment (MCI), and 160 with AD. Principal component analysis with factor rotation was used for dimension reduction of TG species. Differences in principal components between diagnostic groups and associations between principal components and AD biomarkers (including CSF, MRI and [18F]fluorodeoxyglucose-PET) were assessed with a generalized linear model approach. In both cases, the Bonferroni method of adjustment was used to correct for multiple comparisons. RESULTS: The 84 TGs yielded 9 principal components, 2 of which, consisting of long-chain, polyunsaturated fatty acid-containing TGs (PUTGs), were significantly associated with MCI and AD. Lower levels of PUTGs were observed in MCI and AD compared to CN. PUTG principal component scores were also significantly associated with hippocampal volume and entorhinal cortical thickness. In participants carrying the APOE ε4 allele, these principal components were significantly associated with CSF β-amyloid 1-42 values and entorhinal cortical thickness. CONCLUSION: This study shows that PUTG component scores were significantly associated with diagnostic group and AD biomarkers, a finding that was more pronounced in APOE ε4 carriers. Replication in independent larger studies and longitudinal follow-up are warranted.
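
The Bonferroni adjustment named in the methods multiplies each p-value by the number of tests and caps the result at 1. A minimal Python sketch follows; the nine p-values are invented for illustration (one per principal component) and are not results from the study, and the 0.05 significance level is an assumed default.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, is_significant) for each raw p-value.

    Each p-value is multiplied by the number of tests and capped at 1.0.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]

# Invented raw p-values for nine hypothetical component-level tests.
raw_p = [0.001, 0.004, 0.02, 0.03, 0.2, 0.4, 0.5, 0.7, 0.9]
results = bonferroni(raw_p)
significant = [flag for _, flag in results]
print(significant)
```

Note that raw p-values of 0.02 and 0.03 would pass an uncorrected 0.05 cutoff but fail after adjustment for nine tests, which is the conservative behaviour the correction is meant to provide.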


Subject(s)
Biomarkers/cerebrospinal fluid , Cognitive Dysfunction/diagnosis , Neuroimaging , Triglycerides/blood , Aged , Aged, 80 and over , Alzheimer Disease/diagnosis , Amyloid beta-Peptides/metabolism , Biomarkers/blood , Cognition/physiology , Cognitive Dysfunction/blood , Cognitive Dysfunction/cerebrospinal fluid , Disease Progression , Female , Hippocampus/metabolism , Humans , Magnetic Resonance Imaging/methods , Male , Neuropsychological Tests , Triglycerides/cerebrospinal fluid
13.
Alzheimers Dement (Amst) ; 11: 619-627, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31517024

ABSTRACT

INTRODUCTION: Comorbidity with metabolic diseases indicates that lipid metabolism plays a role in the etiology of Alzheimer's disease (AD). Comprehensive lipidomic analysis can provide new insights into the altered lipid metabolism in AD. METHOD: In this study, a total of 349 serum lipids were measured in 806 participants enrolled in the Alzheimer's Disease Neuroimaging Initiative Phase 1 cohort and analyzed using lipid-set enrichment statistics, a data mining method to find coregulated lipid sets. RESULTS: We found that sets of blood lipids were associated with current AD biomarkers and with AD clinical symptoms. AD diagnosis was associated with 7 of 28 lipid sets, of which four also correlated with cognitive decline, including polyunsaturated fatty acids. Cerebrospinal fluid amyloid beta (Aβ1-42) correlated with glucosylceramides, lysophosphatidylcholines and unsaturated triacylglycerides; cerebrospinal fluid total tau and brain atrophy correlated with monounsaturated sphingomyelins and ceramides, in addition to EPA-containing lipids. DISCUSSION: AD-associated lipid sets indicated that lipid desaturation, elongation, and acyl chain remodeling processes are disturbed in AD subjects. Monounsaturated lipid metabolism was important in early stages of AD, whereas the polyunsaturated lipid metabolism was associated with later stages of AD. Our study provides several new hypotheses for studying the role of lipid metabolism in AD.

14.
Environ Health Perspect ; 127(9): 97008, 2019 09.
Article in English | MEDLINE | ID: mdl-31557052

ABSTRACT

BACKGROUND: Blood chemicals are routinely measured in clinical or preclinical research studies to diagnose diseases, assess risks in epidemiological research, or use metabolomic phenotyping in response to treatments. A vast volume of blood-related literature is available via the PubMed database for data mining. OBJECTIVES: We aimed to generate a comprehensive blood exposome database of endogenous and exogenous chemicals associated with the mammalian circulating system through text mining and database fusion. METHODS: Using NCBI resources, we retrieved PubMed abstracts, PubChem chemical synonyms, and PMC supplementary tables. We then employed text mining and PubChem crowdsourcing to associate phrases relating to blood with PubChem chemicals. False positives were removed by a phrase pattern and a compound exclusion list. RESULTS: A query to identify blood-related publications in the PubMed database yielded 1.1 million papers. Matching a total of 15 million synonyms from 6.5 million relevant PubChem chemicals against all blood-related publications yielded 37,514 chemicals and 851,999 publication records. Mapping PubChem compound identifiers to the PubMed database yielded 49,940 unique chemicals linked to 676,643 papers. Analysis of open-access metabolomics papers related to blood phrases in the PMC database yielded 4,039 unique compounds and 204 papers. Consolidating these three approaches summed up to a total of 41,474 achiral structures that were linked to 65,957 PubChem CIDs and to over 878,966 PubMed articles. We mapped these compounds to 50 databases such as those covering metabolites and pathways, governmental and toxicological databases, pharmacology resources, and bioassay repositories. In comparison, HMDB, the Human Metabolome Database, links 1,075 compounds to blood-related primary publications. CONCLUSION: This new Blood Exposome Database can be used for prioritizing chemicals for systematic reviews, developing target assays in exposome research, identifying compounds in untargeted mass spectrometry, and biological interpretation in metabolomics data. The database is available at http://bloodexposome.org. https://doi.org/10.1289/EHP4713.


Subject(s)
Data Mining , Databases, Factual , Exposome , Biological Assay , Data Management , Humans , Metabolome , Metabolomics , PubMed , Publications
15.
Oncotarget ; 10(39): 3894-3909, 2019 Jun 11.
Article in English | MEDLINE | ID: mdl-31231467

ABSTRACT

Estrogen-receptor negative (ERneg) breast cancer is an aggressive breast cancer subtype in need of new therapeutic options. We have analyzed metabolomics, proteomics and transcriptomics data for a cohort of 276 breast tumors (MetaCancer study) and nine public transcriptomics datasets using univariate statistics, meta-analysis, Reactome pathway analysis, biochemical network mapping and text mining of metabolic genes. In the MetaCancer cohort, a total of 29% of metabolites, 21% of proteins and 33% of transcripts were significantly different (raw p < 0.05) between ERneg and ERpos breast tumors. In the nine public transcriptomics datasets, on average 23% of all genes were significantly different (raw p < 0.05). Specifically, up to 60% of the metabolic genes were significantly different (meta-analysis raw p < 0.05) across the transcriptomics datasets. Reactome pathway analysis of all omics showed that energy metabolism, and biosynthesis of nucleotides, amino acids, and lipids were associated with ERneg status. Text mining revealed that several significant metabolic genes and enzymes have been rarely reported to date, including PFKP, GART, PLOD1, ASS1, NUDT12, FAR1, PDE7A, FAHD1, ITPK1, SORD, HACD3, CDS2 and PDSS1. Metabolic processes associated with ERneg tumors were identified by multi-omics integration analysis of metabolomics, proteomics and transcriptomics data. Overall results suggested that TCA anaplerosis, proline biosynthesis, synthesis of complex lipids and mechanisms for recycling substrates were activated in ERneg tumors. Under-reported genes were revealed by text mining which may serve as novel candidates for drug targets in cancer therapies. The workflow presented here can also be used for other tumor types.

16.
Life Sci ; 221: 212-223, 2019 Mar 15.
Article in English | MEDLINE | ID: mdl-30731143

ABSTRACT

AIMS: To determine the metabolic adaptations to compensated heart failure using a reproducible model of myocardial infarction and an unbiased metabolic screen. To address the limitations in sample availability and model variability observed in preclinical and clinical metabolic investigations of heart failure. MAIN METHODS: Metabolomic analysis was performed on serum and myocardial tissue from rabbits after myocardial infarction (MI) was induced by cryo-injury of the left ventricular free wall. Rabbits followed for 12 weeks after MI exhibited left ventricular dilation and depressed systolic function as determined by echocardiography. Serum and tissue from the viable left ventricular free wall, interventricular septum and right ventricle were analyzed using a gas chromatography time of flight mass spectrometry-based untargeted metabolomics assay for primary metabolites. KEY FINDINGS: Unique results included a two- to three-fold increase in taurine levels in all three ventricular regions of MI rabbits; similarly, the three regions had increased inosine levels compared to sham controls. Reduced levels of myo-inositol in the myocardium of MI animals point to altered phospholipid metabolism and membrane receptor function in heart failure. Metabolite profiles also provide evidence for responses to oxidative stress and an impairment in TCA cycle energy production in the failing heart. SIGNIFICANCE: Our results revealed metabolic changes during compensated cardiac dysfunction and suggest potential targets for altering the progression of heart failure.


Subject(s)
Heart Failure/metabolism , Myocardial Infarction/metabolism , Myocardium/metabolism , Animals , Echocardiography , Female , Heart Ventricles/metabolism , Inosine/analysis , Inosine/blood , Inositol/analysis , Male , Metabolomics/methods , Myocardium/cytology , Oxidative Stress/physiology , Rabbits , Systole/physiology , Taurine/analysis , Taurine/blood , Ventricular Function, Left/physiology , Ventricular Remodeling/physiology
17.
Sci Data ; 5: 180263, 2018 11 20.
Article in English | MEDLINE | ID: mdl-30457571

ABSTRACT

Alzheimer's disease (AD) is a major public health priority with a large socioeconomic burden and complex etiology. The Alzheimer Disease Metabolomics Consortium (ADMC) and the Alzheimer Disease Neuroimaging Initiative (ADNI) aim to gain new biological insights in the disease etiology. We report here an untargeted lipidomics analysis of serum specimens of 806 subjects within the ADNI1 cohort (188 AD, 392 mild cognitive impairment and 226 cognitively normal subjects) along with 83 quality control samples. Lipids were detected and measured using an ultra-high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (UHPLC-QTOF MS) instrument operated in both negative and positive electrospray ionization modes. The dataset includes a total of 513 unique lipid species, of which 341 are known lipids. For over 95% of the detected lipids, a relative standard deviation of better than 20% was achieved in the quality control samples, indicating high technical reproducibility. Association modeling of this dataset and available clinical, metabolomics and drug-use data will provide novel insights into the AD etiology. These datasets are available at the ADNI repository at http://adni.loni.usc.edu/.
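
The reproducibility criterion in this abstract, a relative standard deviation (RSD) below 20% across quality control samples, can be computed as in the following Python sketch. The lipid names and intensities are invented, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the abstract does not state which estimator was used.

```python
from statistics import mean, stdev

def relative_standard_deviation(values):
    """RSD in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)

def fraction_reproducible(qc_table, max_rsd=20.0):
    """Fraction of lipids whose RSD across QC injections is below max_rsd."""
    passing = sum(
        relative_standard_deviation(intensities) < max_rsd
        for intensities in qc_table.values()
    )
    return passing / len(qc_table)

# Invented QC intensities for two hypothetical lipids across four QC injections.
qc_intensities = {
    "lipid_a": [100.0, 105.0, 95.0, 102.0],  # tight spread
    "lipid_b": [50.0, 90.0, 20.0, 70.0],     # wide spread
}

rsd_a = relative_standard_deviation(qc_intensities["lipid_a"])
rsd_b = relative_standard_deviation(qc_intensities["lipid_b"])
share = fraction_reproducible(qc_intensities)
print(round(rsd_a, 1), round(rsd_b, 1), share)
```

Because QC samples are repeated injections of the same pooled material, any spread in their intensities reflects technical rather than biological variation, which is why a low RSD is read as high technical reproducibility.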


Subject(s)
Alzheimer Disease , Lipids/analysis , Lipids/blood , Metabolomics , Aged , Aged, 80 and over , Alzheimer Disease/blood , Alzheimer Disease/diagnosis , Alzheimer Disease/etiology , Alzheimer Disease/physiopathology , Cognitive Dysfunction , Cohort Studies , Humans , Mass Spectrometry , Metabolomics/methods , Metabolomics/standards , Neuroimaging
18.
Curr Opin Biotechnol ; 54: 1-9, 2018 12.
Article in English | MEDLINE | ID: mdl-29413745

ABSTRACT

Access to high-quality metabolomics data has become a routine component of biological studies. However, interpreting those datasets in biological contexts remains a challenge, especially because many identified metabolites are not found in biochemical pathway databases. Starting from statistical analyses, a range of new tools are available, including metabolite set enrichment analysis, pathway and network visualization, pathway prediction, biochemical databases and text mining. Integrating these approaches into comprehensive and unbiased interpretations requires careful consideration of both the caveats of the metabolomics dataset itself and the structure and properties of the biological study design. Special considerations need to be taken when adopting approaches from genomics for use in metabolomics. The R and Python programming languages are enabling an easier exchange of diverse tools to deploy integrated workflows. This review summarizes the key ideas and latest developments regarding these approaches.


Subject(s)
Computational Biology/methods , Databases, Factual , Metabolomics/methods , Animals , Data Mining , Humans , Metabolome
19.
Mass Spectrom Rev ; 37(4): 513-532, 2018 07.
Article in English | MEDLINE | ID: mdl-28436590

ABSTRACT

Tandem mass spectral library search (MS/MS) is the fastest way to correctly annotate MS/MS spectra from screening small molecules in fields such as environmental analysis, drug screening, lipid analysis, and metabolomics. The confidence in MS/MS-based annotation of chemical structures is impacted by instrumental settings and requirements, data acquisition modes including data-dependent and data-independent methods, library scoring algorithms, as well as post-curation steps. We critically discuss parameters that influence search results, such as mass accuracy, precursor ion isolation width, intensity thresholds, centroiding algorithms, and acquisition speed. A range of publicly and commercially available MS/MS databases such as NIST, MassBank, MoNA, LipidBlast, Wiley MSforID, and METLIN are surveyed. In addition, software tools including NIST MS Search, MS-DIAL, Mass Frontier, SmileMS, Mass++, and XCMS2 to perform fast MS/MS search are discussed. MS/MS scoring algorithms and challenges during compound annotation are reviewed. Advanced methods such as the in silico generation of tandem mass spectra using quantum chemistry and machine learning methods are covered. Community efforts for curation and sharing of tandem mass spectra that will allow for faster distribution of scientific discoveries are discussed.
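
As an illustration of how the search parameters named above (mass accuracy, intensity thresholds, scoring algorithm) interact, the following is a minimal sketch of a cosine-style match score between two centroided spectra. It is a generic textbook-style score, not the algorithm of any specific tool or library listed in the abstract; the function name, the default tolerance and threshold, and the example peaks are all assumptions.

```python
import math


def cosine_score(query, reference, mz_tolerance=0.01, min_rel_intensity=0.01):
    """Cosine similarity between two spectra given as lists of (mz, intensity).

    Peaks below min_rel_intensity * base peak are dropped, and each peak is
    matched at most once to a peak within mz_tolerance (greedy, largest
    intensity product first).
    """
    def prepare(peaks):
        if not peaks:
            return []
        base = max(intensity for _, intensity in peaks)
        return [(mz, i) for mz, i in peaks if i >= min_rel_intensity * base]

    q, r = prepare(query), prepare(reference)

    # All query/reference peak pairs whose m/z values agree within tolerance.
    candidates = [
        (qi * ri, a, b)
        for a, (qmz, qi) in enumerate(q)
        for b, (rmz, ri) in enumerate(r)
        if abs(qmz - rmz) <= mz_tolerance
    ]
    candidates.sort(reverse=True)

    used_q, used_r = set(), set()
    dot = 0.0
    for product, a, b in candidates:
        if a in used_q or b in used_r:
            continue  # each peak contributes to at most one match
        used_q.add(a)
        used_r.add(b)
        dot += product

    norm = math.sqrt(sum(i * i for _, i in q)) * math.sqrt(sum(i * i for _, i in r))
    return dot / norm if norm else 0.0


# Hypothetical spectra: the second has the same peaks with a small m/z shift.
spectrum = [(100.000, 10.0), (150.050, 100.0), (200.100, 50.0)]
shifted = [(100.004, 10.0), (150.055, 100.0), (200.096, 50.0)]
unrelated = [(300.000, 100.0)]

score_same = cosine_score(spectrum, shifted)
score_diff = cosine_score(spectrum, unrelated)
```

Tightening `mz_tolerance` below the shift in the example would leave the peaks unmatched and drop the score to zero, which is one way mass accuracy affects search results.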


Subject(s)
Machine Learning , Small Molecule Libraries/isolation & purification , Software , Tandem Mass Spectrometry/statistics & numerical data , Computer Simulation , Databases, Chemical , Humans , Information Dissemination , Models, Chemical , Quantum Theory , Tandem Mass Spectrometry/instrumentation , Tandem Mass Spectrometry/methods
20.
Sci Rep ; 7(1): 14567, 2017 11 06.
Article in English | MEDLINE | ID: mdl-29109515

ABSTRACT

Metabolomics answers a fundamental question in biology: How does metabolism respond to genetic, environmental or phenotypic perturbations? Combining several metabolomics assays can yield datasets for more than 800 structurally identified metabolites. However, biological interpretations of metabolic regulation in these datasets are hindered by inherent limits of pathway enrichment statistics. We have developed ChemRICH, a statistical enrichment approach that is based on chemical similarity rather than sparse biochemical knowledge annotations. ChemRICH utilizes structure similarity and chemical ontologies to map all known metabolites and name metabolic modules. Unlike pathway mapping, this strategy yields study-specific, non-overlapping sets of all identified metabolites. The subsequent enrichment statistics are superior to pathway enrichments because ChemRICH sets have a self-contained size, so p-values do not rely on the size of a background database. We demonstrate ChemRICH's efficiency on a public metabolomics data set discerning the development of type 1 diabetes in a non-obese diabetic mouse model. ChemRICH is available at www.chemrich.fiehnlab.ucdavis.edu.
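
The abstract does not specify how structure similarity is measured. As a hedged illustration only, the sketch below uses the Tanimoto coefficient on fingerprint bit sets, a common choice in cheminformatics; the fingerprints and the function name are hypothetical and do not come from ChemRICH.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints, given as the indices of bits that are set.
compound_x = {1, 2, 3, 4}
compound_y = {3, 4, 5, 6}

similarity = tanimoto(compound_x, compound_y)
print(round(similarity, 3))
```

Pairwise similarities of this kind could then feed a clustering step that groups metabolites into non-overlapping sets, which is the property the abstract emphasizes.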


Subject(s)
Data Interpretation, Statistical , Metabolic Networks and Pathways , Metabolomics , Animals , Biological Ontologies , Datasets as Topic , Diabetes Mellitus, Experimental/etiology , Diabetes Mellitus, Experimental/metabolism , Disease Models, Animal , Humans , Metabolomics/methods , Mice