Results 1 - 20 of 80
1.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383048

ABSTRACT

MOTIVATION: Random forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction. However, RFs are often not appropriate for correlated datasets due to their mode of selecting individual features for splitting. Addressing correlation relationships in high-dimensional datasets is imperative for reducing the number of variables that are assigned high importance, hence making the dimension reduction most efficient. Here, we propose the LAtent VAriable Stochastic Ensemble of Trees (LAVASET) method that derives latent variables based on the distance characteristics of each feature and aims to incorporate the correlation factor in the splitting step. RESULTS: Without compromising on performance in the majority of examples, LAVASET outperforms RF by accurately determining feature importance across all correlated variables and ensuring proper distribution of importance values. LAVASET yields mostly non-inferior prediction accuracies to traditional RFs when tested in simulated and real 1D datasets, as well as more complex and high-dimensional 3D datatypes. Unlike traditional RFs, LAVASET is unaffected by single 'important' noisy features (false positives), as it considers the local neighbourhood. LAVASET, therefore, highlights neighbourhoods of features, reflecting real signals that collectively impact the model's predictive ability. AVAILABILITY AND IMPLEMENTATION: LAVASET is freely available as a standalone package from https://github.com/melkasapi/LAVASET.

2.
Curr Opin Chem Biol ; 74: 102288, 2023 06.
Article in English | MEDLINE | ID: mdl-36966702

ABSTRACT

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".


Subject(s)
Computational Biology , Metabolomics , Metabolomics/methods , Mass Spectrometry/methods , Databases, Factual , Computational Biology/methods
3.
Metabolomics ; 18(12): 102, 2022 12 05.
Article in English | MEDLINE | ID: mdl-36469142

ABSTRACT

BACKGROUND: Compound identification remains a critical bottleneck in the process of exploiting Nuclear Magnetic Resonance (NMR) metabolomics data, especially for 1H 1-dimensional (1H 1D) data. As databases of reference compound spectra have grown, workflows have evolved to rely heavily on their search functions to facilitate this process by generating lists of potential metabolites found in complex mixture data, facilitating annotation and identification. However, approaches for validating and communicating annotations are most often guided by expert knowledge, and therefore are highly variable despite repeated efforts to align practices and define community standards. AIM OF REVIEW: This review is aimed at broadening the application of automated annotation tools by discussing the key ideas of spectral matching and beginning to describe a set of terms to classify this information, thus advancing standards for communicating annotation confidence. Additionally, we hope that this review will facilitate the growing collaboration between chemical data scientists, software developers and the NMR metabolomics community aiding development of long-term software solutions. KEY SCIENTIFIC CONCEPTS OF REVIEW: We begin with a brief discussion of the typical untargeted NMR identification workflow. We differentiate between annotation (hypothesis generation, filtering), and identification (hypothesis testing, verification), and note the utility of different NMR data features for annotation. We then touch on three parts of annotation: (1) generation of queries, (2) matching queries to reference data, and (3) scoring and confidence estimation of potential matches for verification. In doing so, we highlight existing approaches to automated and semi-automated annotation from the perspective of the structural information they utilize, as well as how this information can be represented computationally.


Subject(s)
Metabolomics , Software , Metabolomics/methods , Magnetic Resonance Spectroscopy/methods , Magnetic Resonance Imaging , Databases, Factual
4.
BMC Bioinformatics ; 23(1): 481, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36376837

ABSTRACT

BACKGROUND: Single sample pathway analysis (ssPA) transforms molecular level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. Compared to conventional pathway analysis, ssPA overcomes the limitations by enabling multi-group comparisons, alongside facilitating numerous downstream analyses such as pathway-based machine learning. While in transcriptomics ssPA is a widely used technique, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside the evaluation of two novel methods we propose: ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data by performing a case-study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures. RESULTS: While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. A case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based patient subtype-specific correlation network. We also developed the sspa python package (freely available at https://pypi.org/project/sspa/ ), providing implementations of all the methods benchmarked in this study. CONCLUSION: This work underscores the value ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
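
The z-score approach named in the benchmark above can be illustrated with a minimal sketch. It assumes the common formulation in which each metabolite is standardised across samples and the member z-scores of a pathway are summed and divided by the square root of the pathway size. This is not the sspa package's API; the function name, the pathway dictionary and the toy data are hypothetical.

```python
import numpy as np

def zscore_pathway_scores(X, pathways):
    """Per-sample pathway scores from a samples x metabolites matrix.

    X        : 2D array, rows are samples, columns are metabolites.
    pathways : dict mapping a pathway name to a list of column indices
               (hypothetical mapping, for illustration only).
    Assumes no metabolite column is constant (non-zero standard deviation).
    """
    # Standardise every metabolite across samples (column-wise z-scores).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Combine member z-scores: sum over members, scaled by sqrt(pathway size).
    return {
        name: Z[:, idx].sum(axis=1) / np.sqrt(len(idx))
        for name, idx in pathways.items()
    }

# Toy data: 6 samples, 5 metabolites, two made-up pathways.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4]}
scores = zscore_pathway_scores(X, pathways)
```

Each entry of `scores` is a vector with one value per sample, so the result can be stacked into a samples x pathways matrix for clustering or downstream modelling.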


Subject(s)
Inflammatory Bowel Diseases , Metabolomics , Humans , Metabolomics/methods , Transcriptome , Cluster Analysis , Mass Spectrometry
6.
Int J Cancer ; 151(12): 2115-2127, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-35866293

ABSTRACT

Prostate cancer (PCa) is the most common cancer form in males in many European and American countries, but there are still open questions regarding its etiology. Untargeted metabolomics can produce an unbiased global metabolic profile, with the opportunity for uncovering new plasma metabolites prospectively associated with risk of PCa, providing insights into disease etiology. We conducted a prospective untargeted liquid chromatography-mass spectrometry (LC-MS) metabolomics analysis using prediagnostic fasting plasma samples from 752 PCa case-control pairs nested within the Northern Sweden Health and Disease Study (NSHDS). The pairs were matched by age, BMI, and sample storage time. Discriminating features were identified by a combination of orthogonal projection to latent structures-effect projections (OPLS-EP) and Wilcoxon signed-rank tests. Their prospective associations with PCa risk were investigated by conditional logistic regression. Subgroup analyses based on stratification by disease aggressiveness and baseline age were also conducted. Various free fatty acids and phospholipids were positively associated with overall risk of PCa and in various stratification subgroups. Aromatic amino acids were positively associated with overall risk of PCa. Uric acid was positively, and glucose negatively, associated with risk of PCa in the older subgroup. This is the largest untargeted LC-MS based metabolomics study to date on plasma metabolites prospectively associated with risk of developing PCa. Different subgroups of disease aggressiveness and baseline age showed different associations with metabolites. The findings suggest that shifts in plasma concentrations of metabolites in lipid, aromatic amino acid, and glucose metabolism are associated with risk of developing PCa during the following two decades.


Subject(s)
Fatty Acids, Nonesterified , Prostatic Neoplasms , Male , Humans , Case-Control Studies , Uric Acid , Sweden/epidemiology , Metabolomics/methods , Mass Spectrometry , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/epidemiology , Amino Acids, Aromatic , Glucose
7.
Am J Clin Nutr ; 116(1): 216-229, 2022 07 06.
Article in English | MEDLINE | ID: mdl-35285859

ABSTRACT

BACKGROUND: Adherence to the Dietary Approaches to Stop Hypertension (DASH) diet enhances potassium intake and reduces sodium intake and blood pressure (BP), but the underlying metabolic pathways are unclear. OBJECTIVES: Among free-living populations, we delineated metabolic signatures associated with the DASH diet adherence, 24-hour urinary sodium and potassium excretions, and the potential metabolic pathways involved. METHODS: We used 24-hour urinary metabolic profiling by proton nuclear magnetic resonance spectroscopy to characterize the metabolic signatures associated with the DASH dietary pattern score (DASH score) and 24-hour excretion of sodium and potassium among participants in the United States (n = 2164) and United Kingdom (n = 496) enrolled in the International Study of Macro- and Micronutrients and Blood Pressure (INTERMAP). Multiple linear regression and cross-tabulation analyses were used to investigate the DASH-BP relation and its modulation by sodium and potassium. Potential pathways associated with DASH adherence, sodium and potassium excretion, and BP were identified using mediation analyses and metabolic reaction networks. RESULTS: Adherence to the DASH diet was associated with urinary potassium excretion (correlation coefficient, r = 0.42; P < 0.0001). In multivariable regression analyses, a 5-point higher DASH score (range, 7 to 35) was associated with a lower systolic BP by 1.35 mmHg (95% CI, -1.95 to -0.80 mmHg; P = 1.2 × 10⁻⁵); control of the model for potassium but not sodium attenuated the DASH-BP relation. Two common metabolites (hippurate and citrate) mediated the potassium-BP and DASH-BP relationships, while 5 metabolites (succinate, alanine, S-methyl cysteine sulfoxide, 4-hydroxyhippurate, and phenylacetylglutamine) were found to be specific to the DASH-BP relation. CONCLUSIONS: Greater adherence to the DASH diet is associated with lower BP and higher potassium intake across levels of sodium intake. 
The DASH diet recommends greater intake of fruits, vegetables, and other potassium-rich foods that may replace sodium-rich processed foods and thereby influence BP through overlapping metabolic pathways. Possible DASH-specific pathways are speculated but confirmation requires further study. INTERMAP is registered as NCT00005271 at www.clinicaltrials.gov.


Subject(s)
Dietary Approaches To Stop Hypertension , Hypertension , Sodium, Dietary , Blood Pressure/physiology , Humans , Micronutrients , Potassium , Sodium
8.
Anal Chem ; 94(8): 3446-3455, 2022 03 01.
Article in English | MEDLINE | ID: mdl-35180347

ABSTRACT

Untargeted metabolomics and lipidomics LC-MS experiments produce complex datasets, usually containing tens of thousands of features from thousands of metabolites whose annotation requires additional MS/MS experiments and expert knowledge. All-ion fragmentation (AIF) LC-MS/MS acquisition provides fragmentation data at no additional experimental time cost. However, analysis of such datasets requires reconstruction of parent-fragment relationships and annotation of the resulting pseudo-MS/MS spectra. Here, we propose a novel approach for automated annotation of isotopologues, adducts, and in-source fragments from AIF LC-MS datasets by combining correlation-based parent-fragment linking with molecular fragment matching. Our workflow focuses on a subset of features rather than trying to annotate the full dataset, saving time and simplifying the process. We demonstrate the workflow in three human serum datasets containing 599 features manually annotated by experts. Precision and recall values of 82-92% and 82-85%, respectively, were obtained for features found in the highest-rank scores (1-5). These results equal or outperform those obtained using MS-DIAL software, the current state of the art for AIF data annotation. Further validation for other biological matrices and different instrument types showed variable precision (60-89%) and recall (10-88%) particularly for datasets dominated by nonlipid metabolites. The workflow is freely available as an open-source R package, MetaboAnnotatoR, together with the fragment libraries from Github (https://github.com/gggraca/MetaboAnnotatoR).
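
The idea of correlation-based parent-fragment linking can be sketched as follows. This is only an illustration under stated assumptions, not MetaboAnnotatoR's implementation: it assumes each feature is represented by an intensity profile over the same set of scans, and it links a candidate fragment to a parent when their Pearson correlation reaches a hypothetical threshold (`min_corr`). The function name and example profiles are invented for the sketch.

```python
import numpy as np

def link_fragments(parent_profile, fragment_profiles, min_corr=0.8):
    """Return (name, correlation) pairs for fragments that co-vary with the parent.

    parent_profile    : 1D sequence of parent-ion intensities.
    fragment_profiles : dict mapping a fragment name to a profile of equal length.
    min_corr          : hypothetical Pearson correlation cut-off.
    """
    linked = []
    for name, profile in fragment_profiles.items():
        # Pearson correlation between the parent and the candidate fragment.
        r = np.corrcoef(parent_profile, profile)[0, 1]
        if r >= min_corr:
            linked.append((name, float(r)))
    # Strongest links first.
    return sorted(linked, key=lambda pair: -pair[1])

parent = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "frag_a": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises and falls with the parent
    "frag_b": [5.0, 1.0, 4.0, 2.0, 3.0],   # unrelated profile
}
links = link_fragments(parent, candidates)
```

In a full workflow the linked fragments would then be compared against a fragment library; that matching step is not shown here.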


Subject(s)
Metabolomics , Tandem Mass Spectrometry , Chromatography, Liquid/methods , Humans , Metabolomics/methods , Software , Tandem Mass Spectrometry/methods , Workflow
9.
Regul Toxicol Pharmacol ; 125: 105020, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34333066

ABSTRACT

Omics methodologies are widely used in toxicological research to understand modes and mechanisms of toxicity. Increasingly, these methodologies are being applied to questions of regulatory interest such as molecular point-of-departure derivation and chemical grouping/read-across. Despite its value, widespread regulatory acceptance of omics data has not yet occurred. Barriers to the routine application of omics data in regulatory decision making have been: 1) lack of transparency for data processing methods used to convert raw data into an interpretable list of observations; and 2) lack of standardization in reporting to ensure that omics data, associated metadata and the methodologies used to generate results are available for review by stakeholders, including regulators. Thus, in 2017, the Organisation for Economic Co-operation and Development (OECD) Extended Advisory Group on Molecular Screening and Toxicogenomics (EAGMST) launched a project to develop guidance for the reporting of omics data aimed at fostering further regulatory use. Here, we report on the ongoing development of the first formal reporting framework describing the processing and analysis of both transcriptomic and metabolomic data for regulatory toxicology. We introduce the modular structure, content, harmonization and strategy for trialling this reporting framework prior to its publication by the OECD.


Subject(s)
Metabolomics/standards , Organisation for Economic Co-Operation and Development/standards , Toxicogenetics/standards , Toxicology/standards , Transcriptome/physiology , Documentation/standards , Humans
10.
Nat Protoc ; 16(9): 4299-4326, 2021 09.
Article in English | MEDLINE | ID: mdl-34321638

ABSTRACT

Metabolic phenotyping is an important tool in translational biomedical research. The advanced analytical technologies commonly used for phenotyping, including mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy, generate complex data requiring tailored statistical analysis methods. Detailed protocols have been published for data acquisition by liquid NMR, solid-state NMR, ultra-performance liquid chromatography (LC-)MS and gas chromatography (GC-)MS on biofluids or tissues and their preprocessing. Here we propose an efficient protocol (guidelines and software) for statistical analysis of metabolic data generated by these methods. Code for all steps is provided, and no prior coding skill is necessary. We offer efficient solutions for the different steps required within the complete phenotyping data analytics workflow: scaling, normalization, outlier detection, multivariate analysis to explore and model study-related effects, selection of candidate biomarkers, validation, multiple testing correction and performance evaluation of statistical models. We also provide a statistical power calculation algorithm and safeguards to ensure robust and meaningful experimental designs that deliver reliable results. We exemplify the protocol with a two-group classification study and data from an epidemiological cohort; however, the protocol can be easily modified to cover a wider range of experimental designs or incorporate different modeling approaches. This protocol describes a minimal set of analyses needed to rigorously investigate typical datasets encountered in metabolic phenotyping.
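
Two of the workflow steps listed above, scaling and multiple testing correction, can be sketched in a few lines. The abstract does not say which procedures the protocol uses, so unit-variance scaling and the Benjamini-Hochberg false discovery rate adjustment are assumptions chosen for illustration; the function names are hypothetical and this is not the protocol's own code.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each variable and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    adjusted = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(adjusted, 0.0, 1.0)
    return q

q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Variables whose adjusted value falls below the chosen false discovery rate would then be carried forward as candidate biomarkers.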


Subject(s)
Genetic Techniques , Metabolomics/methods , Phenotype , Software , Statistics as Topic , Humans , Metabolism
11.
BMC Bioinformatics ; 22(1): 67, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579202

ABSTRACT

BACKGROUND: The search for statistically significant relationships between molecular markers and outcomes is challenging when dealing with high-dimensional, noisy and collinear multivariate omics data, such as metabolomic profiles. Permutation procedures allow for the estimation of adjusted significance levels without assuming independence among metabolomic variables. Nevertheless, the complex non-normal structure of metabolic profiles and outcomes may bias the permutation results leading to overly conservative threshold estimates i.e. lower than those from a Bonferroni or Sidak correction. METHODS: Within a univariate permutation procedure we employ parametric simulation methods based on the multivariate (log-)Normal distribution to obtain adjusted significance levels which are consistent across different outcomes while effectively controlling the type I error rate. Next, we derive an alternative closed-form expression for the estimation of the number of non-redundant metabolic variates based on the spectral decomposition of their correlation matrix. The performance of the method is tested for different model parametrizations and across a wide range of correlation levels of the variates using synthetic and real data sets. RESULTS: Both the permutation-based formulation and the more practical closed form expression are found to give an effective indication of the number of independent metabolic effects exhibited by the system, while guaranteeing that the derived adjusted threshold is stable across outcome measures with diverse properties.
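
To make the idea of counting non-redundant variates from the spectral decomposition of a correlation matrix concrete, here is a minimal sketch. It does not reproduce the closed-form expression derived in the article, which is not given in the abstract. Instead it uses a different, well-known eigenvalue-based estimate (Nyholt, 2004), M_eff = 1 + (M - 1)(1 - Var(lambda)/M), followed by a Sidak-style adjusted threshold. Function names and data are hypothetical.

```python
import numpy as np

def effective_number_of_tests(X):
    """Eigenvalue-based estimate of the number of independent variates.

    X : samples x variables matrix. Uses Nyholt's (2004) formula as a
        stand-in; this is NOT the expression derived in the article.
    """
    R = np.corrcoef(X, rowvar=False)   # variable-by-variable correlations
    eigvals = np.linalg.eigvalsh(R)    # spectrum of the symmetric matrix
    m = R.shape[0]
    return 1.0 + (m - 1.0) * (1.0 - np.var(eigvals, ddof=1) / m)

def sidak_threshold(alpha, m_eff):
    """Per-test significance level that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)

rng = np.random.default_rng(1)
independent = rng.normal(size=(200, 5))                 # five unrelated variables
base = rng.normal(size=200)
collinear = np.column_stack([base, 2 * base + 1, 0.5 * base - 3])

m_eff_independent = effective_number_of_tests(independent)
m_eff_collinear = effective_number_of_tests(collinear)
threshold = sidak_threshold(0.05, m_eff_independent)
```

For perfectly collinear variables the estimate collapses towards one test, while for unrelated variables it stays close to the nominal number of variables, which is the behaviour an adjusted threshold relies on.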


Subject(s)
Metabolome , Metabolomics , Models, Biological , Genetic Markers/genetics , Metabolomics/methods , Statistical Distributions
12.
Metabolites ; 10(12), 2020 Dec 04.
Article in English | MEDLINE | ID: mdl-33291639

ABSTRACT

BACKGROUND: Overweight and obesity amongst women of reproductive age are increasingly common in developed economies and are shown to adversely affect birth outcomes and both childhood and adulthood health risks in the offspring. Metabolic profiling in conditions of overweight and obesity in pregnancy could potentially be applied to elucidate the molecular basis of the adverse effects of gestational weight gain (GWG) and postpartum weight loss (WL) on future risks for cardiovascular disease (CVD) and other chronic diseases. METHODS: Biofluid samples were collected from 114 ethnically diverse pregnant women with body mass index (BMI) between 25 and 40 kg/m² from Chicago (US), as part of a randomized lifestyle intervention trial (Maternal Offspring Metabolics: Family Intervention Trial; NCT01631747). At 15 weeks, 35 weeks of gestation, and at 1 year postpartum, the blood plasma lipidome and metabolic profile of urine samples were analyzed by liquid chromatography mass spectrometry (LC-MS) and 1H nuclear magnetic resonance spectroscopy (1H NMR) respectively. RESULTS: Urinary 4-deoxyerythronic acid and 4-deoxythreonic acid were found to be positively correlated to BMI. Seventeen plasma lipids were found to be associated with GWG and 16 lipids were found to be associated with WL, which included phosphatidylinositols (PI), phosphatidylcholines (PC), lysophospholipids (lyso-), sphingomyelins (SM) and ether phosphatidylcholine (PC-O). Three phospholipids found to be positively associated with GWG all contained palmitate side-chains, and amongst the 14 lipids that were negatively associated with GWG, seven were PC-O. Six of eight lipids found to be negatively associated with WL contained an 18:2 fatty acid side-chain. CONCLUSIONS: Maternal obesity was associated with characteristic urine and plasma metabolic phenotypes, and phospholipid profile was found to be associated with both GWG and postpartum WL in metabolically healthy pregnant women with overweight/obesity. 
Postpartum WL may be linked to the reduction in the intake of linoleic acid/conjugated linoleic acid food sources in our study population.

13.
BMC Bioinformatics ; 21(1): 11, 2020 Jan 09.
Article in English | MEDLINE | ID: mdl-31918658

ABSTRACT

BACKGROUND: Metabolomics time-course experiments provide the opportunity to understand the changes to an organism by observing the evolution of metabolic profiles in response to internal or external stimuli. Along with other omic longitudinal profiling technologies, these techniques have great potential to uncover complex relations between variations across diverse omic variables and provide unique insights into the underlying biology of the system. However, many statistical methods currently used to analyse short time-series omic data are i) prone to overfitting, ii) do not fully take into account the experimental design or iii) do not make full use of the multivariate information intrinsic to the data or iv) are unable to uncover multiple associations between different omic data. The model we propose is an attempt to i) overcome overfitting by using a weakly informative Bayesian model, ii) capture experimental design conditions through a mixed-effects model, iii) model interdependencies between variables by augmenting the mixed-effects model with a conditional auto-regressive (CAR) component and iv) identify potential associations between heterogeneous omic variables by using a horseshoe prior. RESULTS: We assess the performance of our model on synthetic and real datasets and show that it can outperform comparable models for metabolomic longitudinal data analysis. In addition, our proposed method provides the analyst with new insights on the data as it is able to identify metabolic biomarkers related to treatment, infer perturbed pathways as a result of treatment and find significant associations with additional omic variables. We also show through simulation that our model is fairly robust against inaccuracies in metabolite assignments. On real data, we demonstrate that the number of profiled metabolites slightly affects the predictive ability of the model. 
CONCLUSIONS: Our single-model approach to longitudinal analysis of metabolomics data supports integrative analysis and biomarker discovery simultaneously. In addition, it lends better interpretation by allowing analysis at the pathway level. An accompanying R package for the model has been developed using the probabilistic programming language Stan. The package offers user-friendly functions for simulating data, fitting the model, assessing model fit and postprocessing the results. The main aim of the R package is to offer metabolomics scientists freely accessible resources for integrative longitudinal analysis, together with easy-to-use visualization functions that help applied researchers interpret results.


Subject(s)
Biomarkers/metabolism , Metabolomics/methods , Models, Theoretical , Bacteria/metabolism , Bayes Theorem , Metabolome
14.
Am J Clin Nutr ; 111(2): 280-290, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31782492

ABSTRACT

BACKGROUND: Results from observational studies regarding associations between fish (including shellfish) intake and cardiovascular disease risk factors, including blood pressure (BP) and BMI, are inconsistent. OBJECTIVE: To investigate associations of fish consumption and associated urinary metabolites with BP and BMI in free-living populations. METHODS: We used cross-sectional data from the International Study of Macro-/Micronutrients and Blood Pressure (INTERMAP), including 4680 men and women (40-59 y) from Japan, China, the United Kingdom, and United States. Dietary intakes were assessed by four 24-h dietary recalls and BP from 8 measurements. Urinary metabolites (2 timed 24-h urinary samples) associated with fish intake acquired from NMR spectroscopy were identified. Linear models were used to estimate BP and BMI differences across categories of intake and per 2 SD higher intake of fish and its biomarkers. RESULTS: No significant associations were observed between fish intake and BP. There was a direct association with fish intake and BMI in the Japanese population sample (P trend = 0.03; fully adjusted model). In Japan, trimethylamine-N-oxide (TMAO) and taurine, respectively, demonstrated area under the receiver operating characteristic curve (AUC) values of 0.81 and 0.78 in discriminating high against low fish intake, whereas homarine (a metabolite found in shellfish muscle) demonstrated an AUC of 0.80 for high/nonshellfish intake. Direct associations were observed between urinary TMAO and BMI for all regions except Japan (P < 0.0001) and in Western populations between TMAO and BP (diastolic blood pressure: mean difference 1.28; 95% CI: 0.55, 2.02 mmHg; P = 0.0006, systolic blood pressure: mean difference 1.67; 95% CI: 0.60, 2.73 mmHg; P = 0.002). CONCLUSIONS: Urinary TMAO showed a stronger association with fish intake in the Japanese compared with the Western population sample. 
Urinary TMAO was directly associated with BP in the Western but not the Japanese population sample. Associations between fish intake and its biomarkers and downstream associations with BP/BMI appear to be context specific. INTERMAP is registered at www.clinicaltrials.gov as NCT00005271.


Subject(s)
Cardiovascular Diseases/prevention & control , Diet , Fishes , Adult , Animals , Blood Pressure/physiology , Body Mass Index , Cardiovascular Diseases/urine , Female , Humans , Male , Middle Aged , Risk Factors
15.
BMC Bioinformatics ; 20(1): 543, 2019 Nov 04.
Article in English | MEDLINE | ID: mdl-31684857

ABSTRACT

BACKGROUND: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a 'pathway space'. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. RESULTS: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. CONCLUSIONS: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Signal Transduction , Databases, Factual , Gene Expression , Humans , Models, Statistical , Phenotype , Transcriptome
16.
Methods Mol Biol ; 2037: 453-470, 2019.
Article in English | MEDLINE | ID: mdl-31463860

ABSTRACT

NMR data from large studies combining multiple cohorts is becoming common in large-scale metabolomics. The data size and combination of cohorts with diverse properties leads to special problems for data processing and analysis. These include alignment, normalization, detection and removal of outliers, presence of strong correlations, and the identification of unknowns. Nonetheless, these challenges can be addressed with suitable algorithms and techniques, leading to enhanced data sets ripe for further data mining.


Subject(s)
Algorithms , Biomarkers/analysis , Magnetic Resonance Spectroscopy/methods , Metabolic Networks and Pathways , Metabolomics/methods , Cohort Studies , Humans
17.
Nat Commun ; 10(1): 3041, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31292445

ABSTRACT

Metabolomics is a widely used technology in academic research, yet its application to regulatory science has been limited. The most commonly cited barrier to its translation is lack of performance and reporting standards. The MEtabolomics standaRds Initiative in Toxicology (MERIT) project brings together international experts from multiple sectors to address this need. Here, we identify the most relevant applications for metabolomics in regulatory toxicology and develop best practice guidelines, performance and reporting standards for acquiring and analysing untargeted metabolomics and targeted metabolite data. We recommend that these guidelines are evaluated and implemented for several regulatory use cases.


Subject(s)
Environmental Pollution/legislation & jurisprudence , Metabolomics/standards , Practice Guidelines as Topic , Research Design/standards , Toxicology/standards , Environmental Monitoring/legislation & jurisprudence , Environmental Monitoring/methods , Environmental Pollution/prevention & control , Hazardous Substances/analysis , Hazardous Substances/toxicity , Humans , Metabolomics/legislation & jurisprudence , Toxicology/legislation & jurisprudence
18.
Eur Heart J ; 40(34): 2883-2896, 2019 09 07.
Article in English | MEDLINE | ID: mdl-31102408

ABSTRACT

AIMS: To characterize serum metabolic signatures associated with atherosclerosis in the coronary or carotid arteries and subsequently their association with incident cardiovascular disease (CVD). METHODS AND RESULTS: We used untargeted one-dimensional (1D) serum metabolic profiling by proton nuclear magnetic resonance spectroscopy (1H NMR) among 3867 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), with replication among 3569 participants from the Rotterdam and LOLIPOP studies. Atherosclerosis was assessed by coronary artery calcium (CAC) and carotid intima-media thickness (IMT). We used multivariable linear regression to evaluate associations between NMR features and atherosclerosis accounting for multiplicity of comparisons. We then examined associations between metabolites associated with atherosclerosis and incident CVD available in MESA and Rotterdam and explored molecular networks through bioinformatics analyses. Overall, 30 1H NMR measured metabolites were associated with CAC and/or IMT, P = 1.3 × 10⁻¹⁴ to 1.0 × 10⁻⁶ (discovery) and P = 5.6 × 10⁻¹⁰ to 1.1 × 10⁻² (replication). These associations were substantially attenuated after adjustment for conventional cardiovascular risk factors. Metabolites associated with atherosclerosis revealed disturbances in lipid and carbohydrate metabolism, branched chain, and aromatic amino acid metabolism, as well as oxidative stress and inflammatory pathways. Analyses of incident CVD events showed inverse associations with creatine, creatinine, and phenylalanine, and direct associations with mannose, acetaminophen-glucuronide, and lactate as well as apolipoprotein B (P < 0.05). CONCLUSION: Metabolites associated with atherosclerosis were largely consistent between the two vascular beds (coronary and carotid arteries) and predominantly tag pathways that overlap with the known cardiovascular risk factors. 
We present an integrated systems network that highlights a series of inter-connected pathways underlying atherosclerosis.
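
The abstract does not state which multiplicity correction was applied. As a minimal sketch, assuming a Bonferroni correction (an assumption, not the authors' stated method), each feature's regression P value would be compared against the significance level divided by the number of features tested. The function name and the P values are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag P values that pass a Bonferroni-corrected threshold.

    Assumption: Bonferroni is used here only for illustration; the
    abstract does not name the correction that was actually applied.
    """
    n_tests = len(p_values)
    if n_tests == 0:
        return []
    # Each test is held to alpha divided by the number of tests.
    threshold = alpha / n_tests
    return [p < threshold for p in p_values]


# Hypothetical P values for three NMR features (not taken from the study).
flags = bonferroni_significant([1e-6, 0.01, 0.04])
print(flags)
```

With three tests the per-test threshold is 0.05 / 3, so only the first two hypothetical features would be retained.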


Subject(s)
Cardiovascular Diseases/etiology , Carotid Artery Diseases/complications , Carotid Artery Diseases/metabolism , Coronary Artery Disease/complications , Coronary Artery Disease/metabolism , Adult , Aged , Cardiovascular Diseases/blood , Carotid Artery Diseases/blood , Coronary Artery Disease/blood , Female , Humans , Male , Middle Aged , Prospective Studies , Proton Magnetic Resonance Spectroscopy
19.
Gigascience ; 8(2)2019 02 01.
Article in English | MEDLINE | ID: mdl-30535405

ABSTRACT

BACKGROUND: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological, and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories, and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution. FINDINGS: PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open-source tools that are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated, and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi, and Pachyderm. CONCLUSIONS: PhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible, and shareable metabolomics data analysis platforms that are versioned, interfaced through standard data formats and representative datasets, and tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adaptation of the infrastructure to other application areas and 'omics research domains.


Subject(s)
Metabolomics/methods , Software , Cloud Computing , Humans , Workflow
20.
Metabolomics ; 14(5): 56, 2018.
Article in English | MEDLINE | ID: mdl-29606928

ABSTRACT

INTRODUCTION: To aid the development of better algorithms for 1H NMR data analysis, such as alignment or peak-fitting, it is important to characterise and model chemical shift changes caused by variation in pH. The number of protonation sites, a key parameter in the theoretical relationship between pH and chemical shift, is traditionally estimated from the molecular structure, which is often unknown in untargeted metabolomics applications. OBJECTIVE: We aim to use observed NMR chemical shift titration data to estimate the number of protonation sites for a range of urinary metabolites. METHODS: A pool of urine from healthy subjects was titrated in the range pH 2-12, standard 1H NMR spectra were acquired and positions of 51 peaks (corresponding to 32 identified metabolites) were recorded. A theoretical model of chemical shift was fit to the data using a Bayesian statistical framework, using model selection procedures in a Markov Chain Monte Carlo algorithm to estimate the number of protonation sites for each molecule. RESULTS: The estimated number of protonation sites was found to be correct for 41 out of 51 peaks. In some cases, the number of sites was incorrectly estimated, due to very close pKa values or a limited amount of data in the required pH range. CONCLUSIONS: Given appropriate data, it is possible to estimate the number of protonation sites for many metabolites typically observed in 1H NMR metabolomics without knowledge of the molecular structure. This approach may be a valuable resource for the development of future automated metabolite alignment, annotation and peak fitting algorithms.
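
The abstract does not give the form of the theoretical model. As a rough illustration, the sketch below uses the standard fast-exchange relationship for a single protonation site, derived from the Henderson-Hasselbalch equation, in which the observed shift is the population-weighted average of the protonated and deprotonated limiting shifts. It covers one site only, whereas the study estimates the number of sites per molecule. The function name, parameter names, and numeric values are hypothetical.

```python
def chemical_shift(ph, pka, delta_acid, delta_base):
    """Observed chemical shift (ppm) for one protonation site.

    Assumes fast exchange between the protonated form (limiting shift
    delta_acid) and the deprotonated form (limiting shift delta_base).
    This single-site form is an illustration, not the paper's full model.
    """
    # Henderson-Hasselbalch: [HA] / [A-] = 10 ** (pKa - pH)
    ratio = 10.0 ** (pka - ph)
    # Population-weighted average of the two limiting shifts.
    return (delta_base + delta_acid * ratio) / (1.0 + ratio)


# Illustrative values only: a peak moving between 2.08 and 1.91 ppm
# around a pKa of 4.76, sampled across the titrated range pH 2-12.
for ph in (2.0, 4.76, 12.0):
    print(ph, round(chemical_shift(ph, 4.76, 2.08, 1.91), 3))
```

At pH equal to the pKa the two forms are equally populated, so the predicted shift is the midpoint of the limiting shifts; far below or above the pKa it approaches the protonated or deprotonated limit respectively.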
