Results 1 - 20 of 110
1.
Anal Chem ; 96(29): 11707-11715, 2024 Jul 23.
Article in English | MEDLINE | ID: mdl-38990576

ABSTRACT

J-Resolved (J-Res) nuclear magnetic resonance (NMR) spectroscopy is pivotal in NMR-based metabolomics, but practitioners face a choice between time-consuming high-resolution (HR) experiments or shorter low-resolution (LR) experiments which exhibit significant peak overlap. Deep learning neural networks have been successfully used in many fields to enhance quality of natural images, especially with regard to resolution, and therefore offer the prospect of improving two-dimensional (2D) NMR data. Here, we introduce the J-RESRGAN, an adapted and modified generative adversarial network (GAN) for image super-resolution (SR), which we trained specifically for metabolomic J-Res spectra to enhance peak resolution. A novel symmetric loss function was introduced, exploiting the inherent vertical symmetry of J-Res NMR spectra. Model training used simulated high-resolution J-Res spectra of complex mixtures, with corresponding low-resolution spectra generated via blurring and down-sampling. Evaluation of peak pair resolvability on J-RESRGAN demonstrated remarkable improvement in resolution across a variety of samples. In simulated plasma data, 100% of peak pairs exhibited enhanced resolution in super-resolution spectra compared to their low-resolution counterparts. Similarly, enhanced resolution was observed in 80.8-100% of peak pairs in experimental plasma, 85.0-96.7% in urine, 94.4-98.9% in full fat milk, and 82.6-91.7% in orange juice. J-RESRGAN is not sample type, spectrometer or field strength dependent and improvements on previously acquired data can be seen in seconds on a standard desktop computer. We believe this demonstrates the promise of deep learning methods to enhance NMR metabolomic data, and in particular, the power of J-RESRGAN to elucidate overlapping peaks, advancing precision in a wide variety of NMR-based metabolomics studies. The model, J-RESRGAN, is openly accessible for download on GitHub at https://github.com/yanyan5420/J-RESRGAN.
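The two ideas named in the abstract, a symmetry penalty and low-resolution inputs produced by blurring and down-sampling, can be illustrated with a minimal pure-Python sketch. The function names `symmetry_loss` and `downsample`, the mean-absolute-difference form of the penalty, and the block-averaging degradation are assumptions made for illustration only; the abstract does not specify the loss or the degradation actually used in J-RESRGAN.

```python
from typing import List

Matrix = List[List[float]]  # rows = J-coupling axis, columns = chemical-shift axis


def symmetry_loss(spec: Matrix) -> float:
    """Mean absolute difference between a spectrum and its top-bottom mirror.

    Hypothetical penalty: zero when the rows are perfectly mirrored about the centre.
    """
    n_rows = len(spec)
    n_cols = len(spec[0])
    total = 0.0
    for i in range(n_rows):
        mirror_row = spec[n_rows - 1 - i]
        for j in range(n_cols):
            total += abs(spec[i][j] - mirror_row[j])
    return total / (n_rows * n_cols)


def downsample(spec: Matrix, factor: int) -> Matrix:
    """Average non-overlapping factor x factor blocks (a crude blur plus down-sample)."""
    rows = len(spec) // factor
    cols = len(spec[0]) // factor
    out: Matrix = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            block = [
                spec[r * factor + a][c * factor + b]
                for a in range(factor)
                for b in range(factor)
            ]
            new_row.append(sum(block) / len(block))
        out.append(new_row)
    return out
```

A perfectly mirrored spectrum gives a penalty of zero, so a term of this kind could in principle be added to a generator loss to discourage asymmetric output.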


Subject(s)
Deep Learning, Magnetic Resonance Spectroscopy, Metabolomics, Metabolomics/methods, Magnetic Resonance Spectroscopy/methods, Animals, Humans
2.
PLoS Comput Biol ; 20(3): e1011814, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38527092

ABSTRACT

As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. PathIntegrate is available as an open-source Python package.


Subject(s)
Genomics, Multiomics, Genomics/methods
3.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38383048

ABSTRACT

MOTIVATION: Random forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction. However, RFs are often not appropriate for correlated datasets due to their mode of selecting individual features for splitting. Addressing correlation relationships in high-dimensional datasets is imperative for reducing the number of variables that are assigned high importance, hence making the dimension reduction most efficient. Here, we propose the LAtent VAriable Stochastic Ensemble of Trees (LAVASET) method that derives latent variables based on the distance characteristics of each feature and aims to incorporate the correlation factor in the splitting step. RESULTS: Without compromising on performance in the majority of examples, LAVASET outperforms RF by accurately determining feature importance across all correlated variables and ensuring proper distribution of importance values. LAVASET yields mostly non-inferior prediction accuracies to traditional RFs when tested in simulated and real 1D datasets, as well as more complex and high-dimensional 3D datatypes. Unlike traditional RFs, LAVASET is unaffected by single 'important' noisy features (false positives), as it considers the local neighbourhood. LAVASET, therefore, highlights neighbourhoods of features, reflecting real signals that collectively impact the model's predictive ability. AVAILABILITY AND IMPLEMENTATION: LAVASET is freely available as a standalone package from https://github.com/melkasapi/LAVASET.

4.
PLoS Comput Biol ; 20(2): e1011381, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38386685

ABSTRACT

Metabolic profiling (metabolomics) aims at measuring small molecules (metabolites) in complex samples like blood or urine for human health studies. While biomarker-based assessment often relies on a single molecule, metabolic profiling combines several metabolites to create a more complex and more specific fingerprint of the disease. However, in contrast to genomics, there is no unique metabolomics setup able to measure the entire metabolome. This challenge leads to tedious and resource-consuming preliminary studies to be able to design the right metabolomics experiment. In that context, computer-assisted metabolic profiling can be of strong added value to design metabolomics studies more quickly and efficiently. We propose a constraint-based modelling approach which predicts in silico profiles of metabolites that are more likely to be differentially abundant under a given metabolic perturbation (e.g. due to a genetic disease), using flux simulation. In genome-scale metabolic networks, the fluxes of exchange reactions, also known as the flow of metabolites through their external transport reactions, can be simulated and compared between control and disease conditions in order to calculate changes in metabolite import and export. These import/export flux differences would be expected to induce changes in circulating biofluid levels of those metabolites, which can then be interpreted as potential biomarkers or metabolites of interest. In this study, we present SAMBA (SAMpling Biomarker Analysis), an approach which simulates fluxes in exchange reactions following a metabolic perturbation using random sampling, compares the simulated flux distributions between the baseline and modulated conditions, and ranks predicted differentially exchanged metabolites as potential biomarkers for the perturbation.
We show that there is a good fit between simulated metabolic exchange profiles and experimental differential metabolites detected in plasma, such as patient data from the disease database OMIM, and metabolic trait-SNP associations found in mGWAS studies. These biomarker recommendations can provide insight into the underlying mechanism or metabolic pathway perturbation lying behind observed metabolite differential abundances, and suggest new metabolites as potential avenues for further experimental analyses.
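The comparison step described above (sampled exchange fluxes in a baseline and a perturbed condition, then a ranking of the most shifted metabolites) can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_exchange_shifts`, the reaction identifiers, and the z-score-style statistic are hypothetical, and the flux samples are assumed to have been produced beforehand by a separate flux-sampling tool. It is not presented as SAMBA's own scoring.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def rank_exchange_shifts(
    control: Dict[str, List[float]],
    perturbed: Dict[str, List[float]],
) -> List[Tuple[str, float]]:
    """Rank exchange reactions by a standardised shift in sampled flux.

    Assumed score: (mean_perturbed - mean_control) / sqrt(var_control + var_perturbed).
    Reactions whose samples do not vary at all in either condition score 0.0.
    """
    scores = {}
    for rxn, ctrl_samples in control.items():
        pert_samples = perturbed[rxn]
        spread = sqrt(pstdev(ctrl_samples) ** 2 + pstdev(pert_samples) ** 2)
        shift = mean(pert_samples) - mean(ctrl_samples)
        scores[rxn] = shift / spread if spread > 0 else 0.0
    # Largest absolute shift first, regardless of direction (import vs export)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

The sign of each score indicates the direction of the predicted change, which is the information a reader would compare against observed plasma differences.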


Subject(s)
Metabolome, Metabolomics, Humans, Metabolome/genetics, Genome, Metabolic Networks and Pathways, Biomarkers
5.
bioRxiv ; 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38260498

ABSTRACT

As terabytes of multi-omics data are being generated, there is an ever-increasing need for methods facilitating the integration and interpretation of such data. Current multi-omics integration methods typically output lists, clusters, or subnetworks of molecules related to an outcome. Even with expert domain knowledge, discerning the biological processes involved is a time-consuming activity. Here we propose PathIntegrate, a method for integrating multi-omics datasets based on pathways, designed to exploit knowledge of biological systems and thus provide interpretable models for such studies. PathIntegrate employs single-sample pathway analysis to transform multi-omics datasets from the molecular to the pathway-level, and applies a predictive single-view or multi-view model to integrate the data. Model outputs include multi-omics pathways ranked by their contribution to the outcome prediction, the contribution of each omics layer, and the importance of each molecule in a pathway. Using semi-synthetic data we demonstrate the benefit of grouping molecules into pathways to detect signals in low signal-to-noise scenarios, as well as the ability of PathIntegrate to precisely identify important pathways at low effect sizes. Finally, using COPD and COVID-19 data we showcase how PathIntegrate enables convenient integration and interpretation of complex high-dimensional multi-omics datasets. The PathIntegrate Python package is available at https://github.com/cwieder/PathIntegrate.

6.
Am J Clin Nutr ; 118(5): 989-999, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37660929

ABSTRACT

BACKGROUND: Whether red meat consumption is associated with higher inflammation or confounded by increased adiposity remains unclear. Plasma metabolites capture the effects of diet after food is processed, digested, and absorbed, and correlate with markers of inflammation, so they can help clarify diet-health relationships. OBJECTIVE: To identify whether any metabolites associated with red meat intake are also associated with inflammation. METHODS: A cross-sectional analysis of observational data from older adults (52.84% women, mean age 63 ± 0.3 y) participating in the Multi-Ethnic Study of Atherosclerosis (MESA). Dietary intake was assessed by food-frequency questionnaire, alongside C-reactive protein (CRP), interleukin-2, interleukin-6, fibrinogen, homocysteine, and tumor necrosis factor alpha, and untargeted proton nuclear magnetic resonance (1H NMR) metabolomic features. Associations between these variables were examined using linear regression models, adjusted for demographic factors, lifestyle behaviors, and body mass index (BMI). RESULTS: In analyses that adjust for BMI, neither processed nor unprocessed forms of red meat were associated with any markers of inflammation (all P > 0.01). However, when adjusting for BMI, unprocessed red meat was inversely associated with spectral features representing the metabolite glutamine (sentinel hit: β = -0.09 ± 0.02, P = 2.0 × 10^-5), an amino acid which was also inversely associated with CRP level (β = -0.11 ± 0.01, P = 3.3 × 10^-10). CONCLUSIONS: Our analyses were unable to support a relationship between either processed or unprocessed red meat and inflammation, over and above any confounding by BMI. Glutamine, a plasma correlate of lower unprocessed red meat intake, was associated with lower CRP levels.
The differences in diet-inflammation associations, compared with diet metabolite-inflammation associations, warrant further investigation to understand the extent to which these arise from the following: 1) a reduction in measurement error with metabolite measures; 2) the extent to which factors other than unprocessed red meat intake contribute to glutamine levels; and 3) the ability of plasma metabolites to capture individual differences in how food intake is metabolized.


Subject(s)
Glutamine, Red Meat, Humans, Female, Aged, Middle Aged, Male, Cross-Sectional Studies, Inflammation, Diet, Meat, Risk Factors
7.
J Nutr ; 153(10): 2797-2807, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37562669

ABSTRACT

BACKGROUND: Avocado consumption is linked to better glucose homeostasis, but small associations suggest potential population heterogeneity. Metabolomic data capture the effects of food intake after digestion and metabolism, thus accounting for individual differences in these processes. OBJECTIVES: To identify metabolomic biomarkers of avocado intake and to examine their associations with glycemia. METHODS: Baseline data from 6224 multi-ethnic older adults (62% female) included self-reported avocado intake, fasting glucose and insulin, and untargeted plasma proton nuclear magnetic resonance metabolomic features (metabolomic data were available for a randomly selected subset; N = 3438). Subsequently, incident type 2 diabetes (T2D) was assessed over an ∼18 y follow-up period. A metabolome-wide association study of avocado consumption status (consumer compared with nonconsumer) was conducted, and the relationship of these features with glycemia via cross-sectional associations with fasting insulin and glucose and longitudinal associations with incident T2D was examined. RESULTS: Three highly correlated spectral features were associated with avocado intake at metabolome-wide significance levels (P < 5.3 × 10^-7) and combined into a single biomarker. We did not find evidence that these features were additionally associated with overall dietary quality, nor with any of 47 other food groups (all P > 0.001), supporting their suitability as a biomarker of avocado intake. Avocado intake showed a modest association only with lower fasting insulin (β = -0.07 ± 0.03, P = 0.03), an association that was attenuated to nonsignificance when additionally controlling for body mass index (kg/m^2).
However, our biomarker of avocado intake was strongly associated with lower fasting glucose (β = -0.22 ± 0.02, P < 2.0 × 10^-16), lower fasting insulin (β = -0.17 ± 0.02, P < 2.0 × 10^-16), and a lower incidence of T2D (hazard ratio: 0.68; 0.63-0.74, P < 2.0 × 10^-16), even when adjusting for BMI. CONCLUSIONS: Highly significant associations between glycemia and avocado-related metabolomic features, which serve as biomarkers of the physiological impact of dietary intake after digestion and absorption, compared to modest relationships between glycemia and avocado consumption, highlight the importance of considering individual differences in metabolism when considering diet-health relationships.


Subject(s)
Atherosclerosis, Diabetes Mellitus Type 2, Persea, Humans, Female, Aged, Male, Diabetes Mellitus Type 2/epidemiology, Risk Factors, Cross-Sectional Studies, Biomarkers, Insulin, Glucose
8.
Curr Opin Chem Biol ; 74: 102288, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36966702

ABSTRACT

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled "Computational Metabolomics: From Spectra to Knowledge".


Subject(s)
Computational Biology, Metabolomics, Metabolomics/methods, Mass Spectrometry/methods, Factual Databases, Computational Biology/methods
9.
Metabolomics ; 18(12): 102, 2022 Dec 05.
Article in English | MEDLINE | ID: mdl-36469142

ABSTRACT

BACKGROUND: Compound identification remains a critical bottleneck in the process of exploiting Nuclear Magnetic Resonance (NMR) metabolomics data, especially for 1H 1-dimensional (1H 1D) data. As databases of reference compound spectra have grown, workflows have evolved to rely heavily on their search functions to facilitate this process by generating lists of potential metabolites found in complex mixture data, facilitating annotation and identification. However, approaches for validating and communicating annotations are most often guided by expert knowledge, and therefore are highly variable despite repeated efforts to align practices and define community standards. AIM OF REVIEW: This review is aimed at broadening the application of automated annotation tools by discussing the key ideas of spectral matching and beginning to describe a set of terms to classify this information, thus advancing standards for communicating annotation confidence. Additionally, we hope that this review will facilitate the growing collaboration between chemical data scientists, software developers and the NMR metabolomics community aiding development of long-term software solutions. KEY SCIENTIFIC CONCEPTS OF REVIEW: We begin with a brief discussion of the typical untargeted NMR identification workflow. We differentiate between annotation (hypothesis generation, filtering), and identification (hypothesis testing, verification), and note the utility of different NMR data features for annotation. We then touch on three parts of annotation: (1) generation of queries, (2) matching queries to reference data, and (3) scoring and confidence estimation of potential matches for verification. In doing so, we highlight existing approaches to automated and semi-automated annotation from the perspective of the structural information they utilize, as well as how this information can be represented computationally.


Subject(s)
Metabolomics, Software, Metabolomics/methods, Magnetic Resonance Spectroscopy/methods, Magnetic Resonance Imaging, Factual Databases
10.
BMC Bioinformatics ; 23(1): 481, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36376837

ABSTRACT

BACKGROUND: Single sample pathway analysis (ssPA) transforms molecular level omics data to the pathway level, enabling the discovery of patient-specific pathway signatures. Compared to conventional pathway analysis, ssPA overcomes the limitations by enabling multi-group comparisons, alongside facilitating numerous downstream analyses such as pathway-based machine learning. While in transcriptomics ssPA is a widely used technique, there is little literature evaluating its suitability for metabolomics. Here we provide a benchmark of established ssPA methods (ssGSEA, GSVA, SVD (PLAGE), and z-score) alongside the evaluation of two novel methods we propose: ssClustPA and kPCA, using semi-synthetic metabolomics data. We then demonstrate how ssPA can facilitate pathway-based interpretation of metabolomics data by performing a case-study on inflammatory bowel disease mass spectrometry data, using clustering to determine subtype-specific pathway signatures. RESULTS: While GSEA-based and z-score methods outperformed the others in terms of recall, clustering/dimensionality reduction-based methods provided higher precision at moderate-to-high effect sizes. A case study applying ssPA to inflammatory bowel disease data demonstrates how these methods yield a much richer depth of interpretation than conventional approaches, for example by clustering pathway scores to visualise a pathway-based patient subtype-specific correlation network. We also developed the sspa Python package (freely available at https://pypi.org/project/sspa/), providing implementations of all the methods benchmarked in this study. CONCLUSION: This work underscores the value ssPA methods can add to metabolomic studies and provides a useful reference for those wishing to apply ssPA methods to metabolomics data.
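To make the molecule-to-pathway transformation concrete, here is a minimal pure-Python sketch of a z-score style single-sample pathway score. It deliberately does not use the sspa package: the function name `zscore_pathway_scores`, the input layout, and the choice of combining standardised metabolite values as sum(z) / sqrt(k) are assumptions for illustration, and the benchmarked implementation may differ in detail.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Dict, List


def zscore_pathway_scores(
    data: Dict[str, List[float]],      # metabolite -> one value per sample
    pathways: Dict[str, List[str]],    # pathway name -> member metabolites
) -> Dict[str, List[float]]:
    """Return one score per sample for every pathway with measured members."""
    # Standardise each metabolite across samples (constant metabolites become 0)
    z = {}
    for metabolite, values in data.items():
        mu, sd = mean(values), pstdev(values)
        z[metabolite] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    n_samples = len(next(iter(data.values())))
    scores = {}
    for name, members in pathways.items():
        measured = [m for m in members if m in z]
        if not measured:
            continue  # pathway has no measured metabolites, so it is skipped
        k = len(measured)
        scores[name] = [
            sum(z[m][s] for m in measured) / sqrt(k) for s in range(n_samples)
        ]
    return scores
```

The output is a samples-by-pathways table, which is the form that downstream steps such as clustering or pathway-based machine learning would consume.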


Subject(s)
Inflammatory Bowel Diseases, Metabolomics, Humans, Metabolomics/methods, Transcriptome, Cluster Analysis, Mass Spectrometry
12.
Int J Cancer ; 151(12): 2115-2127, 2022 Dec 15.
Article in English | MEDLINE | ID: mdl-35866293

ABSTRACT

Prostate cancer (PCa) is the most common cancer form in males in many European and American countries, but there are still open questions regarding its etiology. Untargeted metabolomics can produce an unbiased global metabolic profile, with the opportunity for uncovering new plasma metabolites prospectively associated with risk of PCa, providing insights into disease etiology. We conducted a prospective untargeted liquid chromatography-mass spectrometry (LC-MS) metabolomics analysis using prediagnostic fasting plasma samples from 752 PCa case-control pairs nested within the Northern Sweden Health and Disease Study (NSHDS). The pairs were matched by age, BMI, and sample storage time. Discriminating features were identified by a combination of orthogonal projection to latent structures-effect projections (OPLS-EP) and Wilcoxon signed-rank tests. Their prospective associations with PCa risk were investigated by conditional logistic regression. Subgroup analyses based on stratification by disease aggressiveness and baseline age were also conducted. Various free fatty acids and phospholipids were positively associated with overall risk of PCa and in various stratification subgroups. Aromatic amino acids were positively associated with overall risk of PCa. Uric acid was positively, and glucose negatively, associated with risk of PCa in the older subgroup. This is the largest untargeted LC-MS based metabolomics study to date on plasma metabolites prospectively associated with risk of developing PCa. Different subgroups of disease aggressiveness and baseline age showed different associations with metabolites. The findings suggest that shifts in plasma concentrations of metabolites in lipid, aromatic amino acid, and glucose metabolism are associated with risk of developing PCa during the following two decades.


Subject(s)
Nonesterified Fatty Acids, Prostatic Neoplasms, Male, Humans, Case-Control Studies, Uric Acid, Sweden/epidemiology, Metabolomics/methods, Mass Spectrometry, Prostatic Neoplasms/diagnosis, Prostatic Neoplasms/epidemiology, Aromatic Amino Acids, Glucose
13.
Anal Chem ; 94(14): 5493-5503, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35360896

ABSTRACT

Integration of multiple datasets can greatly enhance bioanalytical studies, for example, by increasing power to discover and validate biomarkers. In liquid chromatography-mass spectrometry (LC-MS) metabolomics, it is especially hard to combine untargeted datasets since the majority of metabolomic features are not annotated and thus cannot be matched by chemical identity. Typically, the information available for each feature is retention time (RT), mass-to-charge ratio (m/z), and feature intensity (FI). Pairs of features from the same metabolite in separate datasets can exhibit small but significant differences, making matching very challenging. Current methods to address this issue are too simple or rely on assumptions that cannot be met in all cases. We present a method to find feature correspondence between two similar LC-MS metabolomics experiments or batches using only the features' RT, m/z, and FI. We demonstrate the method on both real and synthetic datasets, using six orthogonal validation strategies to gauge the matching quality. In our main example, 4953 features were uniquely matched, of which 585 (96.8%) of 604 manually annotated features were correct. In a second example, 2324 features could be uniquely matched, with 79 (90.8%) out of 87 annotated features correctly matched. Most of the missed annotated matches are between features that behave very differently from modeled inter-dataset shifts of RT, m/z, and FI. In a third example with simulated data with 4755 features per dataset, 99.6% of the matches were correct. Finally, the results of matching three other dataset pairs using our method are compared with a published alternative method, metabCombiner, showing the advantages of our approach. The method can be applied using M2S (Match 2 Sets), a free, open-source MATLAB toolbox, available at https://github.com/rjdossan/M2S.
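The core matching idea, pairing features from two datasets using only RT, m/z, and FI, can be sketched in a few lines of Python. This is a simplified illustration and not the M2S algorithm: the function name `match_features`, the tolerance values, and the greedy one-to-one assignment are all assumptions, and unlike the published method this sketch does not model systematic inter-dataset shifts.

```python
from typing import List, Tuple

Feature = Tuple[float, float, float]  # (RT in minutes, m/z, log10 intensity)


def match_features(
    ref: List[Feature],
    target: List[Feature],
    rt_tol: float = 0.2,    # hypothetical tolerances, to be tuned per study
    mz_tol: float = 0.01,
    fi_tol: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return unique (ref_index, target_index) pairs within all three tolerances."""
    candidates = []
    for i, (rt_a, mz_a, fi_a) in enumerate(ref):
        for j, (rt_b, mz_b, fi_b) in enumerate(target):
            # Distances expressed as a fraction of the allowed tolerance
            d_rt = abs(rt_a - rt_b) / rt_tol
            d_mz = abs(mz_a - mz_b) / mz_tol
            d_fi = abs(fi_a - fi_b) / fi_tol
            if d_rt <= 1 and d_mz <= 1 and d_fi <= 1:
                candidates.append((d_rt ** 2 + d_mz ** 2 + d_fi ** 2, i, j))

    # Greedy assignment: closest pairs first, each feature used at most once
    candidates.sort()
    used_ref, used_target, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_ref and j not in used_target:
            matches.append((i, j))
            used_ref.add(i)
            used_target.add(j)
    return sorted(matches)
```

Features that have no partner inside the tolerance window simply stay unmatched, which mirrors the distinction the abstract draws between uniquely matched features and the rest.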


Subject(s)
Metabolomics, Biomarkers/analysis, Liquid Chromatography/methods, Mass Spectrometry/methods, Metabolomics/methods
14.
Am J Clin Nutr ; 116(1): 216-229, 2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35285859

ABSTRACT

BACKGROUND: Adherence to the Dietary Approaches to Stop Hypertension (DASH) diet enhances potassium intake and reduces sodium intake and blood pressure (BP), but the underlying metabolic pathways are unclear. OBJECTIVES: Among free-living populations, we delineated metabolic signatures associated with the DASH diet adherence, 24-hour urinary sodium and potassium excretions, and the potential metabolic pathways involved. METHODS: We used 24-hour urinary metabolic profiling by proton nuclear magnetic resonance spectroscopy to characterize the metabolic signatures associated with the DASH dietary pattern score (DASH score) and 24-hour excretion of sodium and potassium among participants in the United States (n = 2164) and United Kingdom (n = 496) enrolled in the International Study of Macro- and Micronutrients and Blood Pressure (INTERMAP). Multiple linear regression and cross-tabulation analyses were used to investigate the DASH-BP relation and its modulation by sodium and potassium. Potential pathways associated with DASH adherence, sodium and potassium excretion, and BP were identified using mediation analyses and metabolic reaction networks. RESULTS: Adherence to the DASH diet was associated with urinary potassium excretion (correlation coefficient, r = 0.42; P < 0.0001). In multivariable regression analyses, a 5-point higher DASH score (range, 7 to 35) was associated with a lower systolic BP by 1.35 mmHg (95% CI, -1.95 to -0.80 mmHg; P = 1.2 × 10^-5); control of the model for potassium but not sodium attenuated the DASH-BP relation. Two common metabolites (hippurate and citrate) mediated the potassium-BP and DASH-BP relationships, while 5 metabolites (succinate, alanine, S-methyl cysteine sulfoxide, 4-hydroxyhippurate, and phenylacetylglutamine) were found to be specific to the DASH-BP relation. CONCLUSIONS: Greater adherence to the DASH diet is associated with lower BP and higher potassium intake across levels of sodium intake.
The DASH diet recommends greater intake of fruits, vegetables, and other potassium-rich foods that may replace sodium-rich processed foods and thereby influence BP through overlapping metabolic pathways. Possible DASH-specific pathways are speculated but confirmation requires further study. INTERMAP is registered as NCT00005271 at www.clinicaltrials.gov.


Subject(s)
Dietary Approaches To Stop Hypertension, Hypertension, Dietary Sodium, Blood Pressure/physiology, Humans, Micronutrients, Potassium, Sodium
15.
Anal Chem ; 94(8): 3446-3455, 2022 Mar 01.
Article in English | MEDLINE | ID: mdl-35180347

ABSTRACT

Untargeted metabolomics and lipidomics LC-MS experiments produce complex datasets, usually containing tens of thousands of features from thousands of metabolites whose annotation requires additional MS/MS experiments and expert knowledge. All-ion fragmentation (AIF) LC-MS/MS acquisition provides fragmentation data at no additional experimental time cost. However, analysis of such datasets requires reconstruction of parent-fragment relationships and annotation of the resulting pseudo-MS/MS spectra. Here, we propose a novel approach for automated annotation of isotopologues, adducts, and in-source fragments from AIF LC-MS datasets by combining correlation-based parent-fragment linking with molecular fragment matching. Our workflow focuses on a subset of features rather than trying to annotate the full dataset, saving time and simplifying the process. We demonstrate the workflow in three human serum datasets containing 599 features manually annotated by experts. Precision and recall values of 82-92% and 82-85%, respectively, were obtained for features found in the highest-rank scores (1-5). These results equal or outperform those obtained using MS-DIAL software, the current state of the art for AIF data annotation. Further validation for other biological matrices and different instrument types showed variable precision (60-89%) and recall (10-88%) particularly for datasets dominated by nonlipid metabolites. The workflow is freely available as an open-source R package, MetaboAnnotatoR, together with the fragment libraries from GitHub (https://github.com/gggraca/MetaboAnnotatoR).
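Correlation-based parent-fragment linking can be illustrated with a short Python sketch: fragment candidates whose intensity profiles track the parent ion closely are kept as a pseudo-MS/MS spectrum. The function names `pearson` and `link_fragments`, the use of Pearson correlation, and the 0.8 threshold are assumptions chosen for illustration; they are not taken from MetaboAnnotatoR, which is an R package with its own implementation.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def link_fragments(
    parent_profile: List[float],
    fragment_profiles: Dict[str, List[float]],
    min_r: float = 0.8,  # hypothetical correlation threshold
) -> List[Tuple[str, float]]:
    """Keep fragment candidates whose profile correlates with the parent's."""
    linked = []
    for name, profile in fragment_profiles.items():
        r = pearson(parent_profile, profile)
        if r >= min_r:
            linked.append((name, r))
    # Strongest correlation first
    return sorted(linked, key=lambda item: item[1], reverse=True)
```

The linked fragments would then be passed to a separate matching step against a fragment library, which this sketch does not attempt to reproduce.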


Subject(s)
Metabolomics, Tandem Mass Spectrometry, Liquid Chromatography/methods, Humans, Metabolomics/methods, Software, Tandem Mass Spectrometry/methods, Workflow
16.
PLoS Comput Biol ; 17(9): e1009105, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34492007

ABSTRACT

Over-representation analysis (ORA) is one of the commonest pathway analysis approaches used for the functional interpretation of metabolomics datasets. Despite the widespread use of ORA in metabolomics, the community lacks guidelines detailing its best-practice use. Many factors have a pronounced impact on the results, but to date their effects have received little systematic attention. Using five publicly available datasets, we demonstrated that changes in parameters such as the background set, differential metabolite selection methods, and pathway database used can result in profoundly different ORA results. The use of a non-assay-specific background set, for example, resulted in large numbers of false-positive pathways. Pathway database choice, evaluated using three of the most popular metabolic pathway databases (KEGG, Reactome, and BioCyc), led to vastly different results in both the number and function of significantly enriched pathways. Factors that are specific to metabolomics data, such as the reliability of compound identification and the chemical bias of different analytical platforms also impacted ORA results. Simulated metabolite misidentification rates as low as 4% resulted in both gain of false-positive pathways and loss of truly significant pathways across all datasets. Our results have several practical implications for ORA users, as well as those using alternative pathway analysis methods. We offer a set of recommendations for the use of ORA in metabolomics, alongside a set of minimal reporting guidelines, as a first step towards the standardisation of pathway analysis in metabolomics.
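ORA is commonly computed as a one-sided hypergeometric (Fisher-type) test, and a small sketch makes the role of the background set visible. The function name `ora_p_value` and the parameter names are illustrative choices; the formula itself is the standard upper-tail hypergeometric probability, and the numbers in the usage note below are made up for demonstration.

```python
from math import comb


def ora_p_value(
    n_background: int,  # compounds in the background set
    n_pathway: int,     # background compounds that belong to the pathway
    n_selected: int,    # differential compounds (the list of interest)
    n_overlap: int,     # differential compounds that belong to the pathway
) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap)."""
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total
```

For example, with 20 pathway members, 50 selected compounds, and an overlap of 5, `ora_p_value(1000, 20, 50, 5)` is much smaller than `ora_p_value(200, 20, 50, 5)`, because the expected overlap by chance drops from 5 to 1 as the background grows. This is one way to see why an inflated, non-assay-specific background can produce spurious enrichment.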


Subject(s)
Metabolomics, Computational Biology/methods, Datasets as Topic, Metabolic Networks and Pathways, Reproducibility of Results
17.
Regul Toxicol Pharmacol ; 125: 105020, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34333066

ABSTRACT

Omics methodologies are widely used in toxicological research to understand modes and mechanisms of toxicity. Increasingly, these methodologies are being applied to questions of regulatory interest such as molecular point-of-departure derivation and chemical grouping/read-across. Despite its value, widespread regulatory acceptance of omics data has not yet occurred. Barriers to the routine application of omics data in regulatory decision making have been: 1) lack of transparency for data processing methods used to convert raw data into an interpretable list of observations; and 2) lack of standardization in reporting to ensure that omics data, associated metadata and the methodologies used to generate results are available for review by stakeholders, including regulators. Thus, in 2017, the Organisation for Economic Co-operation and Development (OECD) Extended Advisory Group on Molecular Screening and Toxicogenomics (EAGMST) launched a project to develop guidance for the reporting of omics data aimed at fostering further regulatory use. Here, we report on the ongoing development of the first formal reporting framework describing the processing and analysis of both transcriptomic and metabolomic data for regulatory toxicology. We introduce the modular structure, content, harmonization and strategy for trialling this reporting framework prior to its publication by the OECD.


Subject(s)
Metabolomics/standards, Organisation for Economic Co-operation and Development/standards, Toxicogenetics/standards, Toxicology/standards, Transcriptome/physiology, Documentation/standards, Humans
18.
Nat Protoc ; 16(9): 4299-4326, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34321638

ABSTRACT

Metabolic phenotyping is an important tool in translational biomedical research. The advanced analytical technologies commonly used for phenotyping, including mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy, generate complex data requiring tailored statistical analysis methods. Detailed protocols have been published for data acquisition by liquid NMR, solid-state NMR, ultra-performance liquid chromatography (LC-)MS and gas chromatography (GC-)MS on biofluids or tissues and their preprocessing. Here we propose an efficient protocol (guidelines and software) for statistical analysis of metabolic data generated by these methods. Code for all steps is provided, and no prior coding skill is necessary. We offer efficient solutions for the different steps required within the complete phenotyping data analytics workflow: scaling, normalization, outlier detection, multivariate analysis to explore and model study-related effects, selection of candidate biomarkers, validation, multiple testing correction and performance evaluation of statistical models. We also provide a statistical power calculation algorithm and safeguards to ensure robust and meaningful experimental designs that deliver reliable results. We exemplify the protocol with a two-group classification study and data from an epidemiological cohort; however, the protocol can be easily modified to cover a wider range of experimental designs or incorporate different modeling approaches. This protocol describes a minimal set of analyses needed to rigorously investigate typical datasets encountered in metabolic phenotyping.


Subject(s)
Genetic Techniques, Metabolomics/methods, Phenotype, Software, Statistics as Topic, Humans, Metabolism
19.
BMC Bioinformatics ; 22(1): 67, 2021 Feb 12.
Article in English | MEDLINE | ID: mdl-33579202

ABSTRACT

BACKGROUND: The search for statistically significant relationships between molecular markers and outcomes is challenging when dealing with high-dimensional, noisy and collinear multivariate omics data, such as metabolomic profiles. Permutation procedures allow for the estimation of adjusted significance levels without assuming independence among metabolomic variables. Nevertheless, the complex non-normal structure of metabolic profiles and outcomes may bias the permutation results, leading to overly conservative threshold estimates, i.e., lower than those from a Bonferroni or Sidak correction. METHODS: Within a univariate permutation procedure we employ parametric simulation methods based on the multivariate (log-)Normal distribution to obtain adjusted significance levels which are consistent across different outcomes while effectively controlling the type I error rate. Next, we derive an alternative closed-form expression for the estimation of the number of non-redundant metabolic variates based on the spectral decomposition of their correlation matrix. The performance of the method is tested for different model parametrizations and across a wide range of correlation levels of the variates using synthetic and real data sets. RESULTS: Both the permutation-based formulation and the more practical closed-form expression are found to give an effective indication of the number of independent metabolic effects exhibited by the system, while guaranteeing that the derived adjusted threshold is stable across outcome measures with diverse properties.
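
The abstract above compares adjusted significance thresholds against Bonferroni and Sidak corrections. As a minimal sketch, the code below evaluates the standard Bonferroni and Sidak per-test thresholds for a given number of independent tests. The input `m_eff` is a hypothetical stand-in for an effective number of non-redundant variates; the paper's own permutation-based and closed-form estimators are not reproduced here.

```python
def bonferroni_threshold(alpha, m_eff):
    """Per-test threshold alpha / m for m (effectively) independent tests."""
    return alpha / m_eff


def sidak_threshold(alpha, m_eff):
    """Per-test threshold 1 - (1 - alpha)^(1/m) for m independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


alpha = 0.05  # family-wise error rate to control
for m_eff in (1, 10, 100):  # assumed effective numbers of tests
    b = bonferroni_threshold(alpha, m_eff)
    s = sidak_threshold(alpha, m_eff)
    print(f"m_eff={m_eff:>3}  Bonferroni={b:.3g}  Sidak={s:.3g}")
```

For more than one test the Sidak threshold is slightly larger than the Bonferroni one, so a permutation-derived threshold that falls below both is more conservative than either correction, which is the situation the abstract describes.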


Subject(s)
Metabolome, Metabolomics, Biological Models, Genetic Markers/genetics, Metabolomics/methods, Statistical Distributions
20.
Med Princ Pract ; 30(4): 301-310, 2021.
Article in English | MEDLINE | ID: mdl-33271569

ABSTRACT

Metabolomics encompasses the systematic identification and quantification of all metabolic products in the human body. This field could provide clinicians with novel sets of diagnostic biomarkers for disease states in addition to quantifying treatment response to medications at an individualized level. This literature review aims to highlight the technology underpinning metabolic profiling, identify potential applications of metabolomics in clinical practice, and discuss the translational challenges that the field faces. We searched PubMed, MEDLINE, and EMBASE for primary and secondary research articles regarding clinical applications of metabolomics. Metabolic profiling can be performed using mass spectrometry and nuclear magnetic resonance-based techniques using a variety of biological samples. This is carried out in vivo or in vitro following careful sample collection, preparation, and analysis. The potential clinical applications constitute disruptive innovations in their respective specialities, particularly oncology and metabolic medicine. Outstanding issues currently preventing widespread clinical use are scalability of data interpretation, standardization of sample handling practice, and e-infrastructure. Routine utilization of metabolomics at a patient and population level will constitute an integral part of future healthcare provision.


Subject(s)
Metabolomics, Precision Medicine, Stethoscopes, Humans