Characterization of missing values in untargeted MS-based metabolomics data and evaluation of missing data handling strategies.
Do, Kieu Trinh; Wahl, Simone; Raffler, Johannes; Molnos, Sophie; Laimighofer, Michael; Adamski, Jerzy; Suhre, Karsten; Strauch, Konstantin; Peters, Annette; Gieger, Christian; Langenberg, Claudia; Stewart, Isobel D; Theis, Fabian J; Grallert, Harald; Kastenmüller, Gabi; Krumsiek, Jan.
Affiliations
  • Do KT; Institute of Computational Biology, Helmholtz-Zentrum München, Neuherberg, Germany.
  • Wahl S; Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
  • Raffler J; Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
  • Molnos S; German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany.
  • Laimighofer M; Institute of Bioinformatics and Systems Biology, Helmholtz-Zentrum München, Neuherberg, Germany.
  • Adamski J; Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
  • Suhre K; Research Unit of Molecular Epidemiology, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
  • Strauch K; German Center for Diabetes Research (DZD e.V.), Neuherberg, Germany.
  • Peters A; Institute of Computational Biology, Helmholtz-Zentrum München, Neuherberg, Germany.
  • Gieger C; Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany.
  • Langenberg C; Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising, Germany.
  • Stewart ID; German Center for Cardiovascular Disease Research (DZHK e.V.), Munich, Germany.
  • Theis FJ; Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar, Education City, Doha, Qatar.
  • Grallert H; Institute of Genetic Epidemiology, Helmholtz Zentrum München-German Research Center for Environmental Health, Neuherberg, Germany.
  • Kastenmüller G; Chair of Genetic Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Ludwig-Maximilians-University, Munich, Germany.
  • Krumsiek J; Institute of Epidemiology II, German Research Center for Environmental Health, Helmholtz Zentrum München, Neuherberg, Germany.
Metabolomics; 14(10): 128, 2018 Sep 20.
Article in English | MEDLINE | ID: mdl-30830398
ABSTRACT

BACKGROUND:

Untargeted mass spectrometry (MS)-based metabolomics data often contain missing values that reduce statistical power and can introduce bias in biomedical studies. However, a systematic assessment of the various sources of missing values, and of strategies for handling them, has received little attention. Missing data can occur systematically, e.g. from run day-dependent effects due to limits of detection (LOD), or randomly, for instance as a consequence of sample preparation.
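The two mechanisms named above behave very differently: LOD-based missingness depends on the unobserved value itself and on the run day, while random missingness does not. The sketch below simulates both on a samples x metabolites matrix. It is an illustration under stated assumptions, not the authors' simulation: the function name `simulate_missingness`, the per-run-day LOD vectors, the Gaussian log-intensities, and the `mcar_rate` parameter are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def simulate_missingness(
    data: Matrix,
    run_day: List[int],
    lod_by_day: Dict[int, List[float]],
    mcar_rate: float,
    seed: int = 0,
) -> Tuple[List[List[Optional[float]]], List[List[Optional[str]]]]:
    """Hypothetical helper: mask values below a run-day-specific LOD ("LOD") and,
    among the remaining values, drop a random fraction mcar_rate ("random")."""
    rng = random.Random(seed)
    masked, reasons = [], []
    for i, row in enumerate(data):
        lod = lod_by_day[run_day[i]]  # assumed detection limits for this sample's run day
        new_row, new_reasons = [], []
        for j, x in enumerate(row):
            if x < lod[j]:
                # Systematic: depends on the (unobserved) value and on the run day.
                new_row.append(None)
                new_reasons.append("LOD")
            elif rng.random() < mcar_rate:
                # Random: independent of the value, e.g. a sample-preparation loss.
                new_row.append(None)
                new_reasons.append("random")
            else:
                new_row.append(x)
                new_reasons.append(None)
        masked.append(new_row)
        reasons.append(new_reasons)
    return masked, reasons


if __name__ == "__main__":
    rng = random.Random(42)
    n_samples, n_metabolites = 200, 5
    # Hypothetical log-scale intensities; samples alternate between two run days.
    data = [[rng.gauss(10.0, 1.0) for _ in range(n_metabolites)] for _ in range(n_samples)]
    run_day = [i % 2 for i in range(n_samples)]
    # Hypothetical LODs: run day 1 is assumed to be less sensitive than run day 0.
    lod_by_day = {0: [8.5] * n_metabolites, 1: [9.5] * n_metabolites}
    masked, reasons = simulate_missingness(data, run_day, lod_by_day, mcar_rate=0.02)
    for day in (0, 1):
        n_lod = sum(r == "LOD" for i, row in enumerate(reasons) if run_day[i] == day for r in row)
        n_rand = sum(r == "random" for i, row in enumerate(reasons) if run_day[i] == day for r in row)
        print(f"run day {day}: {n_lod} LOD-missing, {n_rand} randomly missing")
```

With these assumed settings, run day 1 is expected to lose far more low-abundance values than run day 0, which is the kind of run day-dependent pattern the abstract describes.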

METHODS:

We investigated patterns of missing data in an MS-based metabolomics experiment of serum samples from the German KORA F4 cohort (n = 1750). We then evaluated 31 imputation methods in a simulation framework and biologically validated the results by applying all imputation approaches to real metabolomics data. We examined the ability of each method to reconstruct biochemical pathways from data-driven correlation networks, and its ability to increase statistical power while preserving the strength of established metabolic quantitative trait loci.
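A simulation framework of the kind mentioned above typically starts from complete data, masks some entries, imputes them, and scores the imputed values against the hidden truth. The sketch below shows that generic loop with random masking, column-mean imputation as a simple baseline, and root-mean-square error on the masked entries. The names `evaluate_imputer` and `mean_impute`, the masking scheme, and the error metric are assumptions for illustration; the abstract does not state which scenarios or metrics the study used. A fuller framework would also mask values by an LOD-like rule rather than only at random.

```python
import math
import random
from typing import Callable, List, Optional

Matrix = List[List[Optional[float]]]


def mean_impute(data: Matrix) -> List[List[float]]:
    """Baseline imputer: replace each missing entry by its column (metabolite) mean.
    Assumes every column keeps at least one observed value."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if x is None else x for j, x in enumerate(row)] for row in data]


def evaluate_imputer(
    complete: List[List[float]],
    imputer: Callable[[Matrix], List[List[float]]],
    mask_rate: float,
    seed: int = 0,
) -> float:
    """Hypothetical helper: mask a random fraction of a complete matrix, impute it,
    and return the root-mean-square error over the masked entries only."""
    rng = random.Random(seed)
    masked = [[None if rng.random() < mask_rate else x for x in row] for row in complete]
    imputed = imputer(masked)
    sq_err = [
        (imputed[i][j] - complete[i][j]) ** 2
        for i, row in enumerate(masked)
        for j, x in enumerate(row)
        if x is None  # score only the entries that were hidden from the imputer
    ]
    if not sq_err:
        raise ValueError("no entries were masked; increase mask_rate")
    return math.sqrt(sum(sq_err) / len(sq_err))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical complete data: 100 samples x 4 metabolites.
    complete = [[rng.gauss(10.0, 1.0) for _ in range(4)] for _ in range(100)]
    for rate in (0.05, 0.2):
        print(rate, evaluate_imputer(complete, mean_impute, mask_rate=rate))
```

Because `imputer` is just a callable, any candidate method can be plugged into the same loop and compared on identical masks by fixing `seed`.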

RESULTS:

Run day-dependent LOD-based missing data accounts for most missing values in the metabolomics dataset. Although multiple imputation by chained equations performed well in many scenarios, it is computationally and statistically challenging. K-nearest neighbors (KNN) imputation on observations with variable pre-selection showed robust performance across all evaluation schemes and is computationally more tractable.
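KNN imputation "on observations" looks for the samples most similar to the one with a missing value and averages their values for that metabolite; "variable pre-selection" restricts the similarity calculation to metabolites that are informative about the target. The sketch below is a minimal version of that idea under explicit assumptions: pre-selection keeps the `n_vars` metabolites with the highest absolute Pearson correlation to the target, distances are root-mean-square differences over the shared pre-selected metabolites, and neighbours are averaged without weights. The function names, the selection rule, and the defaults for `k` and `n_vars` are hypothetical; the abstract does not specify the study's exact criteria. Variables are assumed to be on comparable scales (e.g. log-transformed and standardized).

```python
import math
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def pearson(x: List[Optional[float]], y: List[Optional[float]]) -> float:
    """Pearson correlation over pairwise-complete entries (0.0 if undefined)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def knn_obs_sel_impute(data: Matrix, k: int = 3, n_vars: int = 5) -> Matrix:
    """Hypothetical sketch: KNN imputation on observations with variable pre-selection."""
    n_rows, n_cols = len(data), len(data[0])
    columns = [[row[j] for row in data] for j in range(n_cols)]
    result = [list(row) for row in data]  # leave the input untouched
    for j in range(n_cols):
        missing = [i for i in range(n_rows) if data[i][j] is None]
        donors = [i for i in range(n_rows) if data[i][j] is not None]
        if not missing or not donors:
            continue
        # Assumed pre-selection rule: the n_vars metabolites most correlated with j.
        others = [c for c in range(n_cols) if c != j]
        others.sort(key=lambda c: abs(pearson(columns[j], columns[c])), reverse=True)
        selected = others[:n_vars]
        for i in missing:
            dists = []
            for d in donors:
                shared = [c for c in selected if data[i][c] is not None and data[d][c] is not None]
                if shared:
                    # Root-mean-square difference over shared pre-selected metabolites.
                    dist = math.sqrt(sum((data[i][c] - data[d][c]) ** 2 for c in shared) / len(shared))
                    dists.append((dist, d))
            dists.sort()
            pool = [d for _, d in dists[:k]] or donors  # fall back to all donors
            result[i][j] = sum(data[d][j] for d in pool) / len(pool)
    return result


if __name__ == "__main__":
    # Hypothetical data: metabolite 1 tracks metabolite 0; metabolite 2 is unrelated.
    data = [
        [1.0, 10.0, 5.0],
        [2.0, 20.0, 3.0],
        [3.0, 30.0, 4.0],
        [4.0, 40.0, 1.0],
        [5.0, None, 2.0],
    ]
    print(knn_obs_sel_impute(data, k=2, n_vars=1)[4])  # [5.0, 35.0, 2.0]
```

In the usage example, pre-selection keeps only metabolite 0 (perfectly correlated with metabolite 1), so the two nearest samples on that metabolite (values 40.0 and 30.0) supply the imputed 35.0, and the unrelated metabolite 2 does not distort the distances.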

CONCLUSION:

Missing data in untargeted MS-based metabolomics data occur for various reasons. Based on our results, we recommend that KNN-based imputation be performed on observations with variable pre-selection, since it showed robust results in all evaluation schemes.

Full text: 1 | Database: MEDLINE | Main subject: Mass Spectrometry / Metabolomics | Language: English | Year of publication: 2018 | Document type: Article