ABSTRACT
Metabolomics data are typically scaled to a common reference such as a constant volume of body fluid, a constant creatinine level, or a constant area under the spectrum. Such scaling of the data, however, may affect the selection of biomarkers and the biological interpretation of results in unforeseen ways. Here, we studied how both the outcome of hypothesis tests for differential metabolite concentration and the screening for multivariate metabolite signatures are affected by the choice of scale. To overcome this problem for metabolite signatures and to establish a scale-invariant biomarker discovery algorithm, we extended linear zero-sum regression to the logistic regression framework and showed in two applications to 1H NMR-based metabolomics data how this approach overcomes the scaling problem. Logistic zero-sum regression is available as an R package as well as a high-performance computing implementation that can be downloaded at https://github.com/rehbergT/zeroSum.
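The scale invariance claimed above can be illustrated with a minimal sketch. It assumes, as is usual for zero-sum regression but not stated in the abstract, that the model acts on log-transformed intensities and that the coefficients are constrained to sum to zero. The function names, the synthetic data, and the scaling factor are hypothetical and are not taken from the zeroSum package.

```python
import numpy as np

def zero_sum_score(x, beta, intercept=0.0):
    """Linear predictor on log-transformed features (assumed model form)."""
    return intercept + np.log(x) @ beta

def logistic(z):
    """Map a linear predictor to a class probability."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 5.0, size=6)   # synthetic metabolite intensities of one sample
beta_free = rng.normal(size=6)      # unconstrained coefficients
beta_zs = beta_free - beta_free.mean()  # shifted so the coefficients sum to zero

scale = 3.7  # hypothetical sample-wise scaling, e.g. a different dilution

# With zero-sum coefficients the term log(scale) * sum(beta) vanishes,
# so the predicted probability does not depend on the scale.
p_zs_raw = logistic(zero_sum_score(x, beta_zs))
p_zs_scaled = logistic(zero_sum_score(scale * x, beta_zs))

# Without the constraint the prediction shifts by log(scale) * sum(beta_free).
p_free_raw = logistic(zero_sum_score(x, beta_free))
p_free_scaled = logistic(zero_sum_score(scale * x, beta_free))
```

Here `p_zs_raw` and `p_zs_scaled` agree up to floating-point error, whereas the unconstrained pair generally differs, which is the sense in which a zero-sum signature is independent of the chosen reference.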
Subject(s)
Algorithms, Biomarkers/blood, Biomarkers/urine, Metabolomics, Humans, Magnetic Resonance Spectroscopy
ABSTRACT
Reliable identification of features distinguishing biological groups of interest in urinary metabolite fingerprints requires the control of total metabolite abundance, which may vary significantly as the kidneys adjust the excretion of water and solutes to meet the homeostatic needs of the body. Failure to account for such variation may lead to misclassification and to an accumulation of missing data in less concentrated urine specimens. Here, different pre- and post-acquisition methods of normalization were compared systematically for their ability to recover features from liquid chromatography-mass spectrometry metabolite fingerprints of urine that allow distinction between patients with chronic kidney disease and healthy controls. Pre-acquisition normalization methods included dilution of urine specimens to either a fixed creatinine concentration or a fixed osmolality value. Post-acquisition normalization methods, applied to chromatograms of 1:4 diluted urine specimens, comprised normalization to creatinine, osmolality, and the sum of all integrals. Dilution of urine specimens to a fixed creatinine concentration not only resulted in the fewest missing values, but was also the only method allowing the unambiguous classification of urine specimens from healthy and diseased individuals. The robustness of classification was confirmed in two independent cohorts of chronic kidney disease patients and yielded a shared set of 49 discriminant metabolite features. Graphical abstract: Dilution to a uniform creatinine concentration across urine specimens yields more comparable urinary metabolite fingerprints.
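The three post-acquisition normalizations named above can be sketched as follows. This is only an illustration under stated assumptions: each one is taken to be a division of a specimen's feature integrals by a per-specimen reference value. The `normalize` helper and all numbers are hypothetical and do not come from the study.

```python
import numpy as np

def normalize(integrals, reference):
    """Divide each specimen's (row's) feature integrals by its reference value."""
    integrals = np.asarray(integrals, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return integrals / reference[:, None]

# Hypothetical data: specimen 2 is specimen 1 at half the concentration.
integrals = np.array([[10.0, 4.0, 6.0],
                      [5.0, 2.0, 3.0]])
creatinine = np.array([8.0, 4.0])      # hypothetical creatinine values
osmolality = np.array([600.0, 300.0])  # hypothetical osmolality values

by_creatinine = normalize(integrals, creatinine)
by_osmolality = normalize(integrals, osmolality)
by_total = normalize(integrals, integrals.sum(axis=1))  # sum of all integrals
```

In this idealized case all three references remove the dilution difference, so the two rows coincide after normalization. Such a division cannot restore features that fell below the detection limit in a dilute specimen, which may be one reason why dilution before acquisition left fewer missing values.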