Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data.

Fan, Sili; Kind, Tobias; Cajka, Tomas; Hazen, Stanley L; Tang, W H Wilson; Kaddurah-Daouk, Rima; Irvin, Marguerite R; Arnett, Donna K; Barupal, Dinesh K; Fiehn, Oliver

Affiliations

Fan S; West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, California 95616, United States.
Kind T; West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, California 95616, United States.
Cajka T; West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, California 95616, United States.
Cajka T; Department of Metabolomics, Institute of Physiology CAS, Videnska 1083, 14220 Prague, Czech Republic.
Kaddurah-Daouk R; Department of Psychiatry and Behavioral Sciences, Department of Medicine and the Duke Institute for Brain Sciences, Duke University, Durham, North Carolina 27708, United States.
Irvin MR; Department of Epidemiology, University of Alabama at Birmingham, 1720 Second Avenue South, Birmingham, Alabama 35294, United States.
Arnett DK; College of Public Health, University of Kentucky, 121 Washington Avenue, Lexington, Kentucky 40508, United States.
Fiehn O; West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, California 95616, United States.

Anal Chem; 91(5): 3590-3596, 2019 Mar 05.

Article in English | MEDLINE | ID: mdl-30758187

ABSTRACT

Large-scale untargeted lipidomics experiments involve measuring hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. These extensive acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, and even instrument-to-instrument variation. This technical variance can obscure the true biological signal and hinder biological discoveries. To address this issue, we present a novel normalization approach based on pooled quality control (QC) samples, called systematic error removal using random forest (SERRF), which eliminates unwanted systematic variation in large sample sets. We compared SERRF with 15 other commonly used normalization methods on six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical error for these data sets to 5% relative standard deviation (RSD). We conclude that SERRF outperforms the other normalization methods compared and can significantly reduce unwanted systematic variation, revealing the biological variance of interest.
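
The abstract reports technical error as a relative standard deviation (RSD = sample standard deviation / mean x 100%). As a minimal sketch, assuming technical error is estimated as the RSD of each compound's intensity across repeated injections of the pooled QC sample and then averaged over compounds (this record does not describe the paper's exact evaluation protocol), the metric can be computed as below. The function names `rsd_percent` and `mean_qc_rsd`, the compound names, and the intensities are hypothetical. This is only the evaluation metric, not SERRF, the random forest correction itself.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (%): sample standard deviation / mean * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean intensity is zero")
    # statistics.stdev uses the n - 1 (sample) denominator and needs >= 2 values.
    return statistics.stdev(values) / mean * 100.0


def mean_qc_rsd(qc_intensities):
    """Average the per-compound RSD over all compounds.

    qc_intensities (hypothetical layout) maps a compound name to its peak
    intensities across repeated injections of the pooled QC sample.
    """
    return statistics.mean([rsd_percent(v) for v in qc_intensities.values()])


# Hypothetical intensities for two lipids over four QC injections (illustrative only).
qc = {
    "lipid_A": [100.0, 102.0, 98.0, 100.0],
    "lipid_B": [50.0, 55.0, 45.0, 50.0],
}

print(round(mean_qc_rsd(qc), 2))  # prints 4.9
```

Computing this value on the same QC injections before and after normalization gives one way to compare methods: a lower average RSD means less residual technical variation among measurements of an identical pooled sample.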

Subjects

Datasets as Topic/standards; Lipidomics/standards; Quality Control; Scientific Experimental Error/statistics & numerical data

Collection: 01-internacional | Database: MEDLINE | Main subject: Quality Control / Datasets as Topic / Lipidomics | Study type: Clinical trials / Observational studies | Language: English | Journal: Anal Chem | Publication year: 2019 | Document type: Article | Country of affiliation: United States