Variance Component Selection With Applications to Microbiome Taxonomic Data.

Zhai, Jing; Kim, Juhyun; Knox, Kenneth S; Twigg, Homer L; Zhou, Hua; Zhou, Jin J

Zhai, Jing; Kim, Juhyun; Knox, Kenneth S; Twigg, Homer L; Zhou, Hua; Zhou, Jin J.

Afiliação

Zhai J; Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, United States.
Kim J; Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, United States.
Knox KS; Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, Department of Medicine, University of Arizona, Tucson, AZ, United States.
Twigg HL; Division of Pulmonary, Critical Care, Sleep, and Occupational Medicine, Indiana University Medical Center, Indianapolis, IN, United States.
Zhou H; Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA, United States.
Zhou JJ; Department of Epidemiology and Biostatistics, University of Arizona, Tucson, AZ, United States.

Front Microbiol ; 9: 509, 2018.

Article em En | MEDLINE | ID: mdl-29643839

RESUMO

High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Microbiome data are summarized as counts or composition of the bacterial taxa at different taxonomic levels. An important problem is to identify the bacterial taxa that are associated with a response. One method is to test the association of specific taxon with phenotypes in a linear mixed effect model, which incorporates phylogenetic information among bacterial communities. Another type of approaches consider all taxa in a joint model and achieves selection via penalization method, which ignores phylogenetic information. In this paper, we consider regression analysis by treating bacterial taxa at different level as multiple random effects. For each taxon, a kernel matrix is calculated based on distance measures in the phylogenetic tree and acts as one variance component in the joint model. Then taxonomic selection is achieved by the lasso (least absolute shrinkage and selection operator) penalty on variance components. Our method integrates biological information into the variable selection problem and greatly improves selection accuracies. Simulation studies demonstrate the superiority of our methods versus existing methods, for example, group-lasso. Finally, we apply our method to a longitudinal microbiome study of Human Immunodeficiency Virus (HIV) infected patients. We implement our method using the high performance computing language Julia. Software and detailed documentation are freely available at https://github.com/JingZhai63/VCselection.

Palavras-chave

Human Immunodeficiency Virus (HIV); MM-algorithm; lasso; longitudinal study; lung microbiome; variable selection; variance component models

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Base de dados: MEDLINE Idioma: En Ano de publicação: 2018 Tipo de documento: Article

Texto completo

Imprimir

XML

PubMed Links