A rarefaction-based extension of the LDM for testing presence-absence associations in the microbiome.

Hu, Yi-Juan; Lane, Andrea; Satten, Glen A

Affiliation

Hu YJ; Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.
Lane A; Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA.
Satten GA; Department of Gynecology and Obstetrics, Emory University School of Medicine, Atlanta, GA 30322, USA.

Bioinformatics; 37(12): 1652-1657, 2021 Jul 19.

Article in English | MEDLINE | ID: mdl-33479757

ABSTRACT

MOTIVATION:

Many methods for testing association between the microbiome and covariates of interest (e.g. clinical outcomes, environmental factors) assume that these associations are driven by changes in the relative abundance of taxa. However, these associations may also result from changes in which taxa are present and which are absent. Analyses of such presence-absence associations face a unique challenge: confounding by library size (total sample read count), which occurs when library size is associated with covariates in the analysis. It is known that rarefaction (subsampling to a common library size) controls this bias, but at the potential cost of information loss as well as the introduction of a stochastic component into the analysis. Currently, there is a need for robust and efficient methods that test presence-absence associations in the presence of such confounding, at both the community level and the individual-taxon level, while avoiding the drawbacks of rarefaction.
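
Rarefaction, as described above, subsamples each sample's reads without replacement down to a common depth. The Python sketch below is a hypothetical illustration, not code from the LDM package; the names `rarefy` and `presence_probability` are ours. It also shows why, when only presence-absence matters, the random draw need not be simulated: a taxon with c of a sample's N reads is absent after rarefying to depth d only if all d draws miss it, so its detection probability is 1 - C(N - c, d) / C(N, d).

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from one sample's taxon counts."""
    # One pool entry per read, labelled by the index of the taxon it came from.
    pool = [taxon for taxon, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the library size")
    rarefied = [0] * len(counts)
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] += 1
    return rarefied


def presence_probability(count, library_size, depth):
    """Probability that a taxon with `count` reads is still observed after rarefying.

    The taxon is missed only if all `depth` draws come from the other
    library_size - count reads (a hypergeometric zero). math.comb returns 0
    when depth > library_size - count, so such taxa get probability 1.
    """
    if depth > library_size:
        raise ValueError("depth exceeds the library size")
    return 1 - math.comb(library_size - count, depth) / math.comb(library_size, depth)


if __name__ == "__main__":
    rng = random.Random(42)
    counts = [3, 7, 90]  # hypothetical taxon counts for one sample (library size 100)
    print(rarefy(counts, 10, rng))           # one random rarefied profile
    print(presence_probability(3, 100, 10))  # exact chance that taxon 0 survives
```

Two calls to `rarefy` with different seeds generally return different tables, which is the stochastic component mentioned above; `presence_probability` instead averages over every possible subsample of the same depth.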

RESULTS:

We have previously developed the linear decomposition model (LDM), which unifies community-level and taxon-level tests in one framework. Here, we present an extension of the LDM for testing presence-absence associations. The extended LDM is a non-stochastic approach that repeatedly applies the LDM to all rarefied taxa count tables, averages the residual sum-of-squares (RSS) terms over the rarefaction replicates, and then forms an F-statistic based on these average RSS terms. We show that this approach compares favorably to averaging the F-statistic from R rarefaction replicates, which can only be calculated stochastically. The flexible nature of the LDM allows discrete or continuous traits, as well as interactions, to be tested while adjusting for confounding covariates. Our simulations indicate that the proposed method is robust to any systematic differences in library size and has better power than alternative approaches. We illustrate our method with an analysis of data on inflammatory bowel disease (IBD) in which cases have systematically smaller library sizes than controls.

AVAILABILITY AND IMPLEMENTATION:

The R package LDM is available on GitHub at https://github.com/yijuanhu/LDM in formats appropriate for Macintosh or Windows.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
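
The RESULTS paragraph contrasts two ways of combining rarefaction replicates: averaging the RSS terms and forming one F-statistic, versus averaging F-statistics over R random replicates. The sketch below illustrates both under simplifying assumptions of our own: a single taxon, an intercept-plus-one-trait least-squares model with no confounders, and independent rarefaction of each sample. Under those assumptions the presence indicators x_i are independent Bernoulli(p_i), so the average of a quadratic form x'Ax over all rarefactions is p'Ap + sum_i A_ii p_i(1 - p_i), which yields the averaged RSS terms in closed form with no random draws. This is not the LDM package's implementation, which also handles confounders, multiple traits, interactions and community-level tests; all function names and data are hypothetical.

```python
import math
import random


def presence_probability(count, library_size, depth):
    # Chance that at least one of the taxon's reads survives rarefaction to `depth`.
    return 1 - math.comb(library_size - count, depth) / math.comb(library_size, depth)


def rss_pair(x, z):
    """RSS of the intercept-only fit (rss0) and of the intercept + z fit (rss1)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    rss0 = sum((xi - xbar) ** 2 for xi in x)
    return rss0, rss0 - szx ** 2 / szz


def expected_rss(counts, lib_sizes, z, depth):
    """(rss0, rss1) averaged over ALL rarefactions, in closed form (no random draws)."""
    n = len(z)
    p = [presence_probability(c, N, depth) for c, N in zip(counts, lib_sizes)]
    var = [pi * (1 - pi) for pi in p]  # Bernoulli variance of each presence indicator
    zbar = sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    mean0, mean1 = rss_pair(p, z)  # the p'Ap part of E[x'Ax] for each model
    # Add sum_i A_ii * var_i, where A = I - H is the residual-maker matrix.
    rss0 = mean0 + (1 - 1 / n) * sum(var)
    rss1 = mean1 + sum((1 - 1 / n - (zi - zbar) ** 2 / szz) * v for zi, v in zip(z, var))
    return rss0, rss1


def f_stat(rss0, rss1, n):
    """F-statistic for adding z to the intercept-only model (1 and n - 2 df)."""
    return (rss0 - rss1) / (rss1 / (n - 2))


def rarefied_presence(count, library_size, depth, rng):
    # Reads 0..count-1 belong to the taxon; draw `depth` reads without replacement.
    return int(any(r < count for r in rng.sample(range(library_size), depth)))


def monte_carlo(counts, lib_sizes, z, depth, R, rng):
    """Mean rss0, mean rss1 and mean F over R random rarefactions (stochastic route)."""
    n = len(z)
    tot0 = tot1 = 0.0
    fs = []
    for _ in range(R):
        x = [rarefied_presence(c, N, depth, rng) for c, N in zip(counts, lib_sizes)]
        rss0, rss1 = rss_pair(x, z)
        tot0, tot1 = tot0 + rss0, tot1 + rss1
        if rss1 > 1e-12:  # F is undefined for a perfect fit (e.g. all x_i equal); skip it
            fs.append(f_stat(rss0, rss1, n))
    return tot0 / R, tot1 / R, (sum(fs) / len(fs) if fs else float("nan"))


if __name__ == "__main__":
    # Hypothetical data: "cases" (z = 1) have systematically smaller libraries.
    counts = [1, 1, 2, 1, 3, 4, 3, 5]
    lib_sizes = [150, 200, 180, 120, 400, 500, 450, 600]
    z = [1, 1, 1, 1, 0, 0, 0, 0]
    rss0, rss1 = expected_rss(counts, lib_sizes, z, depth=100)
    print("F from averaged RSS:", f_stat(rss0, rss1, len(z)))
    mc = monte_carlo(counts, lib_sizes, z, 100, 1000, random.Random(7))
    print("mean F over 1000 replicates:", mc[2])
```

Because the closed form averages over every possible rarefied table, repeated runs return identical values, whereas the Monte Carlo mean F changes with the seed and with R. Skipping perfect-fit replicates when averaging F is an arbitrary choice made only to keep the sketch runnable.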

Full text: 1 | Database: MEDLINE | Study type: Prognostic_studies / Risk_factors_studies | Language: English | Publication year: 2021 | Document type: Article
