CBEA: Competitive balances for taxonomic enrichment analysis.

Nguyen, Quang P; Hoen, Anne G; Frost, H Robert

Affiliation

Nguyen QP; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America.
Hoen AG; Department of Epidemiology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America.
Frost HR; Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, United States of America.

PLoS Comput Biol; 18(5): e1010091, 2022 May.

Article in English | MEDLINE | ID: mdl-35584140

ABSTRACT

Research in human-associated microbiomes often involves the analysis of taxonomic count tables generated via high-throughput sequencing. It is difficult to apply statistical tools as the data is high-dimensional, sparse, and compositional. An approachable way to alleviate high-dimensionality and sparsity is to aggregate variables into pre-defined sets. Set-based analysis is ubiquitous in the genomics literature and has demonstrable impact on improving interpretability and power of downstream analysis. Unfortunately, there is a lack of sophisticated set-based analysis methods specific to microbiome taxonomic data, where current practice often employs abundance summation as a technique for aggregation. This approach prevents comparison across sets of different sizes, does not preserve inter-sample distances, and amplifies protocol bias. Here, we attempt to fill this gap with a new single-sample taxon enrichment method that uses a novel log-ratio formulation based on the competitive null hypothesis commonly used in the enrichment analysis literature. Our approach, titled competitive balances for taxonomic enrichment analysis (CBEA), generates sample-specific enrichment scores as the scaled log-ratio of the subcomposition defined by taxa within a set and the subcomposition defined by its complement. We provide sample-level significance testing by estimating an empirical null distribution of our test statistic with valid p-values. Herein, we demonstrate, using both real data applications and simulations, that CBEA controls for type I error, even under high sparsity and high inter-taxa correlation scenarios. Additionally, CBEA provides informative scores that can be inputs to downstream analyses such as prediction tasks.
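The score described in the abstract, a scaled log-ratio between the taxa in a set and the taxa in its complement, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes a standard balance from compositional data analysis: the log-ratio of the two geometric means, scaled by sqrt(p*q/(p+q)) for set sizes p and q. The function name `cbea_like_scores` and the pseudocount used to handle zeros are hypothetical choices made for illustration, not the authors' implementation, and the significance-testing step against an empirical null is not covered.

```python
import numpy as np


def cbea_like_scores(counts, in_set, pseudocount=1.0):
    """Hypothetical helper: one balance-style score per sample (row).

    counts      : samples x taxa array of non-negative counts
    in_set      : boolean mask over taxa, True for members of the set
    pseudocount : assumed zero-handling strategy, added to every count
    """
    x = np.asarray(counts, dtype=float) + pseudocount  # avoid log(0) on sparse data
    mask = np.asarray(in_set, dtype=bool)
    p, q = int(mask.sum()), int((~mask).sum())
    if p == 0 or q == 0:
        raise ValueError("the set and its complement must both be non-empty")

    # The mean of the logs equals the log of the geometric mean.
    log_gm_set = np.log(x[:, mask]).mean(axis=1)
    log_gm_rest = np.log(x[:, ~mask]).mean(axis=1)

    # Assumed scaling: the usual balance normalization for set sizes p and q.
    scale = np.sqrt(p * q / (p + q))
    return scale * (log_gm_set - log_gm_rest)


# Toy table: 2 samples x 4 taxa; the first two taxa form the set.
counts = np.array([[10, 20, 0, 5],
                   [1, 1, 8, 8]])
in_set = [True, True, False, False]
scores = cbea_like_scores(counts, in_set)
```

In this toy table the first sample is dominated by the set and receives a positive score, while the second is dominated by the complement and receives a negative one. Because the score is a ratio of geometric means, multiplying a sample's counts by a constant leaves it unchanged when no pseudocount is added.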

Subjects

Microbiota; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Microbiota/genetics

Full text: 1 | Database: MEDLINE | Main subject: Microbiota | Type of study: Guideline / Prognostic_studies | Limits: Humans | Language: En | Publication year: 2022 | Document type: Article
