Rank normalization empowers a t-test for microbiome differential abundance analysis while controlling for false discoveries.

Davis, Matthew L; Huang, Yuan; Wang, Kai

Affiliation

Davis ML; Department of Biostatistics, University of Iowa College of Public Health, 145 N Riverside Dr, 52242, IA, USA.
Huang Y; Department of Biostatistics, Yale School of Public Health, 60 College St, 06510, CT, USA.
Wang K; Department of Biostatistics, University of Iowa College of Public Health, 145 N Riverside Dr, 52242, IA, USA.

Brief Bioinform; 22(5), 2021-09-02.

Article in En | MEDLINE | ID: mdl-33822893

ABSTRACT

A major task in the analysis of microbiome data is to identify microbes associated with differing biological conditions. Before conducting analysis, raw data must first be adjusted so that counts from different samples are comparable. A typical approach is to estimate normalization factors by which all counts in a sample are multiplied or divided. However, the inherent variation associated with the estimation of normalization factors is often not accounted for in subsequent analysis, leading to a loss of precision. Rank normalization is a nonparametric alternative to the estimation of normalization factors in which each count for a microbial feature is replaced by its intrasample rank. Although rank normalization has been successfully applied to microarray analysis in the past, it has yet to be explored for microbiome data, which is characterized by high frequencies of 0s, strongly correlated features and compositionality. We propose to use rank normalization as an alternative to the estimation of normalization factors and examine its performance when paired with a two-sample t-test. On a rigorous third-party benchmarking simulation, rank normalization is shown to offer strong control over the false discovery rate and, at sample sizes greater than 50 per treatment group, to offer an improvement in performance over commonly used normalization factors paired with t-tests, Wilcoxon rank-sum tests and methodologies implemented by R packages. On two real datasets, it yielded valid and reproducible results that were strongly in agreement with the original findings and the existing literature, further demonstrating its robustness and future potential. Availability: The data underlying this article are available online along with R code and supplementary materials at https://github.com/matthewlouisdavisBioStat/Rank-Normalization-Empowers-a-T-Test.
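The procedure described in the abstract (replace each count by its intrasample rank, then run a two-sample t-test per feature) can be illustrated with a minimal Python sketch. This is not the authors' R code from the repository above. The function names are hypothetical, and three choices are assumptions that the abstract does not specify: ties (including the many zeros) share their average rank, the t-test is the equal-variance Student version, and p-values are adjusted with the Benjamini-Hochberg procedure.

```python
import numpy as np
from scipy.stats import rankdata, ttest_ind


def rank_normalize(counts):
    """Replace each count by its rank within its own sample.

    counts: array of shape (features, samples) holding raw counts.
    Assumption: tied counts (e.g. zeros) receive their average rank.
    """
    counts = np.asarray(counts, dtype=float)
    # axis=0 ranks the features inside each sample (column) separately
    return rankdata(counts, method="average", axis=0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


def rank_t_test(counts, in_group_a):
    """Per-feature two-sample t-test on intrasample ranks.

    in_group_a: boolean vector, one entry per sample.
    Returns (t statistics, raw p-values, BH-adjusted p-values).
    Assumption: equal-variance Student t-test (scipy's default).
    """
    ranks = rank_normalize(counts)
    mask = np.asarray(in_group_a, dtype=bool)
    t, p = ttest_ind(ranks[:, mask], ranks[:, ~mask], axis=1)
    return t, p, benjamini_hochberg(p)


# Toy data: 3 features (rows) by 6 samples (columns), 3 samples per group.
toy_counts = [
    [10, 12, 11, 1, 0, 2],
    [5, 4, 6, 5, 6, 4],
    [0, 20, 3, 30, 2, 9],
]
toy_groups = [True, True, True, False, False, False]
t_stat, p_raw, p_adj = rank_t_test(toy_counts, toy_groups)
```

Because ranks are computed within each sample, no normalization factor has to be estimated. A feature whose ranks are constant across all samples yields an undefined t statistic (NaN) in this sketch and would need to be filtered before the adjustment step.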

Subjects

Bacteria/genetics; Bacterial Infections/diagnosis; Biostatistics/methods; Colorectal Neoplasms/microbiology; Crohn Disease/microbiology; Gastrointestinal Microbiome/genetics; Metagenome; Bacterial Infections/microbiology; Benchmarking; Case-Control Studies; Child; Cohort Studies; Computer Simulation; Female; Humans; Male; Mathematical Computing; Metagenomics/methods; RNA, Ribosomal, 16S/genetics; Reproducibility of Results; Sensitivity and Specificity; Statistics, Nonparametric

Keywords

differential abundance analysis; false discovery rate; microbiome; rank normalization; t-test

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Bacteria / Bacterial Infections / Colorectal Neoplasms / Crohn Disease / Biostatistics / Metagenome / Gastrointestinal Microbiome | Study type: Diagnostic_studies / Etiology_studies / Evaluation_studies / Incidence_studies / Observational_studies / Prognostic_studies / Risk_factors_studies | Limits: Child / Female / Humans / Male | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Publication year: 2021 | Document type: Article | Affiliation country: United States