Processing-bias correction with DEBIAS-M improves cross-study generalization of microbiome-based prediction models.

Austin, George I; Kav, Aya Brown; Park, Heekuk; Biermann, Jana; Uhlemann, Anne-Catrin; Korem, Tal

Affiliation

Austin GI; Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
Kav AB; Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Park H; Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Biermann J; Division of Infectious Diseases, Columbia University Irving Medical Center, New York, NY, USA.
Uhlemann AC; Program for Mathematical Genomics, Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA.
Korem T; Department of Medicine, Division of Hematology/Oncology, Columbia University Irving Medical Center, New York, NY, USA.

bioRxiv; 2024 Feb 12.

Article in English | MEDLINE | ID: mdl-38405914

ABSTRACT

Every step in common microbiome profiling protocols has variable efficiency for each microbe. For example, different DNA extraction kits may have different efficiency for Gram-positive and -negative bacteria. These variable efficiencies, combined with technical variation, create strong processing biases, which impede the identification of signals that are reproducible across studies and the development of generalizable and biologically interpretable prediction models. "Batch-correction" methods have been used to alleviate these issues computationally with some success. However, many make strong parametric assumptions which do not necessarily apply to microbiome data or processing biases, or require the use of an outcome variable, which risks overfitting. Lastly and importantly, existing transformations used to correct microbiome data are largely non-interpretable, and could, for example, introduce values to features that were initially mostly zeros. Altogether, processing bias currently compromises our ability to glean robust and generalizable biological insights from microbiome data. Here, we present DEBIAS-M (Domain adaptation with phenotype Estimation and Batch Integration Across Studies of the Microbiome), an interpretable framework for inference and correction of processing bias, which facilitates domain adaptation in microbiome studies. DEBIAS-M learns bias-correction factors for each microbe in each batch that simultaneously minimize batch effects and maximize cross-study associations with phenotypes. Using benchmarks of HIV and colorectal cancer classification from gut microbiome data, and cervical neoplasia prediction from cervical microbiome data, we demonstrate that DEBIAS-M outperforms batch-correction methods commonly used in the field. Notably, we show that the inferred bias-correction factors are stable, interpretable, and strongly associated with specific experimental protocols. 
Overall, we show that DEBIAS-M allows for better modeling of microbiome data and identification of interpretable signals that are reproducible across studies.
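
To make the idea of per-microbe, per-batch bias-correction factors concrete, the following is a minimal sketch and not the DEBIAS-M implementation. It assumes a multiplicative bias model with per-sample renormalization, which the abstract does not spell out, and every name in it (`apply_bias_factors`, `log2_factors`, `efficiency`) is hypothetical. The factors here are supplied by hand rather than learned; the joint objective described above (minimizing batch effects while maximizing cross-study phenotype associations) is not shown.

```python
import numpy as np


def apply_bias_factors(abundances, batches, log2_factors):
    """Rescale each taxon by a per-batch factor, then renormalize each sample.

    abundances:   (n_samples, n_taxa) non-negative abundances
    batches:      (n_samples,) integer batch index of each sample
    log2_factors: (n_batches, n_taxa) hypothetical log2 correction weights
    """
    abundances = np.asarray(abundances, dtype=float)
    # Look up each sample's batch row and scale taxa multiplicatively.
    scaled = abundances * np.exp2(log2_factors[batches])
    totals = scaled.sum(axis=1, keepdims=True)
    # Renormalize to relative abundances; all-zero samples stay all-zero.
    return np.divide(scaled, totals, out=np.zeros_like(scaled), where=totals > 0)


# Two underlying community profiles, each measured once in two batches.
profiles = np.array([[0.5, 0.3, 0.2],
                     [0.0, 0.6, 0.4]])
true_profiles = np.vstack([profiles, profiles])
batches = np.array([0, 0, 1, 1])

# Assumed per-batch detection efficiency: batch 1 over-detects taxon 0
# and under-detects taxon 1 (made-up numbers for illustration only).
efficiency = np.array([[1.0, 1.0, 1.0],
                       [2.0, 0.5, 1.0]])

# Simulate the biased measurements, then correct with the inverse factors.
observed = true_profiles * efficiency[batches]
observed = observed / observed.sum(axis=1, keepdims=True)
corrected = apply_bias_factors(observed, batches, -np.log2(efficiency))
```

Because the correction in this sketch is purely multiplicative, a taxon that is absent from a sample remains absent after correction, which is one way a transformation can avoid introducing values into features that were initially zero.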

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Language: English | Journal: bioRxiv | Year of publication: 2024 | Document type: Article | Country of affiliation: United States
