YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample.
Koslicki, David; White, Stephen; Ma, Chunyu; Novikov, Alexei.
Affiliation
  • Koslicki D; Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16802, United States.
  • White S; Department of Biology, Pennsylvania State University, State College, PA 16802, United States.
  • Ma C; Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, United States.
  • Novikov A; One Health Microbiome Center, Pennsylvania State University, State College, PA 16802, United States.
Bioinformatics; 40(2), 2024 02 01.
Article in En | MEDLINE | ID: mdl-38268451
ABSTRACT
MOTIVATION:

In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is determining which genomes from a reference database are present or absent in a given sample metagenome. Existing tools generally return point estimates with no associated confidence or uncertainty. As a result, practitioners have difficulty interpreting the results of these tools, particularly for low-abundance organisms, which often reside in the "noisy tail" of incorrect predictions. Furthermore, few tools account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome.

RESULTS:

We address these issues by introducing the algorithm YACHT (Yes/No Answers to Community membership via Hypothesis Testing). This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of average nucleotide identity (ANI), as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and how this changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach.

AVAILABILITY AND IMPLEMENTATION:

The source code implementing this approach is available via Conda and at https://github.com/KoslickiLab/YACHT. We also provide the code for reproducing experiments at https://github.com/KoslickiLab/YACHT-reproducibles.
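
The kind of hypothesis test described under RESULTS can be illustrated with a minimal sketch. It is not the YACHT implementation or its API: every function name below is hypothetical, and the model is a deliberate simplification. It assumes a k-mer based comparison (which the abstract does not specify), with k = 31 chosen arbitrarily, an independent per-base mutation model in which a k-mer survives unchanged with probability ANI^k, and a `coverage` parameter standing in for incomplete sequencing depth. Under the null hypothesis that the reference genome is present, the number of its k-mers found in the sample is treated as binomial, and presence is rejected when the observed count falls below a lower quantile.

```python
from math import exp, lgamma, log


def kmer_survival_prob(ani, k):
    """Probability that a k-mer carries no mutation, assuming independent
    per-base mutations at rate (1 - ani). This is a modelling assumption."""
    return ani ** k


def binom_pmf(i, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_coeff = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coeff + i * log(p) + (n - i) * log(1.0 - p))


def min_kmers_for_presence(n_kmers, ani, k=31, coverage=1.0, alpha=0.05):
    """Smallest count t with P(X <= t) > alpha for X ~ Binomial(n_kmers, p),
    where p = coverage * ani**k. Rejecting presence when the observed count
    is below t keeps the false-negative rate at or below alpha."""
    p = coverage * kmer_survival_prob(ani, k)
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return n_kmers
    cdf = 0.0
    for t in range(n_kmers + 1):
        cdf += binom_pmf(t, n_kmers, p)
        if cdf > alpha:
            return t
    return n_kmers


def is_called_present(observed, n_kmers, ani, k=31, coverage=1.0, alpha=0.05):
    """Hypothetical yes/no call: True unless the null of presence is rejected."""
    return observed >= min_kmers_for_presence(n_kmers, ani, k, coverage, alpha)
```

For example, `is_called_present(observed, 1000, ani=0.95)` returns a yes/no call for a reference with 1000 k-mers at an ANI threshold of 0.95. Raising the ANI threshold or the assumed coverage raises the number of shared k-mers required before the genome is called present.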
Full text: 1 Database: MEDLINE Main subject: Metagenome / Microbiota Language: En Year: 2024 Type: Article
