Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.
McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E.
Affiliation
  • McIntyre ABR; Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA.
  • Ounit R; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Afshinnekoo E; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Prill RJ; Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA.
  • Hénaff E; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Alexander N; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Minot SS; School of Medicine, New York Medical College, Valhalla, NY, 10595, USA.
  • Danko D; Accelerated Discovery Lab, IBM Almaden Research Center, San Jose, CA, 95120, USA.
  • Foox J; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Ahsanuddin S; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Tighe S; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Hasan NA; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Subramanian P; One Codex, Reference Genomics, San Francisco, CA, 94103, USA.
  • Moffat K; Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA.
  • Levy S; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Lonardi S; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Greenfield N; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Colwell RR; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
  • Rosen GL; Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
  • Mason CE; The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
Genome Biol; 18(1): 182, 2017 09 21.
Article in En | MEDLINE | ID: mdl-28934964
ABSTRACT

BACKGROUND:

One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.

RESULTS:

In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.
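The three mitigation strategies named above (abundance filtering, tool intersection, and ensemble voting) can be sketched as simple operations on per-tool species calls. This is a minimal illustration only, not the study's implementation: the function names, the tool labels, the thresholds, and the read counts are all hypothetical.

```python
from collections import Counter


def abundance_filter(read_counts, min_rel_abundance=0.001):
    """Return relative abundances for taxa at or above a threshold.

    read_counts maps taxon name -> assigned read count (hypothetical input).
    The default threshold is an arbitrary illustrative value.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        taxon: count / total
        for taxon, count in read_counts.items()
        if count / total >= min_rel_abundance
    }


def intersect_calls(calls_by_tool):
    """Keep only taxa reported by every tool (strict intersection)."""
    call_sets = [set(calls) for calls in calls_by_tool.values()]
    return set.intersection(*call_sets) if call_sets else set()


def vote_calls(calls_by_tool, min_votes=2):
    """Keep taxa reported by at least min_votes tools (simple ensemble vote)."""
    votes = Counter(
        taxon for calls in calls_by_tool.values() for taxon in set(calls)
    )
    return {taxon for taxon, n in votes.items() if n >= min_votes}


# Made-up example: one tool per classification strategy.
calls = {
    "kmer_tool": {"Escherichia coli", "Bacillus subtilis", "Yersinia pestis"},
    "alignment_tool": {"Escherichia coli", "Bacillus subtilis"},
    "marker_tool": {"Escherichia coli", "Staphylococcus aureus"},
}
counts = {"Escherichia coli": 900, "Bacillus subtilis": 99, "Yersinia pestis": 1}

filtered = abundance_filter(counts, min_rel_abundance=0.01)
shared_by_all = intersect_calls(calls)
shared_by_two = vote_calls(calls, min_votes=2)
```

In this toy input, the strict intersection keeps only the species every tool reports, while the two-vote rule is more permissive; choosing between them trades recall against false positives.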

CONCLUSIONS:

This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
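As a rough illustration of the precision and recall comparison mentioned above, the sketch below scores a set of called species against a known control composition on a presence/absence basis. The function name and the example species sets are hypothetical, and the study's exact metric definitions may differ; accuracy is left out because it needs a count of true negatives, which this sketch does not define.

```python
def precision_recall(predicted, truth):
    """Presence/absence precision and recall for a set of called taxa.

    predicted: taxa reported by a classifier (hypothetical input).
    truth: taxa known to be in the control sample (hypothetical input).
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Made-up example: four calls, two of which are in a three-species control.
called = {"Escherichia coli", "Bacillus subtilis",
          "Yersinia pestis", "Staphylococcus aureus"}
control = {"Escherichia coli", "Bacillus subtilis", "Pseudomonas aeruginosa"}

precision, recall = precision_recall(called, control)
```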

Full text: 1 Database: MEDLINE Main subject: Software / Sequence Analysis, DNA / Benchmarking / Contig Mapping / Metagenome / DNA Barcoding, Taxonomic Type of study: Guideline / Prognostic_studies Limits: Humans Language: En Year: 2017 Document type: Article
