Genome Biol ; 18(1): 182, 2017 09 21.
Article in English | MEDLINE | ID: mdl-28934964


BACKGROUND: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. RESULTS: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. CONCLUSIONS: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
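The precision and recall comparisons and the filtering strategies named in this abstract (abundance filtering, tool intersection) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the 1% abundance threshold, and the toy read counts are hypothetical and are not taken from the study or from any of the 11 classifiers.

```python
# Minimal sketch: species-level precision/recall against a known control,
# plus two false-positive mitigation strategies from the abstract.
# All names, thresholds, and counts below are hypothetical.

def precision_recall(called, truth):
    """Return (precision, recall) for a set of called taxa vs. the true taxa."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

def abundance_filter(profile, min_fraction):
    """Keep taxa whose share of classified reads is at least min_fraction."""
    total = sum(profile.values())
    if total == 0:
        return set()
    return {taxon for taxon, reads in profile.items() if reads / total >= min_fraction}

# Hypothetical control sample containing three species.
truth = {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis"}

# Hypothetical read counts reported by two classifiers on that sample.
tool_a = {"Escherichia coli": 5000, "Staphylococcus aureus": 3000,
          "Bacillus subtilis": 1900, "Shigella flexneri": 90, "Bacillus cereus": 10}
tool_b = {"Escherichia coli": 4800, "Staphylococcus aureus": 3100,
          "Bacillus subtilis": 2000, "Staphylococcus epidermidis": 100}

raw = precision_recall(tool_a, truth)                                # unfiltered calls
filtered = precision_recall(abundance_filter(tool_a, 0.01), truth)   # drop taxa below 1%
intersected = precision_recall(set(tool_a) & set(tool_b), truth)     # keep taxa both tools call

print("raw:", raw)
print("abundance-filtered:", filtered)
print("intersection:", intersected)
```

In this toy case both strategies remove the low-abundance false positives without losing recall. The abstract cautions that on real environmental samples such strategies were often insufficient to eliminate false positives completely.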

Benchmarking/methods; Contig Mapping/methods; DNA Barcoding, Taxonomic/methods; Metagenome; Sequence Analysis, DNA/methods; Software; Benchmarking/standards; Contig Mapping/standards; DNA Barcoding, Taxonomic/standards; Humans; Microbiota; Phylogeny; Sequence Analysis, DNA/standards
J Biomol Tech ; 28(1): 46-55, 2017 04.
Article in English | MEDLINE | ID: mdl-28344519


Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample's DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA samples and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA in retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by the Spearman's rank order coefficient (R² > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genus, and family levels, resulting in decreased species diversity. We also observed some areas with the PCR "jackpot effect," with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method.
Despite overall concordance of species abundance between the amplified and unamplified samples, these results demonstrate that amplification of DNA using the REPLI-g method has some limitations. These limitations could be addressed by future improvements in the enzymes or methods, which would be needed for REPLI-g to be considered a >99% robust method for increasing the amount of high-fidelity DNA from low-biomass samples; at the very least, they should be accounted for during computational analysis of MDA samples.
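The concordance check described in this abstract, a Spearman rank-order comparison of species abundances between amplified and unamplified samples, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names and the abundance values are hypothetical, and the sketch reports the coefficient rho itself, whereas the abstract gives its threshold as an R² value.

```python
# Minimal sketch: Spearman rank correlation between paired abundance profiles.
# Function names and abundance values are hypothetical, for illustration only.
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (sd_x * sd_y)

# Hypothetical relative abundances of the same five species in one sample pair.
unamplified = [0.40, 0.25, 0.20, 0.10, 0.05]
amplified = [0.45, 0.20, 0.22, 0.09, 0.04]

rho = spearman_rho(unamplified, amplified)
print(f"Spearman rho = {rho:.3f}")
```

Because the statistic depends only on rank order, it can stay high even when amplification shifts the absolute abundance of individual species, which is one reason the abstract pairs it with coverage plots and diversity estimates.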

Environmental Microbiology; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Base Composition; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; Genome, Bacterial; Metagenomics; Microbiota/genetics
J Biomol Tech ; 28(1): 31-39, 2017 04.
Article in English | MEDLINE | ID: mdl-28337070


The Extreme Microbiome Project (XMP) is a project launched by the Association of Biomolecular Resource Facilities Metagenomics Research Group (ABRF MGRG) that focuses on whole genome shotgun sequencing of extreme and unique environments using a wide variety of biomolecular techniques. The goals are multifaceted, including development and refinement of new techniques for the following: 1) the detection and characterization of novel microbes, 2) the evaluation of nucleic acid techniques for extremophilic samples, and 3) the identification and implementation of the appropriate bioinformatics pipelines. Here, we highlight the different ongoing projects that we have been working on, as well as details on the various methods we use to characterize the microbiome and metagenome of these complex samples. In particular, we present data on a novel multienzyme extraction protocol that we developed, called Polyzyme or MetaPolyZyme. Presently, the XMP is characterizing sample sites around the world with the intent of discovering new species, genes, and gene clusters. Once a project site is complete, the resulting data will be publicly available. Sites include Lake Hillier in Western Australia, the "Door to Hell" crater in Turkmenistan, deep ocean brine lakes of the Gulf of Mexico, deep ocean sediments from Greenland, permafrost tunnels in Alaska, ancient microbial biofilms from Antarctica, the Blue Lagoon in Iceland, Ethiopian toxic hot springs, and the acidic hypersaline ponds in Western Australia.

Environmental Microbiology; Microbiota/genetics; DNA, Bacterial/genetics; DNA, Bacterial/isolation & purification; Extreme Environments; Metagenome; Molecular Typing/standards; RNA, Bacterial/genetics; RNA, Bacterial/isolation & purification; Reference Standards; Sequence Analysis, DNA/standards
Diagn Microbiol Infect Dis ; 87(1): 11-16, 2017 Jan.
Article in English | MEDLINE | ID: mdl-27771207


Understanding the contribution of relapse and reinfection to recurrent Clostridium difficile infection (CDI) has implications for therapy and infection prevention, respectively. We used whole genome sequencing to determine the relatedness of C. difficile strains isolated from patients with recurrent CDI at an academic medical center in the United States. Thirty-five toxigenic C. difficile isolates from 16 patients with 19 recurrent CDI episodes, with a median time of 53.5 days (range, 13-362) between episodes, were whole genome sequenced on the Illumina MiSeq platform. In 84% (16) of recurrences, the cause of recurrence was relapse with the prior strain of C. difficile. In 16% (3) of recurrent episodes, reinfection with a new strain of C. difficile was the cause. In conclusion, the majority of CDI recurrences at our institution were due to infection with the same strain rather than infection with a new strain.

Clostridium Infections/epidemiology; Clostridium Infections/microbiology; Clostridium difficile/classification; Clostridium difficile/genetics; Genome, Bacterial; Genotype; Sequence Analysis, DNA; Academic Medical Centers; Adult; Aged; Aged, 80 and over; Clostridium difficile/isolation & purification; Female; High-Throughput Nucleotide Sequencing; Humans; Male; Middle Aged; Recurrence; United States/epidemiology