
RESCRIPt: Reproducible sequence taxonomy reference database management.

Robeson, Michael S; O'Rourke, Devon R; Kaehler, Benjamin D; Ziemski, Michal; Dillon, Matthew R; Foster, Jeffrey T; Bokulich, Nicholas A.

PLoS Comput Biol; 17(11): e1009581, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34748542

ABSTRACT

Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
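
The "curation (quality filtering)" step mentioned in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not RESCRIPt's implementation or API: the function names `passes_filter` and `curate`, the thresholds, and the example sequences are all hypothetical. It drops reference sequences that contain too many degenerate (ambiguous IUPAC) bases or an overly long homopolymer run.

```python
import re
from typing import Dict

# IUPAC codes that stand for more than one nucleotide.
DEGENERATE_BASES = set("RYSWKMBDHVN")


def passes_filter(seq: str, max_degenerates: int = 5, max_homopolymer: int = 8) -> bool:
    """Return True if a sequence meets both (hypothetical) quality thresholds."""
    seq = seq.upper()
    # Count ambiguous bases.
    n_degenerate = sum(base in DEGENERATE_BASES for base in seq)
    if n_degenerate > max_degenerates:
        return False
    # A run longer than max_homopolymer is one base followed by
    # at least max_homopolymer copies of itself.
    too_long_run = re.compile(r"(.)\1{%d,}" % max_homopolymer)
    return too_long_run.search(seq) is None


def curate(seqs: Dict[str, str], **thresholds: int) -> Dict[str, str]:
    """Keep only the sequences that pass the filter."""
    return {sid: s for sid, s in seqs.items() if passes_filter(s, **thresholds)}


# Hypothetical example records.
reference = {
    "ok": "ACGTACGTAC",
    "degenerate": "ACGTNNNNNNACGT",
    "homopolymer": "ACG" + "A" * 12 + "CGT",
}
print(sorted(curate(reference)))
```

Only the record named `ok` survives here; the other two exceed the degenerate-base and homopolymer thresholds, respectively.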

Subject(s)

Database Management Systems; Databases, Genetic/statistics & numerical data; Software; Animals; Classification; Computational Biology; DNA Barcoding, Taxonomic; Databases, Nucleic Acid; Genomics; Humans; Metagenome; Metagenomics; Microbiota/genetics; Phylogeny; RNA, Ribosomal, 16S/genetics; Sequence Analysis

A total crapshoot? Evaluating bioinformatic decisions in animal diet metabarcoding analyses.

O'Rourke, Devon R; Bokulich, Nicholas A; Jusino, Michelle A; MacManes, Matthew D; Foster, Jeffrey T.

Ecol Evol; 10(18): 9721-9739, 2020 Sep.

Article in English | MEDLINE | ID: mdl-33005342

ABSTRACT

Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon sequence data are lacking in animal diet studies. Here we evaluate how decisions made in core bioinformatic processes, including sequence filtering, database design, and classification, can influence animal metabarcoding results. We show that denoising methods have lower error rates compared to traditional clustering methods, although these differences are largely mitigated by removing low-abundance sequence variants. We also found that available reference datasets from GenBank and BOLD for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases. Taxonomic classification methods can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels using both a mock community and bat guano samples compared to all other classifiers (vsearch-SINTAX and q2-feature-classifier's BLAST + LCA, VSEARCH + LCA, and Naive Bayes classifiers). The lack of consensus on bioinformatics best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard to evaluate the myriad computational decisions impacting animal metabarcoding accuracy. Further, these comparisons highlight the need for continual evaluations as new tools are adopted to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.
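
Several of the classifiers compared above pair an alignment search with a lowest-common-ancestor (LCA) step. The sketch below shows only the LCA idea in Python, under stated assumptions: the function name `lca_taxonomy`, the semicolon-delimited lineage format, and the example hits are hypothetical, and the code is not taken from q2-feature-classifier or any other tool named in the abstract. Given the lineages of a query's top hits, it returns the deepest taxonomic prefix on which all hits agree.

```python
from typing import List


def lca_taxonomy(lineages: List[str], sep: str = ";") -> str:
    """Return the deepest lineage prefix shared by every hit."""
    if not lineages:
        return "Unassigned"
    # Split each lineage into its ranks, e.g. kingdom, phylum, class, ...
    split = [[rank.strip() for rank in lineage.split(sep)] for lineage in lineages]
    consensus = []
    # Walk rank by rank and stop at the first disagreement.
    for ranks in zip(*split):
        if all(rank == ranks[0] for rank in ranks):
            consensus.append(ranks[0])
        else:
            break
    return sep.join(consensus) if consensus else "Unassigned"


# Hypothetical top hits for a single query sequence.
hits = [
    "Animalia;Arthropoda;Insecta;Lepidoptera;Noctuidae",
    "Animalia;Arthropoda;Insecta;Lepidoptera;Geometridae",
    "Animalia;Arthropoda;Insecta;Diptera;Culicidae",
]
print(lca_taxonomy(hits))  # Animalia;Arthropoda;Insecta
```

Because the hits disagree at the order level, the query is named only to class. A stricter or looser consensus rule would change how many names are assigned at lower ranks, which is one way classifier choice can affect results.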
