Expanding standards in viromics: in silico evaluation of dsDNA viral genome identification, classification, and auxiliary metabolic gene curation.

Pratama, Akbar Adjie; Bolduc, Benjamin; Zayed, Ahmed A; Zhong, Zhi-Ping; Guo, Jiarong; Vik, Dean R; Gazitúa, Maria Consuelo; Wainaina, James M; Roux, Simon; Sullivan, Matthew B

Affiliation

Pratama AA; Department of Microbiology, Ohio State University, Columbus, OH, United States of America.
Bolduc B; Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America.
Zayed AA; Department of Microbiology, Ohio State University, Columbus, OH, United States of America.
Zhong ZP; Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America.
Guo J; Department of Microbiology, Ohio State University, Columbus, OH, United States of America.
Vik DR; Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America.
Gazitúa MC; Department of Microbiology, Ohio State University, Columbus, OH, United States of America.
Wainaina JM; Center of Microbiome Science, Ohio State University, Columbus, OH, United States of America.
Roux S; Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, United States of America.
Sullivan MB; Department of Microbiology, Ohio State University, Columbus, OH, United States of America.

PeerJ; 9: e11447, 2021.

Article in English | MEDLINE | ID: mdl-34178438

ABSTRACT

BACKGROUND: Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tools, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs).

RESULTS: The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and BLAST-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and BLAST-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ~5% for virome and ~75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignments ranging from ~95% (whole genomes) down to ~80% (3 kbp genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments rather than full genomes. Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes.

CONCLUSION: Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses 'hidden' in diverse sequence datasets.
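The 3 kbp length boundary reported above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the in-memory dictionary of contigs, and the way the error fraction is computed (predicted-viral contigs that are not truly viral, divided by all predicted-viral contigs) are assumptions made for illustration only, since the abstract does not state how false positives were calculated.

```python
from typing import Dict, Iterable, List, Tuple

# The 3 kbp boundary quoted in the abstract; treating exactly 3,000 bp as
# "long" is an assumption, since the abstract lists both >=3 kbp and <=3 kbp.
LENGTH_CUTOFF_BP = 3_000


def partition_by_length(
    contigs: Dict[str, str], cutoff: int = LENGTH_CUTOFF_BP
) -> Tuple[List[str], List[str]]:
    """Split contig IDs into (long, short) lists by sequence length."""
    long_ids: List[str] = []
    short_ids: List[str] = []
    for contig_id, sequence in contigs.items():
        if len(sequence) >= cutoff:
            long_ids.append(contig_id)
        else:
            short_ids.append(contig_id)
    return long_ids, short_ids


def false_discovery_fraction(
    predicted_viral: Iterable[str], true_viral: Iterable[str]
) -> float:
    """Fraction of predicted-viral contigs that are not in the known viral set.

    Hypothetical metric for an in silico dataset where the truth is known.
    """
    predicted = set(predicted_viral)
    if not predicted:
        return 0.0
    return len(predicted - set(true_viral)) / len(predicted)


if __name__ == "__main__":
    # Toy sequences; real input would come from an assembly FASTA file.
    toy_contigs = {"c1": "A" * 5000, "c2": "A" * 1200, "c3": "A" * 3000}
    long_ids, short_ids = partition_by_length(toy_contigs)
    print("long:", long_ids, "short:", short_ids)
    print(false_discovery_fraction(["c1", "c2", "x"], ["c1", "c2"]))
```

Partitioning first makes it easy to report tool performance separately for each length class, which is where the abstract says behaviour diverges.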

Keywords

Benchmarks; Ecology; Standard operating procedure; Viromics; Viruses

Full text: 1 Database: MEDLINE Study type: Diagnostic_studies / Guideline / Qualitative_research Language: En Journal: PeerJ Year: 2021 Document type: Article Affiliation country: United States
