Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline.
Wattanasombat, Sara; Tongjai, Siripong.
Affiliation
  • Wattanasombat S; Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand.
  • Tongjai S; Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand.
F1000Res ; 13: 556, 2024.
Article in En | MEDLINE | ID: mdl-38984017
ABSTRACT

Background:

Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.

Methods:

We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, with QUAST and BLASTN used for quality assessment.

Results:

Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime.
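
As an illustration of the read-length finding above, the following minimal Python sketch keeps only reads at or above a chosen length threshold before assembly. It is not taken from the HIV-64148 pipeline: the function names `parse_fastq` and `filter_by_length` are hypothetical, and the sketch assumes well-formed, uncompressed four-line FASTQ records.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality) for one read
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records.

    Assumes each record is exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_by_length(
    records: Iterable[FastqRecord], min_length: int = 2000
) -> List[FastqRecord]:
    """Keep reads whose sequence has at least min_length nucleotides.

    The default of 2000 mirrors the minimum read length reported in the
    abstract; 4000 can be passed to apply the stricter threshold.
    """
    return [rec for rec in records if len(rec[1]) >= min_length]


def make_demo_fastq() -> List[str]:
    """Build three synthetic reads (1500, 2000, 4500 nt) for demonstration."""
    lines = []
    for name, n in (("short", 1500), ("ok", 2000), ("long", 4500)):
        lines += [f"@{name}\n", "A" * n + "\n", "+\n", "I" * n + "\n"]
    return lines


if __name__ == "__main__":
    reads = list(parse_fastq(make_demo_fastq()))
    for threshold in (2000, 4000):
        kept = filter_by_length(reads, threshold)
        print(threshold, [header for header, _, _ in kept])
```

In practice a dedicated read-filtering tool would usually be preferred for large datasets; the sketch only shows the thresholding logic.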

Conclusions:

The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Software / HIV-1 / Computational Biology / Genomics Limits: Humans Language: En Journal: F1000Res Year: 2024 Document type: Article Affiliation country:
