Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline.
F1000Res
; 13: 556, 2024.
Article
en En
| MEDLINE
| ID: mdl-38984017
ABSTRACT
Background:
Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.Methods:
We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers-Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo-for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment.Results:
Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime.Conclusions:
The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.Palabras clave
Texto completo:
1
Colección:
01-internacional
Base de datos:
MEDLINE
Asunto principal:
Programas Informáticos
/
VIH-1
/
Biología Computacional
/
Genómica
Límite:
Humans
Idioma:
En
Revista:
F1000Res
/
F1000Research
Año:
2024
Tipo del documento:
Article
País de afiliación:
Tailandia
Pais de publicación:
Reino Unido