Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline.

Wattanasombat, Sara; Tongjai, Siripong

Wattanasombat, Sara; Tongjai, Siripong.

Afiliación

Wattanasombat S; Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand.
Tongjai S; Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand.

F1000Res ; 13: 556, 2024.

Article en En | MEDLINE | ID: mdl-38984017

ABSTRACT

ABSTRACT

Background:

Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources.

Methods:

We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers-Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo-for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment.

Results:

Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime.

Conclusions:

The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.

Asunto(s)

Biología Computacional; Genómica; VIH-1; Programas Informáticos; VIH-1/genética; Biología Computacional/métodos; Genómica/métodos; Humanos; Genoma Viral/genética

Palabras clave

Genome assembly; Genomic surveillance; HIV; Haplotype reconstruction; Infectious Diseases; NGS; Single-molecule sequencing; Virus

Texto completo

Añadir a Mi BVS

Imprimir

XML

PubMed Links

Buscar en Google

Texto completo: 1 Colección: 01-internacional Base de datos: MEDLINE Asunto principal: Programas Informáticos / VIH-1 / Biología Computacional / Genómica Límite: Humans Idioma: En Revista: F1000Res / F1000Research Año: 2024 Tipo del documento: Article País de afiliación: Tailandia Pais de publicación: Reino Unido

Texto completo

Añadir a Mi BVS

Imprimir

XML

PubMed Links

Buscar en Google