ViPRA-Haplo: De Novo Reconstruction of Viral Populations Using Paired End Sequencing Data.
Article in English | MEDLINE | ID: mdl-38451771
ABSTRACT
We present ViPRA-Haplo, a de novo, strain-specific assembly workflow for reconstructing the viral haplotypes in a viral population from paired-end next-generation sequencing (NGS) data. The proposed Viral Path Reconstruction Algorithm (ViPRA) uses the pairing information of reads to generate a subset of paths from a de Bruijn graph built from the reads. The paths generated by ViPRA overestimate the set of true contigs, so we propose two refinement methods to obtain an optimal set of contigs representing viral haplotypes. The first method clusters the paths reconstructed by ViPRA by sequence similarity using VSEARCH (Deorowicz et al. 2015), while the second method, MLEHaplo, generates a maximum likelihood estimate of the viral population. We evaluated our pipeline on simulated and real viral quasispecies data from HIV, and on real data from SARS-CoV-2. Experimental results show that ViPRA-Haplo, although it still overestimates the number of true contigs, outperforms the existing tool PEHaplo, providing up to 9% better genome coverage on real HIV data. ViPRA-Haplo also retains more of the diversity of the viral population, as shown by a higher percentage of contigs shorter than 1000 base pairs (bp) that contain k-mers with counts below 100 (representing rarer sequences); these are absent in PEHaplo. For SARS-CoV-2 sequencing data, ViPRA-Haplo reconstructs contigs that cover more than 90% of the reference genome and validates known SARS-CoV-2 strains in the sequencing data.
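The abstract does not spell out ViPRA's path-selection rules, so the following is only a toy sketch of the general idea it describes. Enumerating every source-to-sink path in a de Bruijn graph over-estimates the haplotypes: two haplotypes differing at two SNPs yield four paths, two of them chimeric. Read-pair information can then discard paths that no pair supports. The function names (`build_de_bruijn`, `spell_all_paths`, `pair_support`), the toy sequences, and the "both mates occur in order on the path" support rule are hypothetical choices for illustration, not ViPRA's actual algorithm. The sketch also ignores sequencing errors, reverse-complement mates, insert-size constraints, and graph cycles.

```python
# Hypothetical toy sketch of pair-guided path selection; NOT the ViPRA implementation.
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Node = (k-1)-mer; one edge prefix -> suffix for every k-mer seen in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_all_paths(graph):
    """Spell every simple path from a source (no in-edges) to a sink (no out-edges)."""
    has_incoming = {v for succ in graph.values() for v in succ}
    sources = sorted(u for u in graph if u not in has_incoming)
    sequences = []

    def dfs(node, visited, seq):
        successors = graph.get(node, set())
        if not successors:  # sink reached: record the spelled sequence
            sequences.append(seq)
            return
        for nxt in sorted(successors):
            if nxt not in visited:  # this toy version simply skips cycles
                dfs(nxt, visited | {nxt}, seq + nxt[-1])

    for src in sources:
        dfs(src, {src}, src)
    return sequences


def pair_support(seq, pairs):
    """Count pairs whose mate 1 occurs in seq, followed later by mate 2 (forward strand assumed)."""
    support = 0
    for mate1, mate2 in pairs:
        start = seq.find(mate1)
        if start != -1 and seq.find(mate2, start + len(mate1)) != -1:
            support += 1
    return support


# Hypothetical toy population: two haplotypes differing at positions 5 and 12.
hap_a = "ATGCCGTAAGCTTGACCA"
hap_b = "ATGCCATAAGCTCGACCA"
# Overlapping 8-bp "reads" tiled across each haplotype with step 2.
reads = [h[i:i + 8] for h in (hap_a, hap_b) for i in range(0, len(h) - 7, 2)]
# Each pair links the first SNP to the second SNP on the same haplotype.
pairs = [(h[2:8], h[11:17]) for h in (hap_a, hap_b)]

graph = build_de_bruijn(reads, k=5)
candidates = spell_all_paths(graph)
kept = [s for s in candidates if pair_support(s, pairs) > 0]
print(len(candidates), len(kept))  # 4 2
print(sorted(kept) == sorted([hap_a, hap_b]))  # True
```

Both chimeric paths (first SNP from one haplotype, second SNP from the other) are dropped because no read pair spans them. On real data the surviving paths would still over-estimate the population, which is where the abstract's refinement step (VSEARCH clustering or MLEHaplo) comes in.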
Subject(s)

Database: MEDLINE | Main subject: Algorithms / Genome, Viral / High-Throughput Nucleotide Sequencing / SARS-CoV-2 | Limits: Humans | Language: English | Journal: ACM Trans Comput Biol Bioinform | Journal subject: Biology / Medical Informatics | Year: 2024 | Document type: Article

Texto completo: 1 Base de datos: MEDLINE Asunto principal: Algoritmos / Genoma Viral / Secuenciación de Nucleótidos de Alto Rendimiento / SARS-CoV-2 Límite: Humans Idioma: En Revista: ACM Trans Comput Biol Bioinform Asunto de la revista: BIOLOGIA / INFORMATICA MEDICA Año: 2024 Tipo del documento: Article