Evaluation of haplotype callers for next-generation sequencing of viruses.
Eliseev, Anton; Gibson, Keylie M; Avdeyev, Pavel; Novik, Dmitry; Bendall, Matthew L; Pérez-Losada, Marcos; Alexeev, Nikita; Crandall, Keith A.
Affiliation
  • Eliseev A; Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia.
  • Gibson KM; Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA. Electronic address: kmgibson@gwu.edu.
  • Avdeyev P; Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Mathematics, George Washington University, Washington, DC, USA.
  • Novik D; Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia.
  • Bendall ML; Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
  • Pérez-Losada M; Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; CIBIO-InBIO, Centro de Investigaç
  • Alexeev N; Computer Technologies Laboratory, ITMO University, Saint-Petersburg, Russia.
  • Crandall KA; Computational Biology Institute, Milken Institute School of Public Health, George Washington University, Washington, DC, USA; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University, Washington, DC, USA.
Infect Genet Evol; 82: 104277, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32151775
ABSTRACT
Currently, the standard practice for assembling next-generation sequencing (NGS) reads of viral genomes is to summarize thousands of individual short reads into a single consensus sequence, thus confounding useful intra-host diversity information for molecular phylodynamic inference. It is hypothesized that a few viral strains may dominate the intra-host genetic diversity, with a variety of lower-frequency strains comprising the rest of the population. Several software tools currently exist to convert NGS sequence variants into haplotypes. Previous benchmarks of viral haplotype reconstruction programs used simulation scenarios that are useful from a mathematical perspective but do not reflect viral evolution and epidemiology. Here, we tested twelve NGS haplotype reconstruction methods using viral populations simulated under realistic evolutionary dynamics. We simulated coalescent-based populations that spanned known levels of viral genetic diversity, including mutation rates, sample size, and effective population size, to test the limits of the haplotype reconstruction methods and to ensure coverage of predicted intra-host viral diversity levels (especially HIV-1). All twelve investigated haplotype callers showed variable performance and produced drastically different results that were mainly driven by differences in mutation rate and, to a lesser extent, in effective population size. Most methods were able to accurately reconstruct haplotypes when genetic diversity was low. However, under higher levels of diversity (e.g., those seen in intra-host HIV-1 infections), haplotype reconstruction quality was highly variable and, on average, poor. All haplotype reconstruction tools, except QuasiRecomb and ShoRAH, greatly underestimated intra-host diversity and the true number of haplotypes.
PredictHaplo outperformed the other haplotype reconstruction tools, with the highest precision and recall and the lowest UniFrac distance values, followed by CliqueSNV, which, given more computational time, may have outperformed PredictHaplo. Here, we present an extensive comparison of available viral haplotype reconstruction tools and provide insights for future improvements in haplotype reconstruction tools using both short-read and long-read technologies.
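The abstract ranks the tools by precision and recall but does not say how a reconstructed haplotype is matched to a true one. As a minimal sketch, assuming a prediction counts as correct only when it is identical to a simulated haplotype (the study's actual matching criterion may differ), the two metrics could be computed as below. The function name `precision_recall` and the toy sequences are hypothetical and serve only as illustration.

```python
def precision_recall(true_haplotypes, predicted_haplotypes):
    """Precision and recall of predicted haplotypes under exact-sequence matching.

    Assumption: a predicted haplotype is a true positive only if it is
    identical to one of the true haplotypes. Frequencies are ignored.
    """
    true_set = set(true_haplotypes)
    pred_set = set(predicted_haplotypes)
    true_positives = len(true_set & pred_set)
    # Precision: fraction of predicted haplotypes that are real.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: fraction of real haplotypes that were recovered.
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


# Toy example: four simulated haplotypes, three reconstructed ones.
truth = ["ACGT", "ACGA", "TCGA", "ACTT"]
called = ["ACGT", "ACGA", "GGGG"]
precision, recall = precision_recall(truth, called)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case a caller that reports fewer haplotypes than exist can still score well on precision while recall stays low, which is the pattern one would expect from tools that underestimate the true number of haplotypes.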

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Haplotypes / Genome, Viral / Computational Biology / High-Throughput Nucleotide Sequencing Type of study: Prognostic_studies Limit: Humans Language: En Journal: Infect Genet Evol Journal subject: BIOLOGY / COMMUNICABLE DISEASES / GENETICS Year of publication: 2020 Document type: Article Country of affiliation: Russian Federation
