Evaluating assembly and variant calling software for strain-resolved analysis of large DNA viruses.
Deng, Zhi-Luo; Dhingra, Akshay; Fritz, Adrian; Götting, Jasper; Münch, Philipp C; Steinbrück, Lars; Schulz, Thomas F; Ganzenmüller, Tina; McHardy, Alice C.
Affiliation
  • Deng ZL; Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research.
  • Dhingra A; Institute of Virology in Hannover Medical School.
  • Fritz A; Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research.
  • Götting J; Institute of Virology in Hannover Medical School.
  • Münch PC; Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research and Max von Pettenkofer Institute in Ludwig Maximilian University of Munich.
  • Steinbrück L; Institute of Virology in Hannover Medical School.
  • Schulz TF; Institute of Virology in Hannover Medical School.
  • Ganzenmüller T; Institute of Virology in Hannover Medical School.
  • McHardy AC; Department Computational Biology of Infection Research of the Helmholtz Centre for Infection Research.
Brief Bioinform; 22(3), 2021-05-20.
Article in English | MEDLINE | ID: mdl-34020538
ABSTRACT
Infection with human cytomegalovirus (HCMV) can cause severe complications in immunocompromised individuals and congenitally infected children. Characterizing heterogeneous viral populations and their evolution by high-throughput sequencing of clinical specimens requires the accurate assembly of individual strains or sequence variants and suitable variant calling methods. However, the performance of most methods has not been assessed for populations composed of low-divergence viral strains with large genomes, such as HCMV. In an extensive benchmarking study, we evaluated 15 assemblers and 6 variant callers on 10 lab-generated benchmark data sets created with two different library preparation protocols, to identify best practices and challenges for analyzing such data. Most assemblers, especially metaSPAdes and IVA, performed well across a range of metrics in recovering abundant strains. However, only one, Savage, recovered low-abundance strains, and only in a highly fragmented manner. Two variant callers, LoFreq and VarScan2, excelled across all strain abundances. Both shared a large fraction of false positive variant calls, which were strongly enriched in T to G changes in a 'G.G' context. The magnitude of this context-dependent systematic error is linked to the experimental protocol. We provide all benchmarking data, results and the entire benchmarking workflow named QuasiModo, Quasispecies Metric determination on omics, under the GNU General Public License v3.0 (https://github.com/hzi-bifo/Quasimodo), to enable full reproducibility and further benchmarking on these and other data.

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Genetic Variation / Software / Genome, Viral / Cytomegalovirus | Study type: Evaluation_studies / Guideline | Limit: Humans | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year: 2021 | Document type: Article
