VeChat: correcting errors in long reads using variation graphs.
Nat Commun; 13(1): 6657, 2022 11 04.
Article in En | MEDLINE | ID: mdl-36333324
ABSTRACT
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes present in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea. VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and is publicly available at https://github.com/HaploKit/vechat.
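The consensus bias described in the abstract can be illustrated with a toy example. The sketch below is not VeChat's algorithm and does not build a variation graph; it assumes a fixed, gap-free pileup of short reads and uses a hypothetical per-column support threshold (`min_support`) as a stand-in for the idea that alternative alleles backed by several reads are kept rather than overwritten. All names and data are invented for illustration.

```python
from collections import Counter

# Toy pileup: each string is one read covering the same 8 positions.
# Seven reads come from a major haplotype, three from a minor haplotype
# that differs at index 3 (G instead of A). One major-haplotype read
# carries an isolated sequencing error at index 6 (T instead of C).
reads = ["ACTAGGCT"] * 6 + ["ACTAGGTT"] + ["ACTGGGCT"] * 3


def consensus_correct(reads):
    """Overwrite every read with the column-wise majority sequence."""
    columns = list(zip(*reads))
    consensus = "".join(Counter(col).most_common(1)[0][0] for col in columns)
    return [consensus for _ in reads]


def support_aware_correct(reads, min_support=2):
    """Keep a base when at least `min_support` reads share it in that column;
    otherwise replace it with the column majority.

    `min_support` is a hypothetical threshold for this toy example only.
    """
    counts = [Counter(col) for col in zip(*reads)]
    corrected = []
    for read in reads:
        fixed = [
            base if counts[i][base] >= min_support
            else counts[i].most_common(1)[0][0]
            for i, base in enumerate(read)
        ]
        corrected.append("".join(fixed))
    return corrected


if __name__ == "__main__":
    print(Counter(consensus_correct(reads)))      # minor haplotype is erased
    print(Counter(support_aware_correct(reads)))  # minor haplotype survives
```

With consensus correction, all ten reads collapse onto the major haplotype, so the true variant at index 3 is treated as an error. With the support-aware rule, the three minor-haplotype reads keep their variant while the isolated error at index 6 is still corrected.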
Full text: 1
Collections: 01-international
Database: MEDLINE
Main subject: Algorithms / Nanopores
Language: En
Journal: Nat Commun
Journal subject: BIOLOGY / SCIENCE
Publication year: 2022
Document type: Article
Affiliation country: Germany