Benchmarking of computational error-correction methods for next-generation sequencing data.

Mitchell, Keith; Brito, Jaqueline J; Mandric, Igor; Wu, Qiaozhen; Knyazev, Sergey; Chang, Sei; Martin, Lana S; Karlsberg, Aaron; Gerasimov, Ekaterina; Littman, Russell; Hill, Brian L; Wu, Nicholas C; Yang, Harry Taegyun; Hsieh, Kevin; Chen, Linus; Littman, Eli; Shabani, Taylor; Enik, German; Yao, Douglas; Sun, Ren; Schroeder, Jan; Eskin, Eleazar; Zelikovsky, Alex; Skums, Pavel; Pop, Mihai; Mangul, Serghei

Affiliation

Mitchell K; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Brito JJ; Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
Mandric I; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Wu Q; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.
Knyazev S; Department of Mathematics, University of California Los Angeles, 520 Portola Plaza, Los Angeles, CA, 90095, USA.
Chang S; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.
Martin LS; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Karlsberg A; Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
Gerasimov E; Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, 1985 Zonal Avenue, Los Angeles, CA, 90089, USA.
Littman R; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.
Hill BL; UCLA Bioinformatics, 621 Charles E Young Dr S, Los Angeles, CA, 90024, USA.
Wu NC; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Yang HT; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA.
Hsieh K; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Chen L; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Littman E; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Shabani T; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Enik G; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Yao D; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Sun R; Department of Molecular, Cell, and Developmental Biology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
Schroeder J; Department of Molecular and Medical Pharmacology, University of California Los Angeles, 650 Charles E. Young Drive South, Los Angeles, CA, 90095, USA.
Eskin E; Epigenetics & Reprogramming Laboratory, Monash University, 15 Innovation Walk, Melbourne, VIC, 3800, Australia.
Zelikovsky A; Department of Computer Science, University of California Los Angeles, 404 Westwood Plaza, Los Angeles, CA, 90095, USA.
Skums P; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.
Pop M; The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia, 119991.
Mangul S; Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.

Genome Biol; 21(1): 71, 2020 Mar 17.

Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown.

RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and raw reads. We then perform a realistic evaluation of error-correction methods.

CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets, with no single method performing best on all types of examined data. Finally, we identify the techniques that offer a good balance between precision and sensitivity.
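The abstract pairs two ideas: UMI-based high-fidelity sequencing yields error-free reference reads, and error correctors are then scored by precision and sensitivity. The sketch below illustrates both under simplifying assumptions that are not taken from the paper: reads sharing a UMI are equal-length and already aligned (substitutions only, no indels), the consensus is a simple per-position majority vote, and base-level outcomes are counted as TP (error fixed), FP (correct base turned into an error) and FN (error left unfixed or changed to another wrong base). The function names `umi_consensus`, `score_correction` and `precision_sensitivity` are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def umi_consensus(tagged_reads):
    """Collapse reads sharing a UMI into one majority-vote consensus read.

    tagged_reads: iterable of (umi, sequence) pairs. Assumption: reads in a
    UMI family are equal-length and aligned (substitution errors only).
    """
    families = defaultdict(list)
    for umi, seq in tagged_reads:
        families[umi].append(seq)
    consensus = {}
    for umi, seqs in families.items():
        # Majority base per column; Counter.most_common breaks ties by
        # first occurrence.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def score_correction(raw, corrected, truth):
    """Return base-level (TP, FP, FN) counts for one corrected read."""
    if not (len(raw) == len(corrected) == len(truth)):
        raise ValueError("raw, corrected and truth must have equal length")
    tp = fp = fn = 0
    for r, c, t in zip(raw, corrected, truth):
        if r != t:          # the raw base was a sequencing error
            if c == t:
                tp += 1     # error fixed by the corrector
            else:
                fn += 1     # error left in place or replaced by another wrong base
        elif c != t:
            fp += 1         # a correct base was changed into an error
    return tp, fp, fn


def precision_sensitivity(tp, fp, fn):
    """Precision = TP/(TP+FP); sensitivity = TP/(TP+FN); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return precision, sensitivity


if __name__ == "__main__":
    reads = [("AAT", "ACGTACGT"), ("AAT", "ACGTACGT"), ("AAT", "ACGAACGT")]
    truth = umi_consensus(reads)["AAT"]   # "ACGTACGT"
    raw = "ACGAACGT"                      # one sequencing error at position 3
    corrected = "ACGTACCT"                # error fixed, new error at position 6
    counts = score_correction(raw, corrected, truth)
    print(counts)                          # (1, 1, 0)
    print(precision_sensitivity(*counts))  # (0.5, 1.0)
```

In this toy case the corrector fixes the one real error but introduces a new one, so sensitivity is perfect while precision drops to 0.5, which is the kind of trade-off the conclusions refer to.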

Subject(s)

Algorithms; High-Throughput Nucleotide Sequencing; Benchmarking; Computational Biology/methods; Humans; Receptors, Antigen, T-Cell/genetics; Viruses/genetics; Whole Genome Sequencing

Databases: MEDLINE | Main subject: Algorithms / High-Throughput Nucleotide Sequencing | Study type: Evaluation Studies / Guideline | Limit: Humans | Language: English | Journal: Genome Biol | Journal subject: Molecular Biology / Genetics | Year: 2020 | Document type: Article | Affiliation country: United States