ADEPT, a dynamic next generation sequencing data error-detection program with trimming.

Feng, Shihai; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S G

Affiliation

Feng S; Genome Science Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. sfeng@lanl.gov.
Lo CC; Genome Science Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. chienchi@lanl.gov.
Li PE; Genome Science Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. po-e@lanl.gov.
Chain PS; Genome Science Group, Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545, USA. pchain@lanl.gov.

BMC Bioinformatics ; 17: 109, 2016 Feb 29.

Article in English | MEDLINE | ID: mdl-26928302

ABSTRACT

BACKGROUND: Illumina is the most widely used next-generation sequencing technology and produces millions of short reads that contain errors. These sequencing errors constitute a major problem in applications such as de novo genome assembly, metagenomics analysis and single nucleotide polymorphism discovery. RESULTS: In this study, we present ADEPT, a dynamic error detection method that evaluates the quality score of each nucleotide and of its neighboring nucleotides, together with their positions within the read, and compares these to the position-specific quality score distribution of all bases within the sequencing run. This method greatly improves upon other available methods in terms of the true positive rate of error discovery without affecting the false positive rate, particularly within the middle of reads. CONCLUSIONS: ADEPT is the only tool to date that dynamically assesses errors within reads by comparing position-specific and neighboring base quality scores with the distribution of quality scores for the dataset being analyzed. The result is a method that is less prone to position-dependent under-prediction, which is one of the most prominent issues in error prediction. As a result, ADEPT improves upon prior efforts in identifying true errors, primarily within the middle of reads, while reducing the false positive rate.
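The general idea in the abstract, judging a base by its own quality score and its neighbors' scores relative to the per-position score distribution of the whole run, can be illustrated with a small Python sketch. This is not ADEPT's actual algorithm, which the abstract does not specify: the function names, the z-score formulation, the `z_cutoff` value and the neighbor rule are all assumptions made for illustration, and the quality values are invented toy data.

```python
from statistics import mean, pstdev


def position_stats(all_quals):
    """Mean and standard deviation of quality scores at each read position."""
    longest = max(len(q) for q in all_quals)
    stats = []
    for pos in range(longest):
        column = [q[pos] for q in all_quals if len(q) > pos]
        stats.append((mean(column), pstdev(column)))
    return stats


def z_scores(quals, stats):
    """Express each base quality relative to the run-wide distribution at its position."""
    return [(q - mu) / sigma if sigma > 0 else 0.0
            for q, (mu, sigma) in zip(quals, stats)]


def flag_suspect_bases(quals, stats, z_cutoff=-1.5):
    """Hypothetical rule: flag a base whose quality is far below the norm for
    its position AND that has at least one neighbor below the position norm."""
    z = z_scores(quals, stats)
    flagged = []
    for i, score in enumerate(z):
        neighbors = [z[j] for j in (i - 1, i + 1) if 0 <= j < len(z)]
        if score < z_cutoff and any(n < 0 for n in neighbors):
            flagged.append(i)
    return flagged


# Toy run: four reads of Phred quality scores (invented values).
run = [
    [38, 38, 37, 36, 35],
    [38, 37, 37, 36, 34],
    [37, 38, 36, 35, 35],
    [38, 38, 20, 30, 35],
]
stats = position_stats(run)
print(flag_suspect_bases(run[3], stats))  # [2, 3]
print(flag_suspect_bases(run[1], stats))  # []
```

In this toy run the last read is flagged at positions 2 and 3, where two adjacent bases both fall well below the norm for their positions, while the isolated low scores in the second read are not flagged because their neighbors look normal. Because the thresholds are derived from the dataset itself rather than fixed, the same raw score can be treated differently depending on where it falls in the read.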

Subject(s)

Bacteria/genetics; High-Throughput Nucleotide Sequencing/methods; Metagenomics; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/methods; Software; Algorithms; Bacteria/classification; Computational Biology; Humans; Quality Control

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Bacteria / Software / Sequence Analysis, DNA / Polymorphism, Single Nucleotide / Metagenomics / High-Throughput Nucleotide Sequencing | Type of study: Diagnostic_studies / Prognostic_studies | Limits: Humans | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2016 | Document type: Article | Affiliation country: United States | Country of publication: United Kingdom
