Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates.
Luan, Tu; Commichaux, Seth; Hoffmann, Maria; Jayeola, Victor; Jang, Jae Hee; Pop, Mihai; Rand, Hugh; Luo, Yan.
Affiliations
  • Luan T; Department of Computer Science, University of Maryland, College Park, MD, 20742, USA.
  • Commichaux S; Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA. Seth.Commichaux@fda.hhs.gov.
  • Hoffmann M; Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA.
  • Jayeola V; Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA.
  • Jang JH; Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA.
  • Pop M; Department of Computer Science, University of Maryland, College Park, MD, 20742, USA.
  • Rand H; Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA.
  • Luo Y; Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA.
BMC Genomics; 25(1): 679, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978005
ABSTRACT

BACKGROUND:

Oxford Nanopore provides high-throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain whether they achieve the accuracy needed for high-resolution source tracking of foodborne illness outbreaks.

RESULTS:

We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy, or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low-confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the five best-performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered: using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct.
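The accuracy percentages quoted in this abstract can be converted into expected error counts with simple arithmetic. The sketch below is a back-of-the-envelope check, not the authors' evaluation method: it assumes errors are spread uniformly across the 4.8 Mbp genome, ignores the excluded low-confidence regions, and also expresses each rate as a Phred quality score (Q = -10 log10 of the per-base error rate). The function and constant names are hypothetical.

```python
import math

# ~4.8 Mbp S. Newport genome size quoted in the abstract (assumed uniform error rate)
GENOME_SIZE_BP = 4_800_000


def expected_errors(accuracy_pct: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Expected number of erroneous bases for a given per-base accuracy (in percent)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    return error_rate * genome_size_bp


def phred_quality(accuracy_pct: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(per-base error rate)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    return -10.0 * math.log10(error_rate)


# Compare the unpolished nanopore baseline with the near-perfect polished result
for label, acc in [("nanopore assembly", 99.95), ("long+short polishing", 99.9999)]:
    print(f"{label}: {acc}% -> ~{expected_errors(acc):.0f} errors, Q{phred_quality(acc):.0f}")
```

Running it prints about 2400 errors (Q33) for the 99.95% baseline and about 5 errors (Q60) after combined polishing, consistent with the ~5 errors reported above.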

CONCLUSIONS:

Short reads are still needed to correct errors in nanopore-sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
Subject(s)
  • Collection: 01-internacional
  • Database: MEDLINE
  • Main subject: Disease Outbreaks / Genome, Bacterial / Benchmarking
  • Limit: Humans
  • Language: English
  • Journal: BMC Genomics
  • Journal subject: Genetics
  • Year: 2024
  • Document type: Article
  • Country of affiliation: United States
  • Country of publication: United Kingdom