BlockPolish: accurate polishing of long-read assembly via block divide-and-conquer.
Huang, Neng; Nie, Fan; Ni, Peng; Gao, Xin; Luo, Feng; Wang, Jianxin.
Affiliation
  • Huang N; School of Computer Science and Engineering, Central South University, China.
  • Nie F; School of Computer Science and Engineering, Central South University, China.
  • Ni P; School of Computer Science and Engineering, Central South University, China.
  • Gao X; School of Computer Science, King Abdullah University of Science and Technology, Saudi Arabia.
  • Luo F; School of Computing, Clemson University, USA.
  • Wang J; School of Computer Science and Engineering, Central South University, China.
Brief Bioinform; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34619757
ABSTRACT
Long-read sequencing technology has enabled significant progress in de novo genome assembly. However, the high error rate and wide error distribution of raw reads leave a large number of errors in the assembly. Polishing is a procedure that fixes errors in the draft assembly and improves the reliability of downstream genomic analysis. Existing methods, however, treat all regions of the assembly equally, even though the error distributions of these regions differ fundamentally. Achieving very high accuracy in genome assembly therefore remains a challenging problem. Motivated by the uneven distribution of errors across regions of the assembly, we propose a novel polishing workflow named BlockPolish. In this method, we divide contigs into low-complexity (trivial) and high-complexity (complex) blocks according to statistics of the aligned nucleotide bases. Multiple sequence alignment is applied to realign the raw reads in complex blocks and optimize the alignment result. Because error rates are distributed differently in trivial and complex blocks, two multitask bidirectional long short-term memory (LSTM) networks are proposed to predict the consensus sequences. On whole-genome assemblies of NA12878 produced by Wtdbg2 and Flye from Nanopore data, BlockPolish achieves higher polishing accuracy than other state-of-the-art polishers, including Racon, Medaka and MarginPolish & HELEN. In all assemblies, the errors are predominantly indels, and BlockPolish corrects them effectively. Beyond the Nanopore assemblies, we further demonstrate that BlockPolish also reduces errors in PacBio assemblies. The source code of BlockPolish is freely available on GitHub (https://github.com/huangnengCSU/BlockPolish).
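The abstract says only that contigs are split into trivial and complex blocks "according to statistics of aligned nucleotide bases"; it does not state which statistic or threshold is used. The sketch below is a hypothetical illustration of the general idea, not BlockPolish's actual criterion: each pileup column is summarized as counts of aligned symbols (A/C/G/T, plus an assumed `-` for deletions), a column is flagged complex when its most frequent symbol falls below an assumed agreement fraction `min_major_frac`, complex columns are padded by an assumed `pad` of neighbouring columns, and consecutive columns with the same label are merged into blocks. The names `Block`, `column_is_complex` and `divide_into_blocks` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    start: int        # first column index (inclusive)
    end: int          # last column index (exclusive)
    is_complex: bool  # True for a high-complexity block


def column_is_complex(counts: Dict[str, int], min_major_frac: float = 0.8) -> bool:
    """Hypothetical rule: a column is complex when the most frequent aligned
    symbol (base or '-' for a deletion) covers less than min_major_frac of reads."""
    total = sum(counts.values())
    if total == 0:
        return True  # no coverage: treat the column as uncertain
    return max(counts.values()) / total < min_major_frac


def divide_into_blocks(pileup: List[Dict[str, int]],
                       min_major_frac: float = 0.8,
                       pad: int = 1) -> List[Block]:
    """Label every column, widen complex columns by `pad` neighbours on each
    side (assumed, to keep indel context together), then merge runs of
    identically labelled columns into blocks."""
    flags = [column_is_complex(c, min_major_frac) for c in pileup]
    expanded = flags[:]
    for i, flagged in enumerate(flags):
        if flagged:
            for j in range(max(0, i - pad), min(len(flags), i + pad + 1)):
                expanded[j] = True

    blocks: List[Block] = []
    start = 0
    for i in range(1, len(expanded) + 1):
        # Close the current run at the end of the contig or when the label changes.
        if i == len(expanded) or expanded[i] != expanded[start]:
            blocks.append(Block(start, i, expanded[start]))
            start = i
    return blocks


if __name__ == "__main__":
    # Toy pileup: column 2 is split between G and a deletion.
    pileup = [
        {"A": 10},
        {"C": 9, "T": 1},
        {"G": 5, "-": 5},
        {"T": 10},
        {"A": 10},
        {"C": 10},
    ]
    for block in divide_into_blocks(pileup):
        print(block)
```

With the default parameters the toy pileup yields a trivial block over column 0, a complex block over columns 1 to 3, and a trivial block over columns 4 and 5. In a workflow like the one described, only the complex blocks would then be passed to the more expensive realignment step; the padding is a design choice so that an indel and its immediate context land in the same block.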

Full text: 1 | Database: MEDLINE | Main subject: Software / High-Throughput Nucleotide Sequencing | Study type: Prognostic_studies | Language: English | Publication year: 2022 | Document type: Article