Genome sequence assembly algorithms and misassembly identification methods.

Meng, Yue; Lei, Yu; Gao, Jianlong; Liu, Yuxuan; Ma, Enze; Ding, Yunhong; Bian, Yixin; Zu, Hongquan; Dong, Yucui; Zhu, Xiao

Affiliation

Meng Y; School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou, Henan, China.
Lei Y; Department of Big Data and Intelligent Engineering, Shanxi Institute of Technology, Yangquan, Shanxi, China.
Gao J; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
Liu Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
Ma E; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
Ding Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
Bian Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
Zu H; Center of Network and Information, Harbin Institute of Technology, Harbin, Heilongjiang, China.
Dong Y; Department of Immunology, Binzhou Medical University, Yantai, Shandong, China. dongyucui521@yeah.net.
Zhu X; School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China. zhuxiao_hit@yeah.net.

Mol Biol Rep ; 49(11): 11133-11148, 2022 Nov.

Article in En | MEDLINE | ID: mdl-36151399

ABSTRACT

Sequence assembly algorithms have evolved rapidly alongside the growth of genome sequencing technology over the past two decades. Assembly mainly reconstructs the target genome by iteratively extending overlap relationships between sequences. Assembly algorithms are typically classified into several categories, such as the Greedy strategy, the Overlap-Layout-Consensus (OLC) strategy, and the de Bruijn graph (DBG) strategy. In particular, with the rapid development of third-generation sequencing (TGS) technology, several assembly algorithms have been proposed to generate high-quality chromosome-level assemblies. However, owing to genome complexity, the limited length of short reads, and the high error rate of long reads, the contigs produced by assembly may contain misassemblies that adversely affect downstream data analysis. Therefore, several read-based and reference-based methods for misassembly identification have been developed to improve assembly quality. This work reviews the development of DNA sequencing technologies and summarizes sequencing data simulation methods, sequencing error correction methods, mainstream sequence assembly algorithms, and misassembly identification methods. The large amount of computation involved makes the sequence assembly problem more challenging, so more efficient and accurate assembly algorithms and alternative algorithms are needed.
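
As an illustration of the Greedy strategy named in the abstract, the following is a minimal sketch of overlap-based merging. It is not taken from the reviewed article: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the toy reads are all assumptions made for the example, and the sketch ignores sequencing errors, reverse complements, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap.

    Hypothetical toy implementation: exact matches only, quadratic
    pair search on every round, no error tolerance.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the contigs
        # Extend the left sequence with the non-overlapping tail of the right one.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    toy_reads = ["ATGGCG", "GCGTGC", "TGCAAT"]  # made-up reads
    print(greedy_assemble(toy_reads))
```

Because each round commits to the locally best overlap, a sketch like this can join reads across repeated regions incorrectly, which is one way the misassemblies discussed in the abstract can arise.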

Subjects

Algorithms; Genome; Sequence Analysis, DNA/methods; Base Sequence; High-Throughput Nucleotide Sequencing/methods; Software

Keywords

Genome assembly algorithms; Genome sequencing technology; Misassembly identification methods; Third-generation sequencing

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Genome Type of study: Diagnostic_studies Language: En Year of publication: 2022 Document type: Article
