Genome sequence assembly algorithms and misassembly identification methods.
Meng, Yue; Lei, Yu; Gao, Jianlong; Liu, Yuxuan; Ma, Enze; Ding, Yunhong; Bian, Yixin; Zu, Hongquan; Dong, Yucui; Zhu, Xiao.
Affiliation
  • Meng Y; School of Information Engineering, Zhengzhou University of Industrial Technology, Zhengzhou, Henan, China.
  • Lei Y; Department of Big Data and Intelligent Engineering, Shanxi Institute of Technology, Yangquan, Shanxi, China.
  • Gao J; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
  • Liu Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
  • Ma E; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
  • Ding Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
  • Bian Y; School of Computer Science and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China.
  • Zu H; Center of Network and Information, Harbin Institute of Technology, Harbin, Heilongjiang, China.
  • Dong Y; Department of Immunology, Binzhou Medical University, Yantai, Shandong, China. dongyucui521@yeah.net.
  • Zhu X; School of Computer and Control Engineering, Yantai University, Yantai, Shandong, China. zhuxiao_hit@yeah.net.
Mol Biol Rep ; 49(11): 11133-11148, 2022 Nov.
Article in En | MEDLINE | ID: mdl-36151399
ABSTRACT
Sequence assembly algorithms have evolved rapidly alongside the growth of genome sequencing technology over the past two decades. Assembly mainly relies on iteratively extending overlap relationships between sequences to reconstruct the target genome. Assembly algorithms are typically classified into several categories, such as the Greedy strategy, the Overlap-Layout-Consensus (OLC) strategy, and the de Bruijn graph (DBG) strategy. In particular, with the rapid development of third-generation sequencing (TGS) technology, several assembly algorithms have been proposed to generate high-quality chromosome-level assemblies. However, because of genome complexity, the limited length of short reads, and the high error rate of long reads, the contigs produced by assembly may contain misassemblies that adversely affect downstream data analysis. Therefore, several read-based and reference-based methods for misassembly identification have been developed to improve assembly quality. This work reviews the development of DNA sequencing technologies and summarizes sequencing data simulation methods, sequencing error correction methods, the mainstream sequence assembly algorithms, and misassembly identification methods. The heavy computational cost makes the sequence assembly problem more challenging, so more efficient and accurate assembly algorithms and alternative algorithms are needed.
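As a rough illustration of the Greedy strategy named in the abstract, the following is a minimal Python sketch that repeatedly merges the two sequences sharing the longest suffix-prefix overlap. It is not taken from the reviewed paper: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the toy reads are all assumptions chosen for illustration, and the sketch ignores sequencing errors, reverse complements, and repeats, which real assemblers must handle.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` characters exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Hypothetical greedy assembler: merge the best-overlapping pair until none remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        # Exhaustively search all ordered pairs for the largest overlap (quadratic cost).
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining pair overlaps by at least min_len
        # Merge the pair: keep all of the left sequence, append the non-overlapping tail.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    toy_reads = ["ATGCGT", "CGTACC", "ACCTTG"]  # made-up reads, not real data
    print(greedy_assemble(toy_reads))
```

The all-pairs search in each round hints at why computation becomes a bottleneck on real data sets, and why a greedy merge can join the wrong sequences when a genome contains repeats.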

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Genome Study type: Diagnostic_studies Language: En Year of publication: 2022 Document type: Article
