An Integer Linear Programming Approach for Scaffolding Based on Exemplar Breakpoint Distance.
Shieh, Yi-Kung; Peng, Dao-Yuan; Chen, Yu-Han; Wu, Tsung-Wei; Lu, Chin Lung.
Affiliation
  • Shieh YK; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
  • Peng DY; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
  • Chen YH; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
  • Wu TW; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
  • Lu CL; Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan.
J Comput Biol; 29(9): 961-973, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35638936
ABSTRACT
Reference-based scaffolding is an important step in genome sequencing that orders and orients the contigs of a draft genome based on a reference genome. In this study, we use the concept of genome rearrangement to formulate this process as an exemplar breakpoint distance (EBD)-based scaffolding problem. Its aim is to scaffold the contigs of two given draft genomes, both containing duplicate genes (or sequence markers) and each serving as a reference for the other, such that the EBD between the scaffolded genomes is minimized. The EBD-based scaffolding problem is difficult to solve because it is NP-hard. In this work, we design an integer linear programming (ILP)-based algorithm that solves the EBD-based scaffolding problem exactly. Our experimental results on both simulated and biological data sets show that our ILP-based scaffolding algorithm can accurately and efficiently use a reference genome to scaffold the contigs of a draft genome. Moreover, our ILP-based scaffolding algorithm is more accurate when it considers duplicate genes than when it does not, suggesting that duplicate genes and their exemplars are helpful when applying genome rearrangement to the reference-based scaffolding problem. When compared with RaGOO, a current state-of-the-art alignment-based scaffolder, our ILP-based scaffolding algorithm is also more accurate on the biological data sets.
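To make the objective in the abstract concrete, the following is a minimal brute-force sketch of the exemplar breakpoint distance between two small signed gene orders. It is not the authors' ILP formulation and it does not perform scaffolding. It assumes single linear chromosomes, counts adjacencies without telomere caps, and enumerates every exemplar choice, so it is only practical for toy inputs. All function names are hypothetical.

```python
from itertools import product


def adjacencies(genome):
    """Return the adjacencies of a signed linear gene order.

    The pair (a, b) read forward is the same adjacency as (-b, -a) read
    backward, so each pair is stored in one canonical form.
    Assumption: no telomere caps are added at the chromosome ends.
    """
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def breakpoint_distance(g1, g2):
    """Count adjacencies of g1 that are not conserved in g2.

    Both inputs are assumed to be duplicate-free with equal gene content.
    """
    return len(adjacencies(g1) - adjacencies(g2))


def exemplars(genome):
    """Yield every exemplar genome: exactly one copy kept per gene family."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(abs(gene), []).append(index)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def exemplar_breakpoint_distance(g1, g2):
    """Minimum breakpoint distance over all pairs of exemplar genomes."""
    if {abs(g) for g in g1} != {abs(g) for g in g2}:
        raise ValueError("both genomes must contain the same gene families")
    return min(
        breakpoint_distance(e1, e2)
        for e1 in exemplars(g1)
        for e2 in exemplars(g2)
    )


if __name__ == "__main__":
    # Gene family 2 is duplicated in the first genome.
    print(exemplar_breakpoint_distance([1, 2, 3, 2], [1, 3, 2]))
```

In this toy input, keeping the second copy of gene 2 in the first genome yields the exemplar `1 3 2`, which matches the second genome exactly. The number of exemplar choices grows exponentially with the number of duplicated families, which is consistent with the abstract's note that the problem is NP-hard and motivates an exact ILP solver over enumeration.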

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Programming, Linear / Genome Language: En Publication year: 2022 Document type: Article