misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads.
Zhu, Xiao; Leung, Henry C M; Wang, Rongjie; Chin, Francis Y L; Yiu, Siu Ming; Quan, Guangri; Li, Yajie; Zhang, Rui; Jiang, Qinghua; Liu, Bo; Dong, Yucui; Zhou, Guohui; Wang, Yadong.
Affiliation
  • Zhu X; College of Computer Sciences and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China. zhuxiao.hit@gmail.com.
  • Leung HC; Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China. zhuxiao.hit@gmail.com.
  • Wang R; Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China. cmleung2@cs.hku.hk.
  • Chin FY; Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China. rjwang.hit@gmail.com.
  • Yiu SM; Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China. chin@cs.hku.hk.
  • Quan G; Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, China. smyiu@cs.hku.hk.
  • Li Y; Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China. grquan@hit.edu.cn.
  • Zhang R; The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China. 148077246@qq.com.
  • Jiang Q; The Fourth Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China. 282661708@qq.com.
  • Liu B; School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China. qhjiang@hit.edu.cn.
  • Dong Y; Center for Bioinformatics, School of Computer Sciences and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China. bo.liu@hit.edu.cn.
  • Zhou G; Department of Immunology, Harbin Medical University, Harbin, Heilongjiang, China. dongyucui521@yeah.net.
  • Wang Y; College of Computer Sciences and Information Engineering, Harbin Normal University, Harbin, Heilongjiang, China. zhou_ghui@163.com.
BMC Bioinformatics ; 16: 386, 2015 Nov 16.
Article in En | MEDLINE | ID: mdl-26573684
ABSTRACT

BACKGROUND:

Because of the short read length of high-throughput sequencing data, errors are introduced during genome assembly, which may adversely affect downstream data analysis. Several tools have been developed to eliminate these errors by either 1) comparing the assembled sequences with a similar reference genome, or 2) analyzing paired-end reads aligned to the assembled sequences and identifying inconsistent features along mis-assembled sequences. However, the former approach cannot distinguish assembly errors from real structural variations between the target genome and the reference genome, while the latter approach can produce many false positives (correctly assembled sequences reported as mis-assembled).

RESULTS:

We present misFinder, a tool that aims to identify assembly errors with high accuracy in an unbiased way and to correct these errors at their mis-assembled positions, improving assembly accuracy for downstream analysis. It combines information from a reference (or closely related reference) genome with paired-end reads aligned to the assembled sequence. Assembly errors and correct assemblies corresponding to structural variations can be detected by comparing the reference genome with the assembled sequence. Different types of assembly errors can then be distinguished within the mis-assembled sequence by analyzing the aligned paired-end reads, using multiple features derived from coverage and consistency of insert distance to obtain high-confidence error calls.
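The read-based check described above can be illustrated with a minimal sketch. This is not the misFinder implementation: the `Pair` type, the `classify_breakpoint` function, and every threshold below are hypothetical, chosen only to show how spanning coverage and insert-distance consistency might be combined to judge a candidate breakpoint that a reference comparison has already flagged.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One aligned paired-end fragment on the assembled sequence (hypothetical type)."""
    start: int  # leftmost aligned position of the fragment
    end: int    # rightmost aligned position of the fragment


def classify_breakpoint(pairs, breakpoint, insert_mean, insert_sd,
                        expected_span, min_span_ratio=0.3,
                        max_discordant=0.5, z=3.0):
    """Label a candidate breakpoint using two read-derived features.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Fragments whose two ends lie on opposite sides of the breakpoint.
    span = [p for p in pairs if p.start < breakpoint <= p.end]
    if not span:
        # No fragment bridges the junction: no read support at all.
        return "mis-assembly"

    # Feature 1: spanning coverage relative to what is expected.
    span_ratio = len(span) / expected_span

    # Feature 2: fraction of spanning fragments with an inconsistent insert distance.
    discordant = sum(
        1 for p in span
        if abs((p.end - p.start) - insert_mean) > z * insert_sd
    )
    discordant_fraction = discordant / len(span)

    if span_ratio < min_span_ratio or discordant_fraction > max_discordant:
        return "mis-assembly"
    # Reads support the junction, so a disagreement with the reference is
    # more plausibly a real structural variation than an assembly error.
    return "read-supported"


# Usage: 20 fragments of length 300 bridging position 1000.
consistent = [Pair(800 + i, 1100 + i) for i in range(20)]
print(classify_breakpoint(consistent, 1000, insert_mean=300,
                          insert_sd=30, expected_span=20))
```

In this sketch a junction that disagrees with the reference but is bridged by enough fragments of consistent length is treated as a likely structural variation, while a junction with few or stretched fragments is treated as a likely mis-assembly.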

CONCLUSIONS:

We tested the performance of misFinder on both simulated and real paired-end read data, and misFinder gave accurate error calls with very few miscalls. We further compared misFinder with QUAST and REAPR. misFinder outperformed both by 1) identifying more true positive mis-assemblies with very few false positives and false negatives, and 2) distinguishing correct assemblies corresponding to structural variations from mis-assembled sequence. misFinder can be freely downloaded from https://github.com/hitbio/misFinder.
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Schizosaccharomyces / Software / Sequence Analysis, DNA / Escherichia coli / High-Throughput Nucleotide Sequencing Language: En Journal: BMC Bioinformatics Journal subject: MEDICAL INFORMATICS Year of publication: 2015 Document type: Article Country of affiliation: China
