PopIns: population-scale detection of novel sequence insertions.
Kehr, Birte; Melsted, Páll; Halldórsson, Bjarni V.
Affiliation
  • Kehr B; deCODE genetics/Amgen, Reykjavík, Iceland.
  • Melsted P; deCODE genetics/Amgen, Reykjavík, Iceland, Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavík, Iceland.
  • Halldórsson BV; deCODE genetics/Amgen, Reykjavík, Iceland, Institute of Biomedical and Neural Engineering, Reykjavík University, Reykjavík, Iceland.
Bioinformatics; 32(7): 961-7, 2016 04 01.
Article in En | MEDLINE | ID: mdl-25926346
ABSTRACT
MOTIVATION:

The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions.

RESULTS:

We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.

AVAILABILITY AND IMPLEMENTATION:

The source code of PopIns is available from http://github.com/bkehr/popins

CONTACT:

birte.kehr@decode.is

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
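ILLUSTRATION (not part of the published abstract):

The first step named in the Results, collecting the reads that did not align to the reference so that they can be handed to an assembler, can be sketched as follows. This is a minimal sketch under stated assumptions and not the PopIns implementation: it assumes the alignment is available as SAM-format text lines and relies only on the SAM FLAG bit 0x4 (segment unmapped). The function name `unaligned_reads` and the toy records are hypothetical.

```python
# Illustrative sketch only; this is NOT how PopIns itself is implemented.
UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unaligned_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records flagged as unmapped."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 10 = SEQ
        name, flag, seq = fields[0], int(fields[1]), fields[9]
        if flag & UNMAPPED_FLAG:
            yield name, seq


# Hypothetical toy records (made-up read names and sequences).
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCGA\tIIIIIIII",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCATTAC\tIIIIIIII",
]

unaligned = list(unaligned_reads(example_sam))
print(unaligned)
```

In this toy input, read1 is aligned and is skipped, while read2 and read3 carry the unmapped bit and would be passed on to assembly. The later steps (merging contigs across individuals, anchoring, genotyping) are not sketched here because the abstract does not describe them in enough detail.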
Subject(s)

Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Sequence Analysis, DNA / Computational Biology / High-Throughput Nucleotide Sequencing
Type of study: Diagnostic studies / Prognostic studies
Limits: Humans
Language: En
Journal: Bioinformatics
Journal subject: Medical Informatics
Year: 2016
Type: Article
Affiliation country: Iceland
