TranSurVeyor: an improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data.
Rajaby, Ramesh; Sung, Wing-Kin.
Affiliation
  • Rajaby R; School of Computing, National University of Singapore, 13 Computing Drive, 117417, Singapore.
  • Sung WK; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, 117456, Singapore.
Nucleic Acids Res; 46(20): e122, 2018 Nov 16.
Article in English | MEDLINE | ID: mdl-30137425
ABSTRACT
Transpositions transfer DNA segments between different loci within a genome; in particular, when a transposition is found in a sample but not in a reference genome, it is called a non-reference transposition. They are important structural variations that have clinical impact. Transpositions can be called by analyzing second-generation high-throughput sequencing datasets. Current methods follow either a database-based or a database-free approach. Database-based methods require a database of transposable elements. Some of them have good specificity; however, this approach cannot detect novel transpositions, and it requires a good database of transposable elements, which is not yet available for many species. Database-free methods perform de novo calling of transpositions, but their accuracy is low. We observe that this is due to the misalignment of the reads: since reads are short and the human genome has many repeats, false alignments create false positive predictions while missing alignments reduce the true positive rate. This paper proposes new techniques to improve database-free non-reference transposition calling. First, we propose a realignment strategy called one-end remapping that corrects the alignments of reads in interspersed repeats; second, we propose an SNV-aware filter that removes some incorrectly aligned reads. By combining these two techniques with other techniques such as clustering and a positive-to-negative ratio filter, our proposed transposition caller TranSurVeyor shows at least a 3.1-fold improvement in terms of F1-score over existing database-free methods. More importantly, even though TranSurVeyor does not use databases of prior information, its performance is at least as good as existing database-based methods such as MELT, Mobster and Retroseq. We also illustrate that TranSurVeyor can discover transpositions that are not known in the current database.
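The abstract reports its comparison in terms of F1-score, which combines the two failure modes it describes: false positive calls lower precision, and missed calls lower recall. The following is a minimal sketch of the standard F1 definition, not code from TranSurVeyor; the function name `f1_score` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Standard F1: the harmonic mean of precision and recall.

    Returns 0.0 when there are no true positives, since precision or
    recall is then zero or undefined.
    """
    if true_positives == 0:
        return 0.0
    # Precision: fraction of called transpositions that are real.
    precision = true_positives / (true_positives + false_positives)
    # Recall: fraction of real transpositions that were called.
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not results from the paper).
example = f1_score(true_positives=80, false_positives=20, false_negatives=20)
```

Because F1 is a harmonic mean, a caller cannot score well by trading many false positives for a few extra true positives, which is why it suits comparing callers with different precision and recall balances.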
Subject(s)

Full text: 1 Databases: MEDLINE Main subject: Algorithms / DNA Transposable Elements / Databases, Factual / Computational Biology / High-Throughput Nucleotide Sequencing Study type: Diagnostic_studies / Prognostic_studies Limits: Humans Language: En Journal: Nucleic Acids Res Year: 2018 Document type: Article Affiliation country: Singapore
