Alignment of high-throughput sequencing data inside in-memory databases.

Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

Affiliation

Firnkorn D; Institute of Medical Biometry and Informatics, Heidelberg, Germany.
Knaup-Gregori P; Institute of Medical Biometry and Informatics, Heidelberg, Germany.
Lorenzo Bermejo J; Institute of Medical Biometry and Informatics, Heidelberg, Germany.
Ganzinger M; Institute of Medical Biometry and Informatics, Heidelberg, Germany.

Stud Health Technol Inform ; 205: 476-80, 2014.

Article in English | MEDLINE | ID: mdl-25160230

ABSTRACT

In the era of high-throughput DNA sequencing, efficient analysis of DNA sequences is of great importance, yet computer-supported DNA analysis remains a time-consuming task. In this paper we explore the potential of in-memory database technology using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment, one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure comparable results, MySQL was also run in memory, using its integrated memory engine for database table creation. The stored procedures implemented exact and inexact searching of DNA reads within the reference genome GRCh37. Because of technical restrictions on recursion in SAP HANA, inexact matching could not be implemented on this platform. The performance comparison between HANA and MySQL was therefore based on the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates high potential for the new in-memory concepts and for further development of DNA analysis procedures.
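
For readers unfamiliar with the exact search that BWA-style aligners perform, the following is a minimal Python sketch of backward search over a Burrows-Wheeler transform. It is not the authors' stored procedure code, which the abstract does not show. The function names (`suffix_array`, `build_fm_index`, `exact_search`), the naive suffix array construction, and the toy reference string are all assumptions made for illustration; a real aligner working on GRCh37 would use a far more compact index.

```python
from collections import Counter


def suffix_array(text):
    # Naive construction by sorting all suffixes; suitable for toy inputs only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_fm_index(reference):
    # "$" is a sentinel that sorts before A, C, G and T.
    text = reference + "$"
    sa = suffix_array(text)
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def exact_search(read, sa, c_table, occ):
    # Backward search: narrow a half-open suffix array interval [lo, hi)
    # while consuming the read from its last character to its first.
    lo, hi = 0, len(sa)
    for ch in reversed(read):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    # Each suffix array entry in the interval is a 0-based match position.
    return sorted(sa[lo:hi])


sa, c_table, occ = build_fm_index("GATTACAGATTACA")
print(exact_search("ATTA", sa, c_table, occ))  # [1, 8]
```

The loop in `exact_search` is iterative, which is why exact matching maps onto a stored procedure without recursion. Inexact matching is usually written as a recursive search that branches on mismatches at each step, which is consistent with the recursion restriction the abstract reports for HANA.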

Subject(s)

DNA/genetics; Database Management Systems; Databases, Genetic; Programming Languages; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Base Sequence; Information Storage and Retrieval/methods; Molecular Sequence Data

Collection: 01-international | Database: MEDLINE | Main subject: Database Management Systems / Programming Languages / Software / DNA / Sequence Alignment / Sequence Analysis, DNA / Databases, Genetic | Language: En | Journal: Stud Health Technol Inform | Journal subject: MEDICAL INFORMATICS / HEALTH SERVICES RESEARCH | Year: 2014 | Document type: Article | Affiliation country: Germany
