SparkBLAST: scalable BLAST processing using in-memory operations.

de Castro, Marcelo Rodrigo; Tostes, Catherine Dos Santos; Dávila, Alberto M R; Senger, Hermes; da Silva, Fabricio A B

Affiliation

de Castro MR; Computer Science Department, Federal University of São Carlos, Rod. Washington Luís, Km 235, São Carlos, 21040-900, Brazil.
Tostes CDS; LBCS-IOC, Oswaldo Cruz Foundation, Av Brasil 4365, Rio de Janeiro, 21040-900, Brazil.
Dávila AMR; LBCS-IOC, Oswaldo Cruz Foundation, Av Brasil 4365, Rio de Janeiro, 21040-900, Brazil.
Senger H; Computer Science Department, Federal University of São Carlos, Rod. Washington Luís, Km 235, São Carlos, 21040-900, Brazil.
da Silva FAB; PROCC, Oswaldo Cruz Foundation, Av. Brasil 4365, Rio de Janeiro, 21040-900, Brazil. fabricio.silva@fiocruz.br.

BMC Bioinformatics ; 18(1): 318, 2017 Jun 27.

Article in English | MEDLINE | ID: mdl-28655296

ABSTRACT

BACKGROUND:

The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper, we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, several radionuclide-resistant bacterial genomes were selected for similarity analysis.

RESULTS:

Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times.

CONCLUSIONS:

The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, which reduce the number of local I/O operations required for distributed BLAST processing.

Subject(s)

Software; Algorithms; Cloud Computing; Comparative Genomic Hybridization; Databases, Factual; Sequence Alignment

Keywords

Cloud computing; Comparative genomics; Scalability; Spark

Full text: 1 | Collection: 01-international | Database: MEDLINE | Main subject: Software | Language: English | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2017 | Document type: Article | Country of affiliation: Brazil
