SparkBLAST: scalable BLAST processing using in-memory operations.
BMC Bioinformatics; 18(1): 318, 2017 Jun 27.
Article | English | MEDLINE | ID: mdl-28655296
ABSTRACT
BACKGROUND:
The demand for processing ever increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis.
RESULTS:
Experiments in Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times.
CONCLUSIONS:
The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, consequently reducing the number of local I/O operations required for distributed BLAST processing.
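The abstract compares the two systems by speedup and execution time without defining the metric. As a minimal sketch, assuming the conventional definition (baseline run time divided by parallel run time), the calculation could look like the following. The function names and the timings are hypothetical illustrations; they are not taken from the paper and do not reproduce its measurements.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Return the conventional speedup: baseline time / parallel time."""
    if baseline_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / parallel_seconds


def efficiency(baseline_seconds: float, parallel_seconds: float, workers: int) -> float:
    """Return parallel efficiency: speedup divided by the number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(baseline_seconds, parallel_seconds) / workers


if __name__ == "__main__":
    # Hypothetical wall-clock timings (seconds) keyed by worker count.
    # These are made-up illustration values, NOT results from the paper.
    timings = {1: 1000.0, 2: 520.0, 4: 270.0}
    baseline = timings[1]
    for workers, seconds in sorted(timings.items()):
        s = speedup(baseline, seconds)
        e = efficiency(baseline, seconds, workers)
        print(f"{workers} workers: speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency close to 1 means the run time falls almost in proportion to the number of workers; comparing two systems on the same workload and worker counts then reduces to comparing these two numbers.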
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Software
Language: English
Journal: BMC Bioinformatics
Journal subject: MEDICAL INFORMATICS
Year: 2017
Document type: Article
Affiliation country: Brazil