Búsqueda | Portal Regional de la BVS

Bioinformatics Workflows With NoSQL Database in Cloud Computing.

Wercelens, Polyane; da Silva, Waldeyr; Hondo, Fernanda; Castro, Klayton; Walter, Maria Emília; Araújo, Aletéia; Lifschitz, Sergio; Holanda, Maristela.

Evol Bioinform Online ; 15: 1176934319889974, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-31839702

RESUMEN

Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. It is a regular Bioinformatics approach applying workflows to solve problems in Molecular Biology, notably those related to sequence analyses. Due to the nature of the raw data and the in silico environment of Molecular Biology experiments, apart from the research subject, 2 practical and closely related problems have been studied: reproducibility and computational environment. When aiming to enhance the reproducibility of Bioinformatics experiments, various aspects should be considered. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a booming alternative that can provide this computational environment, hiding technical details, and delivering a more affordable, accessible, and configurable on-demand environment for researchers. Considering this specific scenario, we proposed a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet the goal, we have built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present here the results and a guide for the deployment of a cloud environment for Bioinformatics exploring the characteristics of various NoSQL database systems to persist provenance data.

Distributed Data Service for Data Management in Internet of Things Middleware.

Cruz Huacarpuma, Ruben; de Sousa Junior, Rafael Timoteo; de Holanda, Maristela Terto; de Oliveira Albuquerque, Robson; García Villalba, Luis Javier; Kim, Tai-Hoon.

Sensors (Basel) ; 17(5)2017 Apr 27.

Artículo en Inglés | MEDLINE | ID: mdl-28448469

RESUMEN

The development of the Internet of Things (IoT) is closely related to a considerable increase in the number and variety of devices connected to the Internet. Sensors have become a regular component of our environment, as well as smart phones and other devices that continuously collect data about our lives even without our intervention. With such connected devices, a broad range of applications has been developed and deployed, including those dealing with massive volumes of data. In this paper, we introduce a Distributed Data Service (DDS) to collect and process data for IoT environments. One central goal of this DDS is to enable multiple and distinct IoT middleware systems to share common data services from a loosely-coupled provider. In this context, we propose a new specification of functionalities for a DDS and the conception of the corresponding techniques for collecting, filtering and storing data conveniently and efficiently in this environment. Another contribution is a data aggregation component that is proposed to support efficient real-time data querying. To validate its data collecting and querying functionalities and performance, the proposed DDS is evaluated in two case studies regarding a simulated smart home system, the first case devoted to evaluating data collection and aggregation when the DDS is interacting with the UIoT middleware, and the second aimed at comparing the DDS data collection with this same functionality implemented within the Kaa middleware.

Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.

Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio.

Int J Genomics ; 2015: 502795, 2015.

Artículo en Inglés | MEDLINE | ID: mdl-26558254

RESUMEN

Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.

Provenance in bioinformatics workflows.

de Paula, Renato; Holanda, Maristela; Gomes, Luciana S A; Lifschitz, Sergio; Walter, Maria Emilia M T.

BMC Bioinformatics ; 14 Suppl 11: S6, 2013.

Artículo en Inglés | MEDLINE | ID: mdl-24564294

RESUMEN

In this work, we used the PROV-DM model to manage data provenance in workflows of genome projects. This provenance model allows the storage of details of one workflow execution, e.g., raw and produced data and computational tools, their versions and parameters. Using this model, biologists can access details of one particular execution of a workflow, compare results produced by different executions, and plan new experiments more efficiently. In addition to this, a provenance simulator was created, which facilitates the inclusion of provenance data of one genome project workflow execution. Finally, we discuss one case study, which aims to identify genes involved in specific metabolic pathways of Bacillus cereus, as well as to compare this isolate with other phylogenetic related bacteria from the Bacillus group. B. cereus is an extremophilic bacteria, collected in warm water in the Midwestern Region of Brazil, its DNA samples having been sequenced with an NGS machine.

Asunto(s)

Biología Computacional/métodos , Programas Informáticos , Bacillus cereus/genética , Genoma , Flujo de Trabajo

RESUMEN

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA