Results 1 - 2 of 2
1.
Sensors (Basel); 23(2), 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36679360

ABSTRACT

Big data pipelines are developed to process data characterized by one or more of the three big data features, commonly known as the three Vs (volume, velocity, and variety), through a series of steps (e.g., extract, transform, and move), laying the groundwork for the use of advanced analytics and ML/AI techniques. The computing continuum (i.e., cloud/fog/edge) provides access to a virtually unlimited amount of resources on which data pipelines can be executed at scale; however, implementing data pipelines on the continuum is a complex task that must take computing resources, data transmission channels, triggers, data transfer methods, integration of message queues, etc., into account. The task becomes even more challenging when data storage is considered as part of the data pipelines. Local storage is expensive, hard to maintain, and comes with several challenges (e.g., data availability, data security, and backup). The use of cloud storage, i.e., storage-as-a-service (StaaS), instead of local storage has the potential to provide more flexibility in terms of scalability, fault tolerance, and availability. In this article, we propose a generic approach to integrate StaaS with data pipelines, i.e., computation remains on an on-premises server or on a specific cloud while storage is integrated via StaaS, and develop a ranking method for available storage options based on five key parameters: cost, proximity, network performance, server-side encryption, and user weights/preferences. The evaluation carried out demonstrates the effectiveness of the proposed approach in terms of data transfer performance, utility of the individual parameters, and feasibility of dynamic selection of a storage option based on four primary user scenarios.
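
A minimal sketch of how such a parameter-based ranking of storage options could look is given below, assuming min-max normalization and a simple linear weighted score. The class, field names, and example values are illustrative assumptions, not the paper's implementation.

```python
"""Hedged sketch: rank StaaS options by a weighted score over the five
parameters named in the abstract. Normalization and values are assumed."""
from dataclasses import dataclass

@dataclass
class StorageOption:
    name: str
    cost_per_gb: float            # lower is better
    proximity_ms: float           # round-trip latency to the compute site; lower is better
    throughput_mbps: float        # measured network performance; higher is better
    server_side_encryption: bool  # treated as a binary criterion here

def rank_storage_options(options, weights):
    """Return options sorted by weighted score (higher = preferred).
    `weights` carries the user preferences for each parameter."""
    def norm(values, invert=False):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        scaled = [(v - lo) / span for v in values]
        return [1.0 - s for s in scaled] if invert else scaled

    cost = norm([o.cost_per_gb for o in options], invert=True)
    prox = norm([o.proximity_ms for o in options], invert=True)
    perf = norm([o.throughput_mbps for o in options])
    enc = [1.0 if o.server_side_encryption else 0.0 for o in options]

    scored = []
    for i, o in enumerate(options):
        score = (weights["cost"] * cost[i]
                 + weights["proximity"] * prox[i]
                 + weights["network"] * perf[i]
                 + weights["encryption"] * enc[i])
        scored.append((score, o))
    return [o for _, o in sorted(scored, key=lambda t: t[0], reverse=True)]

if __name__ == "__main__":
    candidates = [
        StorageOption("bucket-eu", 0.023, 12.0, 900.0, True),
        StorageOption("bucket-us", 0.021, 95.0, 650.0, True),
        StorageOption("bucket-ap", 0.025, 180.0, 400.0, False),
    ]
    prefs = {"cost": 0.2, "proximity": 0.3, "network": 0.3, "encryption": 0.2}
    for option in rank_storage_options(candidates, prefs):
        print(option.name)
```

Dynamic selection, as described in the abstract, would amount to re-running such a ranking whenever measured network performance or user preferences change.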


Subjects
Algorithms, Big Data, Software, Computers, Computer Security
2.
Sensors (Basel); 21(24), 2021 Dec 08.
Article in English | MEDLINE | ID: mdl-34960302

ABSTRACT

The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo Workflows and demonstrate a significant improvement in execution speed when processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyse individual aspects affecting the performance of the overall solution.
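
As a rough illustration of locality-aware step placement with long-lived containers, the sketch below assigns each workflow step to a worker co-located with its input data. The site names, data structures, and fallback policy are illustrative assumptions and not the architecture described in the article.

```python
"""Hedged sketch: place workflow steps on long-lived containers that are
co-located with the step's input data. All names and mappings are assumed."""
from collections import defaultdict

# Long-lived worker containers, keyed by the site (data centre / edge node) they run on.
CONTAINERS = {
    "edge-oslo": ["worker-oslo-1", "worker-oslo-2"],
    "cloud-eu": ["worker-eu-1"],
}

# Current location of each named data unit (data unit id -> site).
DATA_LOCATION = {
    "sensor-batch-001": "edge-oslo",
    "reference-set": "cloud-eu",
}

_ROUND_ROBIN = defaultdict(int)  # per-site counter to spread steps across co-located workers

def place_step(step_name, input_data):
    """Pick a container for a workflow step, preferring data locality."""
    site = DATA_LOCATION.get(input_data)
    if site not in CONTAINERS:
        # Unknown or unreachable location: fall back to any site and accept a remote transfer.
        site = next(iter(CONTAINERS))
    workers = CONTAINERS[site]
    worker = workers[_ROUND_ROBIN[site] % len(workers)]
    _ROUND_ROBIN[site] += 1
    return worker

if __name__ == "__main__":
    for step, data in [("decode", "sensor-batch-001"),
                       ("enrich", "sensor-batch-001"),
                       ("compare", "reference-set")]:
        print(f"{step} -> {place_step(step, data)}")
```

Because the workers are long-lived, repeated small and frequent events can be dispatched to an already-running container at the data's site instead of paying container start-up and data transfer costs for each event.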


Subjects
Big Data, Computational Biology, Information Storage and Retrieval, Software, Workflow