Search | BVS España Search Portal

FESTIval: A versatile framework for conducting experimental evaluations of spatial indices.

Carniel, Anderson C; Ciferri, Ricardo R; Ciferri, Cristina D A.

MethodsX ; 7: 100695, 2020.

Article in English | MEDLINE | ID: mdl-32021811

ABSTRACT

The use of a spatial index is a common strategy to improve the performance of spatial queries in spatial database systems and Geographic Information Systems. Choosing the right spatial index to be employed in a given context requires a quantitative method to analyze the performance of spatial indices. This is done through extensive experimental evaluations. However, conducting these evaluations is an expensive, error-prone, and challenging task because (i) spatial objects are complex data to manage, (ii) spatial indices can apply different parameter values and thus assume distinct configurations, and (iii) there are indices specifically developed for different storage systems, such as disks and flash memories. In this article, we propose FESTIval, a versatile framework for conducting experimental evaluations of spatial indices. FESTIval has the following main advantages:
• the support for different types of disk-based and flash-aware spatial indices;
• the specification and execution of user-defined workloads;
• the use of a data schema that stores index configurations and statistical data of executed workloads.
Because of its characteristics, FESTIval allows users to reproduce executed experiments. Further, FESTIval provides an extensible environment, where any spatial dataset can be handled by spatial indices. FESTIval has been used to validate new proposals of flash-aware spatial indices, such as eFIND-based indices.

Random access with a distributed Bitmap Join Index for Star Joins.

Brito, Jaqueline J; Mosqueiro, Thiago; Ciferri, Ricardo R; Ciferri, Cristina D A.

Heliyon ; 6(2): e03342, 2020 Feb.

Article in English | MEDLINE | ID: mdl-32099915

ABSTRACT

Indices improve the performance of relational databases, especially on queries that return a small portion of the data (i.e., low-selectivity queries). Star joins are particularly expensive operations that commonly rely on indices for improved performance at scale. The development and support of index-based solutions for Star Joins are still at very early stages. To address this gap, we propose a distributed Bitmap Join Index (dBJI) and a framework-agnostic strategy to solve join predicates in linear time. For empirical analysis, we used common Hadoop technologies (e.g., HBase and Spark) to show that dBJI significantly outperforms full scan approaches by a factor between 59% and 88% in queries with low selectivity from the Star Schema Benchmark (SSB). Thus, distributed indices may significantly enhance low-selectivity query performance even in very large databases.
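
The idea of a bitmap join index for a star join can be illustrated with a small single-process sketch. This is not the distributed dBJI from the abstract, and it uses neither HBase nor Spark; the toy tables, the column names, and the helper functions `build_bitmap_join_index` and `rows_from_bitmap` are all hypothetical. Each dimension attribute value maps to a bitmap over fact-row positions, the join predicates are resolved with a bitwise AND, and only the matching fact rows are then read by position.

```python
from collections import defaultdict

# Hypothetical toy star schema: fact rows reference two dimension tables.
customer = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}, 3: {"region": "ASIA"}}
date_dim = {10: {"year": 1997}, 11: {"year": 1998}}
facts = [
    {"cust": 1, "date": 10, "revenue": 100},
    {"cust": 2, "date": 10, "revenue": 250},
    {"cust": 3, "date": 11, "revenue": 70},
    {"cust": 1, "date": 11, "revenue": 30},
    {"cust": 3, "date": 10, "revenue": 45},
]


def build_bitmap_join_index(fact_rows, fk, dim, attr):
    """Map each dimension attribute value to a bitmap over fact-row positions."""
    index = defaultdict(int)
    for pos, row in enumerate(fact_rows):
        value = dim[row[fk]][attr]
        index[value] |= 1 << pos  # set bit `pos` for this attribute value
    return dict(index)


def rows_from_bitmap(bitmap):
    """Yield the positions of the set bits, lowest position first."""
    pos = 0
    while bitmap:
        if bitmap & 1:
            yield pos
        bitmap >>= 1
        pos += 1


region_idx = build_bitmap_join_index(facts, "cust", customer, "region")
year_idx = build_bitmap_join_index(facts, "date", date_dim, "year")

# Star join predicate: region = 'ASIA' AND year = 1997.
matches = region_idx.get("ASIA", 0) & year_idx.get(1997, 0)
positions = list(rows_from_bitmap(matches))

# Random access: only the matching fact rows are touched.
total = sum(facts[p]["revenue"] for p in positions)
```

With this toy data, the AND of the two bitmaps selects only the fact rows that satisfy both predicates, so the fact table is never scanned in full once the index exists.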

Generalized enhanced suffix array construction in external memory.

Louza, Felipe A; Telles, Guilherme P; Hoffmann, Steve; Ciferri, Cristina D A.

Algorithms Mol Biol ; 12: 26, 2017.

Article in English | MEDLINE | ID: mdl-29234460

ABSTRACT

BACKGROUND: Suffix arrays, augmented by additional data structures, allow many string processing problems to be solved efficiently. The external memory construction of the generalized suffix array for a string collection is a fundamental task when the size of the input collection or the data structure exceeds the available internal memory. RESULTS: In this article we present and analyze [Formula: see text] [introduced in CPM (External memory generalized suffix and [Formula: see text] arrays construction. In: Proceedings of CPM. pp 201-10, 2013)], the first external memory algorithm to construct generalized suffix arrays augmented with the longest common prefix array for a string collection. Our algorithm relies on a combination of buffers, induced sorting and a heap to avoid direct string comparisons. We performed experiments that covered different aspects of our algorithm, including running time, efficiency, external memory access, internal phases and the influence of different optimization strategies. On real datasets of size up to 24 GB and using 2 GB of internal memory, [Formula: see text] showed a competitive performance when compared to [Formula: see text] and [Formula: see text], which are efficient algorithms for a single string according to the related literature. We also show the effect of disk caching managed by the operating system on our algorithm. CONCLUSIONS: The proposed algorithm was validated through performance tests using real datasets from different domains, in various combinations, and showed a competitive performance. Our algorithm can also construct the generalized Burrows-Wheeler transform of a string collection at no additional cost beyond the output time.
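
To make the two data structures named in the abstract concrete, the following naive in-memory sketch builds a generalized suffix array and its longest common prefix (LCP) array for a small string collection. It is not the external memory algorithm described above: it sorts suffixes by direct string comparison, which is exactly what that algorithm avoids, and it only suits tiny inputs. The function names and the tie-breaking rule (equal suffixes ordered by string index) are assumptions made for illustration.

```python
def generalized_suffix_array(strings):
    """Return (i, j) pairs, meaning suffix j of string i, in lexicographic order.

    Assumption: equal suffixes from different strings are ordered by string
    index, and a shorter suffix sorts before any longer suffix it prefixes.
    """
    suffixes = [(s[j:], i, j) for i, s in enumerate(strings) for j in range(len(s))]
    suffixes.sort()  # naive: compares the suffix strings directly
    return [(i, j) for _, i, j in suffixes]


def lcp_array(strings, gsa):
    """lcp[k] is the longest common prefix length of suffixes k-1 and k."""
    lcp = [0] * len(gsa)
    for k in range(1, len(gsa)):
        (i1, j1), (i2, j2) = gsa[k - 1], gsa[k]
        a, b = strings[i1][j1:], strings[i2][j2:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp


collection = ["aba", "ba"]
gsa = generalized_suffix_array(collection)
lcp = lcp_array(collection, gsa)
```

For the collection above, the sorted suffixes are "a", "a", "aba", "ba", "ba", so adjacent entries from different strings can share a full-length prefix, which is the information the LCP array records.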
