Pesquisa | Biblioteca Virtual em Saúde

Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel.

Proceedings VLDB Endowment ; 6(11)2013 Aug.

Artigo em Inglês | MEDLINE | ID: mdl-24187650

RESUMO

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.

Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng.

Proc ACM SIGSPATIAL Int Conf Adv Inf ; 2013: 528-531, 2013 Nov.

Artigo em Inglês | MEDLINE | ID: mdl-27617325

RESUMO

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.

Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems.

Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.

Proceedings VLDB Endowment ; 5(11): 1543-1554, 2012 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-23355955

RESUMO

As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach.

Towards Building High Performance Medical Image Management System for Clinical Trials.

Wang, Fusheng; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel.

Proc SPIE Int Soc Opt Eng ; 7967(79670C)2011.

Artigo em Inglês | MEDLINE | ID: mdl-21603096

RESUMO

Medical image based biomarkers are being established for therapeutic cancer clinical trials, where image assessment is among the essential tasks. Large scale image assessment is often performed by a large group of experts by retrieving images from a centralized image repository to workstations to markup and annotate images. In such environment, it is critical to provide a high performance image management system that supports efficient concurrent image retrievals in a distributed environment. There are several major challenges: high throughput of large scale image data over the Internet from the server for multiple concurrent client users, efficient communication protocols for transporting data, and effective management of versioning of data for audit trails. We study the major bottlenecks for such a system, propose and evaluate a solution by using a hybrid image storage with solid state drives and hard disk drives, RESTful Web Services based protocols for exchanging image data, and a database based versioning scheme for efficient archive of image revision history. Our experiments show promising results of our methods, and our work provides a guideline for building enterprise level high performance medical image management systems.

RESUMO

RESUMO

RESUMO

RESUMO

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA