Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine.

Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A

Bao, Shunxing; Weitendorf, Frederick D; Plassard, Andrew J; Huo, Yuankai; Gokhale, Aniruddha; Landman, Bennett A.

Afiliação

Bao S; Computer Science, Vanderbilt University, Nashville, TN, USA 37235.
Weitendorf FD; Computer Science, Vanderbilt University, Nashville, TN, USA 37235.
Plassard AJ; Computer Science, Vanderbilt University, Nashville, TN, USA 37235.
Huo Y; Electrical Engineering, Vanderbilt University, Nashville, TN, USA 37235.
Gokhale A; Computer Science, Vanderbilt University, Nashville, TN, USA 37235.
Landman BA; Electrical Engineering, Vanderbilt University, Nashville, TN, USA 37235.

Proc SPIE Int Soc Opt Eng ; 101382017 Feb 11.

Article em En | MEDLINE | ID: mdl-28736473

RESUMO

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit that is controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded/retrieved from the NFS, e.g., "short" processing times and/or "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of using such a framework must be formally evaluated against a traditional approach to characterize the point at which simply "large scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab-cluster alongside a standard Sun Grid Engine (SGE). Theoretical models for wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a university secure, shared web database and used to empirically assess computational performance under three configurations of cluster hardware (using 72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented for when the Hadoop framework will be relevant and non-relevant for medical imaging.

Palavras-chave

Apache Hadoop; Sun Grid Engine; Verification

Texto completo

Imprimir

XML

PubMed Links

Buscar no Google

Texto completo: 1 Base de dados: MEDLINE Idioma: En Ano de publicação: 2017 Tipo de documento: Article

Texto completo

Imprimir

XML

PubMed Links