SC is the International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 12-18, 2011

ISABELA-QA: Query-driven Data Analytics over ISABELA-compressed Extreme-Scale Scientific Data

SESSION: Querying Large Scale Data


TIME: 4:30PM - 5:00PM

AUTHOR(S): Sriram Lakshminarasimhan, Jonathan Jenkins, Robert Latham, Robert Ross, Nagiza F. Samatova, Isha Arkatkar, Zhenhuan Gong, Hemanth Kolla, Jackie Chen, Seung-Hoe Ku, C.S. Chang, Stephane Ethier, Scott Klasky


We present a query processing engine for scientific data based on ISABELA, a partitioned, B-spline-based lossy compression scheme. We optimize spatial-region and variable queries under variable and temporal constraints by performing temporally delimited binning over the range of variable values, which ensures a near-uniform distribution of compressed data across bins. Through several analytic scenarios, we demonstrate the high, user-controlled accuracy of the reconstructed data and the competitive performance of variable- and temporally constrained query processing, while incurring smaller memory and storage footprints than both the raw data and popular scientific database systems. Finally, we discuss several HPC optimizations, such as parallel I/O and multi-node/multi-core query processing parallelized by temporal constraints, that enable scalability to extreme scales.
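The abstract does not spell out how the binning works. As a rough illustration only, the Python sketch below assumes that "near-uniform distribution across bins" means equal-frequency (quantile) bin boundaries, built separately for each time step, and shows how a value-range query can then skip every bin whose value range does not overlap the query. The function names `build_bins` and `range_query` are hypothetical. This is not the ISABELA-QA implementation: it works on uncompressed values and omits the B-spline compression entirely.

```python
def build_bins(values, num_bins):
    """Hypothetical equal-frequency binning for a single time step.

    Sorts point indices by value and cuts them into num_bins chunks of
    (nearly) equal size. Returns (bounds, bins): bounds[i] is the
    (min, max) value in bin i and bins[i] holds that bin's point indices.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    bounds, bins = [], []
    for b in range(num_bins):
        start = b * n // num_bins
        end = (b + 1) * n // num_bins
        if start == end:
            continue  # more bins requested than points; skip empty bin
        idx = order[start:end]
        bins.append(idx)
        bounds.append((values[idx[0]], values[idx[-1]]))
    return bounds, bins


def range_query(values, bounds, bins, lo, hi):
    """Return sorted indices of points with lo <= value <= hi.

    Only bins whose [min, max] range overlaps [lo, hi] are scanned.
    """
    hits = []
    for (bmin, bmax), idx in zip(bounds, bins):
        if bmax < lo or bmin > hi:
            continue  # this bin cannot contain matches
        hits.extend(i for i in idx if lo <= values[i] <= hi)
    return sorted(hits)


# Usage: one time step of eight values split into four equal-size bins.
values = [0.5, 3.2, 1.1, 7.8, 2.4, 6.0, 4.4, 5.9]
bounds, bins = build_bins(values, 4)
print([len(b) for b in bins])                          # [2, 2, 2, 2]
print(range_query(values, bounds, bins, 2.0, 5.0))     # [1, 4, 6]
```

Equal-frequency boundaries keep every bin roughly the same size regardless of how skewed the value distribution is, so the work per touched bin stays predictable. Handling a temporal constraint in this sketch would simply mean building and querying one such index per selected time step.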

Sriram Lakshminarasimhan - North Carolina State University

Jonathan Jenkins - North Carolina State University

Robert Latham - Argonne National Laboratory

Robert Ross - Argonne National Laboratory

Nagiza F. Samatova - Oak Ridge National Laboratory

Isha Arkatkar - North Carolina State University

Zhenhuan Gong - North Carolina State University

Hemanth Kolla - Sandia National Laboratories

Jackie Chen - Sandia National Laboratories

Seung-Hoe Ku - New York University

C.S. Chang - New York University

Stephane Ethier - Princeton Plasma Physics Laboratory

Scott Klasky - Oak Ridge National Laboratory

