Exhibits, Demos & Posters
The HDF5-iRODS Module: A Data Grid System for Object Level Access
Authors
- Peter Cao, The HDF Group
- Michael Wan, San Diego Supercomputer Center
Abstract
Many scientific teams store very large datasets in HDF5 files, which are often located at remote sites. The HDF5-iRODS module for the iRODS data grid system allows applications to read subsets of datasets without transferring the entire file to a local machine. This capability can substantially reduce both transfer time and local storage requirements.
HDF5 is a file format and software library designed to handle extremely large and complex data. Petabytes of remote sensing data collected by satellites, terabytes of computational results from nuclear testing models, and megabytes of high-resolution MRI brain scans are stored in HDF5 files. Efficient use of this data in a distributed environment depends on client applications being able to read any subset of the data without transferring the entire file to the local machine. The HDF5-iRODS module was developed for this purpose. Its usefulness was verified with FLASH, one of the NCSA/SDSC Strategic Application Program (SAP) projects.
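To illustrate why subset access saves transfer volume, the sketch below computes which byte ranges of a dataset a rectangular selection actually touches. It is not the HDF5-iRODS module's API: the function name `hyperslab_byte_ranges` is hypothetical, and the code assumes a contiguous, row-major, uncompressed dataset, which is only one of the layouts HDF5 supports.

```python
from itertools import product
from math import prod


def hyperslab_byte_ranges(shape, start, count, itemsize):
    """Return merged (offset, length) byte ranges covered by a rectangular
    selection of a contiguous, row-major dataset (an assumed layout)."""
    if not (len(shape) == len(start) == len(count)):
        raise ValueError("shape, start and count must have the same rank")
    for dim, s, c in zip(shape, start, count):
        if s < 0 or c < 1 or s + c > dim:
            raise ValueError("selection is out of bounds")

    # Row-major strides, measured in elements.
    strides = [prod(shape[i + 1:]) for i in range(len(shape))]
    # Each run along the last axis is contiguous in the file.
    run_len = count[-1] * itemsize

    ranges = []
    outer = (range(s, s + c) for s, c in zip(start[:-1], count[:-1]))
    for idx in product(*outer):
        elem = sum(i * st for i, st in zip(idx, strides[:-1])) + start[-1]
        offset = elem * itemsize
        if ranges and ranges[-1][0] + ranges[-1][1] == offset:
            # Adjacent to the previous run: extend it instead of adding one.
            ranges[-1] = (ranges[-1][0], ranges[-1][1] + run_len)
        else:
            ranges.append((offset, run_len))
    return ranges


# One 2-D plane out of a 1000 x 1000 x 1000 dataset of 8-byte values.
shape, itemsize = (1000, 1000, 1000), 8
plane = hyperslab_byte_ranges(shape, (500, 0, 0), (1, 1000, 1000), itemsize)
subset_bytes = sum(length for _, length in plane)
total_bytes = prod(shape) * itemsize
print(f"subset: {subset_bytes} bytes of {total_bytes}")
```

Under these assumptions a single plane accounts for one thousandth of the dataset's bytes, which is the kind of saving a server-side subset read is meant to deliver.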
A joint team from The HDF Group (representing NCSA) and the SDSC SRB group collaborated to build the module. The team implemented five HDF5 microservice functions on the iRODS server and developed an iRODS FLASH slice client application. The client implementation also includes a JNI interface that allows HDFView, a standard tool for browsing HDF5 files, to access HDF5 files stored remotely in iRODS. Three new collection client/server calls were added to the iRODS APIs, making it easier for users to query the contents of an iRODS collection.