Published March 13, 2024 | Version v1
Presentation Open

HDF5 at the Speed of Zarr

  • 1. National Snow and Ice Data Center (NSIDC)

Description

As flexible and powerful as HDF5 can be, it comes with big tradeoffs when it’s accessed from remote storage systems, mainly because the file format and the client I/O libraries were designed for local and supercomputing workflows. As scientific data and workflows migrate to the cloud, efficient access to data stored in HDF5 format is a key factor that will accelerate or slow down “science in the cloud” across all disciplines.

We have been testing the implementation of recently available features in the HDF5 stack that result in performant access to HDF5 files from remote cloud storage. This performance appears to be on par with modern cloud-native formats like Zarr, with the added advantage of not having to reformat the data or generate metadata sidecar files (DMR++, Kerchunk).

Files

GMT20240313-200217_Recording_2560x1440.mp4

Files (73.7 MB)

Name Size
md5:09224aeabe4e2f4b29fa45f3f5ca9b62
69.1 MB
md5:ec7be8f0fd0d70bd367a37d49a595a2c
4.5 MB