HDF5 at the Speed of Zarr
Description
Flexible and powerful as HDF5 is, it carries significant tradeoffs when accessed from remote storage systems, mainly because the file format and the client I/O libraries were designed for local and supercomputing workflows. As scientific data and workflows migrate to the cloud, efficient access to data stored in HDF5 will be a key factor in accelerating or slowing down “science in the cloud” across all disciplines.
We have been testing implementations of recently available features in the HDF5 stack that result in performant access to HDF5 files in remote cloud storage. This performance appears to be on par with modern cloud-native formats like Zarr, with the added advantage of not having to reformat the data or generate metadata sidecar files (DMR++, Kerchunk).
Files
GMT20240313-200217_Recording_2560x1440.mp4
Total size: 73.7 MB

| Checksum | Size |
|---|---|
| md5:09224aeabe4e2f4b29fa45f3f5ca9b62 | 69.1 MB |
| md5:ec7be8f0fd0d70bd367a37d49a595a2c | 4.5 MB |