Published August 5, 2024 | Version v2
Presentation | Open

Distributed Affix-Based Metadata Search in Self-Describing Data Files

Creators

  • 1. Lawrence Berkeley National Laboratory

Description

As the volume of scientific data continues to grow, the need for efficient metadata search mechanisms becomes increasingly critical. Self-describing data formats like HDF5 are central to managing and storing this data, yet traditional metadata search methods often fall short, especially for affix-based queries such as prefix, suffix, and infix searches. In this talk, I will review the advancements made with DART (Distributed Adaptive Radix Tree) and IDIOMS (Index-powered Distributed Object-centric Metadata Search), two powerful solutions designed to address the challenges of distributed affix-based metadata searches in high-performance computing environments.

DART introduces a scalable, trie-based indexing approach that significantly improves search performance and load balancing across distributed systems. Building on this foundation, IDIOMS further optimizes metadata searches by integrating a distributed in-memory trie-based index and supporting both independent and collective query modes. Together, these systems demonstrate substantial performance improvements over traditional methods, making them highly effective for managing large-scale scientific data.
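To make the idea of trie-based affix search concrete, the following is a minimal single-process sketch, not the DART or IDIOMS implementation. It assumes metadata keys are plain strings mapped to object IDs, and it omits everything distributed (partitioning, load balancing, query modes). The class and method names (`Trie`, `AffixIndex`, `add`, `prefix`, `suffix`, `infix`) are hypothetical and chosen only for illustration. Prefix queries walk a trie of whole keys, suffix queries walk a trie of reversed keys, and infix queries walk a trie that holds every suffix of every key.

```python
class Trie:
    """Character trie; each node records the IDs of keys passing through it."""

    def __init__(self):
        self.children = {}
        self.ids = set()

    def insert(self, key, obj_id):
        node = self
        node.ids.add(obj_id)
        for ch in key:
            node = node.children.setdefault(ch, Trie())
            node.ids.add(obj_id)

    def lookup(self, pattern):
        """Return IDs of all inserted keys that start with `pattern`."""
        node = self
        for ch in pattern:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.ids)


class AffixIndex:
    """Hypothetical illustration: three tries answer the three affix queries."""

    def __init__(self):
        self.forward = Trie()   # whole keys, for prefix queries
        self.reverse = Trie()   # reversed keys, for suffix queries
        self.suffixes = Trie()  # every suffix of every key, for infix queries

    def add(self, key, obj_id):
        self.forward.insert(key, obj_id)
        self.reverse.insert(key[::-1], obj_id)
        for i in range(len(key)):
            self.suffixes.insert(key[i:], obj_id)

    def prefix(self, pattern):
        return self.forward.lookup(pattern)

    def suffix(self, pattern):
        # A key ends with `pattern` iff its reversal starts with the
        # reversed pattern.
        return self.reverse.lookup(pattern[::-1])

    def infix(self, pattern):
        # A key contains `pattern` iff one of its suffixes starts with it.
        return self.suffixes.lookup(pattern)


# Example attribute names are made up for this sketch.
index = AffixIndex()
index.add("temperature", 1)
index.add("temp_max", 2)
index.add("pressure", 3)

prefix_hits = index.prefix("temp")
suffix_hits = index.suffix("ure")
infix_hits = index.infix("per")
```

Storing an ID set at every node trades memory for simple lookups; a production index would more likely keep IDs only at terminal nodes and compress single-child paths, as a radix tree does.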

By leveraging the principles and methodologies behind DART and IDIOMS, we can envision a robust solution for affix-based metadata searches in distributed HDF5 environments. This talk will provide a comprehensive overview of the existing work, highlight key technical innovations, and discuss the potential for extending these techniques to support efficient, scalable metadata searches in self-describing data formats.

Files

IDIOMS HUG24-WeiZhang.pdf

Files (1.2 MB)

md5:cbe9b263e252bdc2b029dffcea560ad3

Additional details

Related works

Is supplemented by
Video/Audio: https://youtu.be/i-Fw8Z9SXuE