Distributed Affix-Based Metadata Search in Self-Describing Data Files
Description
As the volume of scientific data continues to grow, the need for efficient metadata search mechanisms becomes increasingly critical. Self-describing data formats like HDF5 are central to managing and storing this data, yet traditional metadata search methods often fall short, especially for affix-based queries such as prefix, suffix, and infix searches. In this talk, I will review the advancements made with DART (Distributed Adaptive Radix Tree) and IDIOMS (Index-powered Distributed Object-centric Metadata Search), two powerful solutions designed to address the challenges of distributed affix-based metadata searches in high-performance computing environments.
DART introduces a scalable, trie-based indexing approach that significantly improves search performance and load balancing across distributed systems. Building on this foundation, IDIOMS further optimizes metadata searches by integrating a distributed in-memory trie-based index and supporting both independent and collective query modes. Together, these systems demonstrate substantial performance improvements over traditional methods, making them highly effective for managing large-scale scientific data.
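To make the three affix query types concrete, here is a minimal single-process sketch of how a trie can answer prefix, suffix, and infix lookups over metadata keys. It is an illustration only, not the DART or IDIOMS implementation: the `Trie` and `AffixIndex` classes, their method names, and the example keys and object paths are all hypothetical, and distribution across servers and load balancing are left out entirely. The sketch assumes suffix queries are served by a trie over reversed keys and infix queries by a trie over every suffix of each key.

```python
class Trie:
    """Character trie; each node records the payloads of all words passing through it."""

    def __init__(self):
        self.children = {}
        self.payloads = set()

    def insert(self, word, payload):
        node = self
        node.payloads.add(payload)
        for ch in word:
            node = node.children.setdefault(ch, Trie())
            node.payloads.add(payload)

    def starts_with(self, prefix):
        """Return payloads of every inserted word that begins with `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.payloads)


class AffixIndex:
    """Hypothetical toy index mapping metadata keys to object paths."""

    def __init__(self):
        self.forward = Trie()    # keys as written: prefix queries
        self.backward = Trie()   # reversed keys: suffix queries
        self.suffixes = Trie()   # every suffix of every key: infix queries

    def add(self, key, obj_path):
        self.forward.insert(key, obj_path)
        self.backward.insert(key[::-1], obj_path)
        for i in range(len(key)):
            self.suffixes.insert(key[i:], obj_path)

    def prefix(self, q):
        return self.forward.starts_with(q)

    def suffix(self, q):
        # A suffix of the key is a prefix of the reversed key.
        return self.backward.starts_with(q[::-1])

    def infix(self, q):
        # q occurs inside the key iff q is a prefix of some suffix of the key.
        return self.suffixes.starts_with(q)


idx = AffixIndex()
idx.add("temperature_max", "/run1/dset_a")   # made-up attribute names and paths
idx.add("temperature_min", "/run1/dset_b")
idx.add("max_pressure", "/run2/dset_c")

print(idx.prefix("temp"))   # keys starting with "temp"
print(idx.suffix("max"))    # keys ending with "max"
print(idx.infix("press"))   # keys containing "press"
```

Storing payloads at every node and indexing every suffix keeps the lookup code short at the cost of memory; a production index would compress paths and partition the trie across servers.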
By leveraging the principles and methodologies behind DART and IDIOMS, we can envision a robust solution for affix-based metadata searches in distributed HDF5 environments. This talk will provide a comprehensive overview of the existing work, highlight key technical innovations, and discuss the potential for extending these techniques to support efficient, scalable metadata searches in self-describing data formats.
Files
IDIOMS HUG24-WeiZhang.pdf (1.2 MB)
md5:cbe9b263e252bdc2b029dffcea560ad3
Additional details
Related works
- Is supplemented by: Video/Audio: https://youtu.be/i-Fw8Z9SXuE