Distributed Affix-Based Metadata Search in Self-Describing Data Files
Description
As the volume of scientific data continues to grow, efficient metadata search mechanisms become increasingly critical. Self-describing data formats like HDF5 are central to managing and storing this data, yet traditional metadata search methods often fall short, especially for affix-based queries: prefix, suffix, and infix searches. In this talk, I will review the advances made with DART (Distributed Adaptive Radix Tree) and IDIOMS (Index-powered Distributed Object-centric Metadata Search), two systems designed to address the challenges of distributed affix-based metadata search in high-performance computing environments.
DART introduces a scalable, trie-based indexing approach that significantly improves search performance and load balancing across distributed systems. Building on this foundation, IDIOMS further optimizes metadata searches by integrating a distributed in-memory trie-based index and supporting both independent and collective query modes. Together, these systems demonstrate substantial performance improvements over traditional methods, making them highly effective for managing large-scale scientific data.
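To make the three affix query types concrete, here is a minimal single-process sketch in Python. It is not the DART or IDIOMS implementation and leaves out distribution and load balancing entirely; the class and method names (`Trie`, `AffixIndex`, `add`, `prefix`, `suffix`, `infix`) are hypothetical and chosen only for illustration. It assumes a plain character trie, answers suffix queries from a trie of reversed keys, and answers infix queries from a trie holding every suffix of every key.

```python
class Trie:
    """Character trie mapping each inserted string to a set of object IDs."""

    def __init__(self):
        self.children = {}
        self.ids = set()

    def insert(self, key, obj_id):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.ids.add(obj_id)

    def collect(self, prefix):
        """Return all IDs stored at or below the node reached by `prefix`."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return set()
        found, stack = set(), [node]
        while stack:
            cur = stack.pop()
            found |= cur.ids
            stack.extend(cur.children.values())
        return found


class AffixIndex:
    """Hypothetical helper: three tries, one per affix query type."""

    def __init__(self):
        self.forward = Trie()    # keys as given          -> prefix queries
        self.backward = Trie()   # reversed keys          -> suffix queries
        self.suffixes = Trie()   # every suffix of a key  -> infix queries

    def add(self, key, obj_id):
        self.forward.insert(key, obj_id)
        self.backward.insert(key[::-1], obj_id)
        for i in range(len(key)):
            self.suffixes.insert(key[i:], obj_id)

    def prefix(self, q):
        return self.forward.collect(q)

    def suffix(self, q):
        return self.backward.collect(q[::-1])

    def infix(self, q):
        # A key contains q exactly when one of its suffixes starts with q.
        return self.suffixes.collect(q)


index = AffixIndex()
index.add("temperature", 1)
index.add("pressure", 2)
index.add("temp_max", 3)
print(sorted(index.prefix("temp")))  # [1, 3]
print(sorted(index.suffix("ure")))   # [1, 2]
print(sorted(index.infix("ess")))    # [2]
```

Storing every suffix trades memory for simple infix lookups; a distributed index would additionally have to decide which server owns which part of the trie, which this sketch does not attempt.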
By leveraging the principles and methodologies behind DART and IDIOMS, we can envision a robust solution for affix-based metadata searches in distributed HDF5 environments. This talk will provide a comprehensive overview of the existing work, highlight key technical innovations, and discuss the potential for extending these techniques to support efficient, scalable metadata searches in self-describing data formats.