Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval
Description
Effective partitioning of multimedia indexes is key to efficient kNN search. However, existing algorithms partition by document similarity alone, without constraints on partition size or redundancy. Our goal is an index partitioning algorithm that addresses the specific requirements of a distributed system: load balancing across nodes, redundancy under node failure, and efficient node usage under concurrent querying. We propose representing data with overcomplete codebooks: each document is quantized into a small set of codewords and indexed on per-codeword partitions. Quantization algorithms are designed to fit the data as closely as possible, which biases them toward codewords that fit the principal directions of the data in the original space. In this paper, we propose the balanced KSVD (B-KSVD) algorithm, which distributes data uniformly across codewords, according to the distribution in the original space. Our experiments measure the effectiveness of partition size balancing and retrieval quality. Results show that B-KSVD balances partition sizes better (i.e., lower standard deviation of partition sizes) than the k-means and KSVD baselines. B-KSVD achieves 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. k-means creates partitions with higher size variation and requires either larger codebooks or the inspection of larger portions of the index to achieve similar retrieval performance.
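The indexing scheme described above (each document assigned to a small set of codewords, one partition per codeword, balance measured by the standard deviation of partition sizes) can be illustrated with a minimal sketch. This is not B-KSVD or KSVD: the codebook here is random rather than learned, and codewords are selected by largest absolute inner product, which is an assumption made only for illustration. All function names, the dimensions, and the sparsity `s` are hypothetical.

```python
import random
import statistics
from collections import defaultdict

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_codewords(vector, codebook, s):
    """Indices of the s codewords with the largest |inner product| with vector.
    Stand-in for a real sparse coder (assumption, not the paper's method)."""
    scores = [abs(dot(vector, c)) for c in codebook]
    return sorted(range(len(codebook)), key=lambda j: -scores[j])[:s]

def build_partitions(docs, codebook, s):
    """Index every document on each of its s codeword partitions (redundancy s)."""
    partitions = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for j in top_codewords(vec, codebook, s):
            partitions[j].append(doc_id)
    return partitions

def partition_size_sd(partitions, n_codewords):
    """Population standard deviation of partition sizes; lower means better balance."""
    sizes = [len(partitions.get(j, [])) for j in range(n_codewords)]
    return statistics.pstdev(sizes)

def search(query, docs, codebook, partitions, s):
    """Inspect only the query's s partitions, then rank candidates exactly."""
    candidates = set()
    for j in top_codewords(query, codebook, s):
        candidates.update(partitions.get(j, []))
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(query, docs[i]))
    return min(candidates, key=sq_dist, default=None)

# Toy data: 200 documents in 8 dimensions, 16 codewords (overcomplete: 16 > 8).
rng = random.Random(0)
DIM, N_DOCS, N_CODEWORDS, S = 8, 200, 16, 3
docs = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_DOCS)]
codebook = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_CODEWORDS)]

partitions = build_partitions(docs, codebook, S)
print("partition size SD:", partition_size_sd(partitions, N_CODEWORDS))
print("nearest neighbour of doc 0:", search(docs[0], docs, codebook, partitions, S))
```

With a learned codebook that follows the principal directions of the data, a few partitions would tend to absorb most documents; the balance metric above is the kind of quantity the abstract refers to when comparing partition size distributions.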
Files
Name | Size |
---|---|
main.pdf (md5:5139a9426efde444decfc4d965c5d54a) | 2.8 MB |