Published September 10, 2025 | Version v1
Preprint Open

Unbounded Ranking Capacity with Combinatorially-Expressive Retrieval

  • 1. University of the Cumberlands

Description

Combinatorially-Expressive Retrieval (CER) is a three-stage hybrid IR system: BM25 for high-recall lexical matching, ColBERTv2 for late-interaction semantic reranking, and a cross-encoder for the final relevance judgment. The stage scores are combined by monotonic linear score fusion, so any ordering on which all stages agree is preserved. The paper argues that this design sidesteps the rank and sign-rank limits that cap single-vector dense retrievers, yielding unbounded ranking capacity in theory and robust performance in practice. On the LIMIT benchmark, where strong dense models collapse (~10–15% Recall@10), CER reaches 97.4% Recall@100 and 96.4% Recall@2, and an optimized setup runs at ~0.37 s/query on a single Apple M4 Max, suggesting that high accuracy does not require heavy infrastructure. The approach reframes retrieval: architectural hybridity, not ever-larger embeddings, is the key to combinatorial queries and next-generation RAG.
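To make the fusion step concrete, here is a minimal sketch of monotonic linear score fusion over three stages. It is not the paper's implementation. The min-max normalization, the weights, the function names (min_max, fuse) and the example scores are all assumptions chosen for illustration. The sketch also assumes every stage has scored the same candidate set. The property it shows: with non-negative weights and an order-preserving normalization, a document that outranks another in every stage also outranks it after fusion.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one stage's scores to [0, 1]. The map is increasing,
    so the stage's own ordering is unchanged (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(stage_scores: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Rank documents by a weighted sum of normalized stage scores."""
    if len(stage_scores) != len(weights):
        raise ValueError("need exactly one weight per stage")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep fusion monotonic")
    docs = set(stage_scores[0])
    if any(set(s) != docs for s in stage_scores):
        raise ValueError("all stages must score the same candidates")
    normalized = [min_max(s) for s in stage_scores]
    fused = {
        doc: sum(w * n[doc] for w, n in zip(weights, normalized))
        for doc in docs
    }
    # Sort by fused score, highest first; break ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Made-up scores for three candidates (not taken from the paper).
bm25 = {"d1": 12.0, "d2": 9.5, "d3": 3.0}
colbert = {"d1": 0.82, "d2": 0.40, "d3": 0.55}
cross_encoder = {"d1": 4.1, "d2": -1.0, "d3": 0.3}

# Hypothetical weights; the paper's actual values are not given here.
ranking = fuse([bm25, colbert, cross_encoder], weights=[0.2, 0.3, 0.5])
print(ranking)
```

In this example d1 leads in all three stages, so it stays first after fusion. The stages disagree about d2 and d3, so the weights decide their relative order.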

Files

Akbar Unbounded Ranking Capacity with CER.pdf

Files (7.9 MB)

Name: Akbar Unbounded Ranking Capacity with CER.pdf
md5:7eb18f51ca1f10deb39eabd4071747fd
Size: 7.9 MB

Additional details

Dates

Accepted
2025-09-09