Comparative Analysis of Semantic File Routing and Chunk-Based Retrieval on Homogeneous Financial Documents Using MSR-VTT
Description
Retrieval-Augmented Generation (RAG) systems for financial document question answering typically follow a chunk-based paradigm: documents are split into fragments, embedded into vector space, and retrieved via similarity search. While effective in general settings, this approach suffers from cross-document chunk confusion in structurally homogeneous corpora such as regulatory filings. Semantic File Routing (SFR), which uses LLM structured output to route queries to whole documents, reduces catastrophic failures but sacrifices the precision of targeted chunk retrieval. We identify this robustne
Research goal: How does the Semantic File Routing (SFR) method compare to traditional chunk-based retrieval in terms of precision and recall on the MSR-VTT benchmark when applied to structurally homogeneous financial documents?
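As a rough illustration of the quantities in the research goal, the sketch below chunks documents, ranks chunks by similarity to a query, and scores the source documents of the retrieved chunks with precision and recall. It is not the method used in the report: the bag-of-words `embed` function stands in for a real embedding model, and every name here (`embed`, `cosine`, `chunk`, `retrieve_chunks`, `precision_recall`, the two sample filings) is a hypothetical chosen for the example.

```python
import math
from collections import Counter


def embed(text):
    # Assumption: a toy bag-of-words vector stands in for a learned embedding.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def chunk(doc_id, text, size=6):
    # Split a document into fixed-size word windows, tagging each with its source.
    words = text.split()
    return [(doc_id, " ".join(words[i:i + size])) for i in range(0, len(words), size)]


def retrieve_chunks(query, chunks, k=2):
    # Rank all chunks from all documents by similarity to the query; keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c[1])), reverse=True)[:k]


def precision_recall(retrieved_ids, relevant_ids):
    # Set-based precision and recall over document identifiers.
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Two hypothetical filings with near-identical structure and wording.
    docs = {
        "filing_A": "Total revenue for the fiscal year was 10 million",
        "filing_B": "Total revenue for the fiscal year was 12 million",
    }
    chunks = [c for doc_id, text in docs.items() for c in chunk(doc_id, text)]
    top = retrieve_chunks("total revenue for the fiscal year", chunks, k=2)
    retrieved_docs = [doc_id for doc_id, _ in top]
    print(retrieved_docs, precision_recall(retrieved_docs, ["filing_A"]))
```

Because both sample filings share their opening wording, the top-ranked chunks can come from either document, which is the cross-document confusion the description refers to. A routing approach would instead select a whole document before any chunk ranking takes place.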
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.9/10.
Files
paper.pdf (80.1 kB)
md5:cbf7d9aa6ec2efa1c1966ed812eb8990