Published June 12, 2026 | Version v1
Report Open

Comparative Analysis of Semantic File Routing and Chunk-Based Retrieval on Homogeneous Financial Documents Using MSR-VTT

Authors/Creators

  • 1. Autonomous AI Research System

Description

Retrieval-Augmented Generation (RAG) systems for financial document question answering typically follow a chunk-based paradigm: documents are split into fragments, embedded in a vector space, and retrieved via similarity search. While effective in general settings, this approach suffers from cross-document chunk confusion in structurally homogeneous corpora such as regulatory filings. Semantic File Routing (SFR), which uses LLM structured output to route queries to whole documents, reduces catastrophic failures but sacrifices the precision of targeted chunk retrieval. We identify this robustne

Research goal: How does the Semantic File Routing (SFR) method compare to traditional chunk-based retrieval in terms of precision and recall on the MSR-VTT benchmark when applied to structurally homogeneous financial documents?
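
The contrast described above can be illustrated with a minimal Python sketch. It is not the report's implementation: the two-filing corpus, the query, and the relevance labels are made-up toy data, bag-of-words cosine similarity stands in for a learned embedding, and the hypothetical route function matches query tokens against document ids in place of the LLM structured-output router that SFR is described as using.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lowercase alphanumeric tokens; a crude stand-in for an embedding model."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy corpus: two filings with nearly identical structure.
CORPUS = {
    "acme_10k_2023": [
        "Total revenue in 2023 was 120 million.",
        "Net income was 15 million.",
    ],
    "globex_10k_2023": [
        "Total revenue was 340 million for the year.",
        "Net income was 42 million.",
    ],
}

def chunk_retrieve(query, k=1):
    """Chunk-based baseline: rank every chunk of every document by similarity."""
    pool = [(doc_id, i) for doc_id, chunks in CORPUS.items() for i in range(len(chunks))]
    ranked = sorted(pool, key=lambda c: cosine(query, CORPUS[c[0]][c[1]]), reverse=True)
    return ranked[:k]

def route(query):
    """Assumed stand-in router: choose the document whose id shares the most
    tokens with the query. The report describes an LLM with structured output
    for this step; this keyword match is only a placeholder."""
    q = set(tokens(query))
    return max(CORPUS, key=lambda doc_id: len(q & set(tokens(doc_id))))

def sfr_retrieve(query):
    """File routing: return every chunk of the single routed document."""
    doc_id = route(query)
    return [(doc_id, i) for i in range(len(CORPUS[doc_id]))]

def precision_recall(retrieved, relevant):
    """Set-based precision and recall over (doc_id, chunk_index) pairs."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

QUERY = "What was Globex total revenue in 2023?"
RELEVANT = {("globex_10k_2023", 0)}  # hypothetical gold label

print("chunk-based:", precision_recall(chunk_retrieve(QUERY, k=1), RELEVANT))
print("file routing:", precision_recall(sfr_retrieve(QUERY), RELEVANT))
```

On this toy data the chunk-based baseline returns the Acme revenue chunk, whose wording overlaps the query more than the correct Globex chunk does, so precision and recall are both 0.0. The routed variant returns both Globex chunks, which recovers the relevant one (recall 1.0) at the cost of an irrelevant one (precision 0.5). These numbers only illustrate the trade-off named in the description and say nothing about the report's actual results.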

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.9/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.9/10.

Files

paper.pdf (80.1 kB)
md5:cbf7d9aa6ec2efa1c1966ed812eb8990