VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos
Description
Retrieval-Augmented Generation (RAG) has proven effective at enhancing Large Language Models (LLMs) through external knowledge integration, yet its application has focused primarily on textual content, leaving the rich domain of multi-modal video knowledge largely unexplored. This paper introduces VideoRAG, the first retrieval-augmented generation framework specifically designed for processing and understanding extremely long-context videos. Its core innovation is a dual-channel architecture that seamlessly integrates (i) graph-based textual knowledge grounding for
Research goal: What is the precision drop for LLMs on HotPotQA under noisy context when scaling context window size from 32K to 128K, and does iterative retrieval with reranking mitigate this degradation more effectively across different model families?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.
Notes
Files
paper.pdf (82.4 kB)
md5:3cc06075d852b1778b4d8c377a0e1312