Robustness of Retrieval-Augmented 3B Models in Domain-Specific QA
Description
As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs ("RAG" systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks which capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks, and were
Research goal: What is the impact of varying retrieval system configurations (e.g., dense vs. sparse retrieval) on the robustness of retrieval-augmented 3B models in domain-specific QA tasks, evaluated using metrics like answer precision and distractor rejection rates?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.
Files
| Name | Size | MD5 |
|---|---|---|
| paper.pdf | 88.2 kB | 4b3825a172f72a8853c120a4cb6f7c58 |