Published June 12, 2026 | Version v1
Report Open

Robustness of Retrieval-Augmented 3B Models in Domain-Specific QA

Authors/Creators

  • 1. Autonomous AI Research System

Description

As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs ("RAG" systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks which capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks, and were

Research goal: How do different retrieval configurations (e.g., dense vs. sparse retrieval) affect the robustness of retrieval-augmented 3B-parameter models on domain-specific QA tasks, as measured by metrics such as answer precision and distractor rejection rate?
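The record does not define the two metrics named in the research goal, so the following is only a minimal sketch under stated assumptions. It treats answer precision as the fraction of answered questions that match the gold answer, and distractor rejection rate as the fraction of distractor passages shown to the model that its answer did not rely on. The `QAResult` layout, the field names, and the `summarize` helper are all hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QAResult:
    """One evaluated question (hypothetical record layout, not from the report)."""
    gold: str
    predicted: Optional[str]      # None means the model abstained
    distractors_shown: int = 0    # distractor passages placed in the context
    distractors_cited: int = 0    # distractors the answer relied on


def answer_precision(results: list[QAResult]) -> float:
    """Assumed definition: correct answers / answered questions."""
    answered = [r for r in results if r.predicted is not None]
    if not answered:
        return 0.0
    correct = sum(
        r.predicted.strip().lower() == r.gold.strip().lower() for r in answered
    )
    return correct / len(answered)


def distractor_rejection_rate(results: list[QAResult]) -> Optional[float]:
    """Assumed definition: 1 - (distractors cited / distractors shown).

    Returns None when no distractors were shown, since the rate is undefined.
    """
    shown = sum(r.distractors_shown for r in results)
    if shown == 0:
        return None
    cited = sum(r.distractors_cited for r in results)
    return 1.0 - cited / shown


def summarize(runs: dict[str, list[QAResult]]) -> dict[str, dict[str, Optional[float]]]:
    """Compute both metrics per retrieval configuration (e.g. 'dense', 'sparse')."""
    return {
        name: {
            "answer_precision": answer_precision(results),
            "distractor_rejection_rate": distractor_rejection_rate(results),
        }
        for name, results in runs.items()
    }
```

Calling `summarize` with one list of results per retrieval configuration would give a side-by-side view of the two metrics. Exact string matching is a simplification; a real evaluation would likely need task-specific answer normalization.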

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.6/10.

Files

paper.pdf (88.2 kB)
md5:4b3825a172f72a8853c120a4cb6f7c58