Robustness of Retrieval-Augmented 3B Models in Domain-Specific QA

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20652857

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

As the legal community increasingly examines the use of large language models (LLMs) for various legal applications, legal AI developers have turned to retrieval-augmented LLMs ("RAG" systems) to improve system performance and robustness. An obstacle to the development of specialized RAG systems is the lack of realistic legal RAG benchmarks which capture the complexity of both legal retrieval and downstream legal question-answering. To address this, we introduce two novel legal RAG benchmarks: Bar Exam QA and Housing Statute QA. Our tasks correspond to real-world legal research tasks, and were

Research goal: What is the impact of varying retrieval system configurations (e.g., dense vs. sparse retrieval) on the robustness of retrieval-augmented 3B models in domain-specific QA tasks, evaluated using metrics like answer precision and distractor rejection rates?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.6/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.6/10.

Files (88.2 kB)

Name	Size
paper.pdf (md5:4b3825a172f72a8853c120a4cb6f7c58)	88.2 kB
