Retrieval-augmented reasoning with lean language models

Chan, Ryan Sze-Yin; Nanni, Federico; Lazauskas, Tomas; Wood, Rosie; Yong, Penelope; Tarassenko, Lionel; Girolami, Mark; Geddes, James; Duncan, Andrew

doi:10.5281/zenodo.16408412

Published July 2025 | Version v1

Technical note | Open access


1. The Alan Turing Institute
2. University of Oxford
3. University of Cambridge
4. Imperial College London

This technical report details a novel approach to combining reasoning and retrieval-augmented generation (RAG) within a single, lean language model architecture. Existing RAG systems typically rely on large-scale models and external APIs. Our work instead addresses the growing demand for performant, privacy-preserving solutions that can be deployed in resource-constrained or secure environments.
Building on recent developments in test-time scaling and small-scale reasoning models, we develop a retrieval-augmented conversational agent that can interpret complex, domain-specific queries using a lightweight backbone model. Our system integrates a dense retriever with fine-tuned Qwen2.5-Instruct models, using synthetic query generation and reasoning traces derived from frontier models (e.g., DeepSeek-R1) over a curated corpus, in this case the NHS A-to-Z condition pages.
We explore the impact of summarisation-based document compression, synthetic data design, and reasoning-aware fine-tuning on model performance. Evaluation against both non-reasoning and general-purpose lean models demonstrates that our domain-specific fine-tuning approach yields substantial gains in answer accuracy and consistency, approaching frontier-level performance while remaining feasible for local deployment. All implementation details and code are publicly released to support reproducibility and adaptation to other domains.
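
The retrieval step mentioned above (a dense retriever feeding retrieved pages to a generator model) can be illustrated with a minimal sketch. This is not the authors' implementation. The hand-written three-dimensional embeddings stand in for the output of a real dense encoder. The document IDs, the placeholder page text, and the `retrieve` and `build_prompt` helpers are all hypothetical. The sketch only shows the general pattern: rank documents by cosine similarity to a query embedding, then prepend the top-k pages to the question as context.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy embeddings standing in for a real dense encoder's output.
# In a real system these vectors would be produced by an embedding model.
DOC_EMBEDDINGS: Dict[str, List[float]] = {
    "asthma": [0.9, 0.1, 0.0],
    "migraine": [0.1, 0.9, 0.1],
    "eczema": [0.0, 0.2, 0.9],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in DOC_EMBEDDINGS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, doc_ids: List[str], pages: Dict[str, str]) -> str:
    """Place the retrieved page text ahead of the question for the generator model."""
    context = "\n\n".join(f"[{d}]\n{pages[d]}" for d in doc_ids)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    # Hypothetical placeholder text; real pages would be condition-page content.
    pages = {d: f"Placeholder text for the {d} page." for d in DOC_EMBEDDINGS}
    query = [0.8, 0.3, 0.0]  # toy query embedding
    hits = retrieve(query, k=2)
    print(hits)
    print(build_prompt("What triggers wheezing?", [d for d, _ in hits], pages))
```

With the toy query above, the two closest documents are `asthma` and `migraine`. In a full system the resulting prompt would be passed to the fine-tuned generator. This sketch deliberately leaves out the summarisation-based compression and the reasoning-trace fine-tuning described in the report.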

Files

chan-et-al-2025.pdf (1.2 MB; md5:c7aea6ebfb269697ac376cedce11a965)

Additional details

Funding

UK Research and Innovation: Baskerville: a national accelerated compute resource (EP/T022221/1)
UK Research and Innovation: Baskerville 2.0: Enhanced Provision for High End and On-Demand Users (EP/W032244/1)

Catalog number: Turing Technical Report No. 8
