Published February 6, 2026 | Version v1
Preprint | Restricted

CRAiG: Contextual Retrieval Augmented Generation

Authors/Creators

  • Independent

Description

Conversational systems built on Large Language Models (LLMs) face an escalating challenge: as dialogue history accumulates, the context grows with every message, steadily increasing inference costs. Compounding this issue is the quadratic complexity of self-attention mechanisms (O(N²)), which limits the practical context capacity of even state-of-the-art models. I present CRAiG (Contextual Retrieval Augmented Generation), a novel architecture in which a lightweight External Attention Mechanism (EAM), a 43-million-parameter model, is trained to operate atop any generative LLM, intelligently curating the most relevant context for each prompt. By decoupling context selection from generation, CRAiG enables models to handle large conversational histories (up to 3.6 million tokens) while processing only a constant, manageable subset of information at inference time. Through a three-stage training process incorporating teacher-supervised learning, Semantic Phase Shift Augmentation (SPSA), and Natural Language Inference (NLI) optimization, CRAiG achieved 68.53% accuracy on LongBench v2, surpassing state-of-the-art commercial models including Gemini 3 Pro (65.6%) and Claude Sonnet 4.5 (61.8%), while reducing token consumption by up to 93%. My approach demonstrates exceptional performance on domain-specific tasks, reaching 79.59% accuracy on code-repository understanding and 75.31% on long in-context learning. The entire research project, from data collection to final training, cost under $19 USD, demonstrating the cost-effectiveness and accessibility of this method.
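The decoupling described above can be sketched in miniature: an external scorer ranks history chunks by relevance to the current prompt, then only the top-scoring chunks that fit a fixed token budget are forwarded to the generator. This is a hypothetical illustration, not the paper's implementation; the actual 43M-parameter EAM is replaced here by a toy word-overlap scorer, and the function and parameter names (`score`, `curate_context`, `budget_tokens`) are my own.

```python
# Hypothetical sketch of CRAiG-style context curation.
# The trained EAM is stood in for by a trivial lexical-overlap scorer.

def score(chunk: str, prompt: str) -> float:
    """Toy relevance score: fraction of prompt words appearing in the chunk."""
    c, p = set(chunk.lower().split()), set(prompt.lower().split())
    return len(c & p) / max(len(p), 1)

def curate_context(history: list[str], prompt: str, budget_tokens: int = 512) -> list[str]:
    """Keep only the highest-scoring chunks that fit a fixed token budget,
    so the generator sees a constant-size context however long the history is."""
    ranked = sorted(history, key=lambda ch: score(ch, prompt), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        n = len(chunk.split())  # crude whitespace token count
        if used + n <= budget_tokens:
            selected.append(chunk)
            used += n
    # Restore chronological order before handing the subset to the LLM.
    return [ch for ch in history if ch in selected]

history = [
    "User asked about Rust borrow checker errors.",
    "Assistant explained lifetimes with examples.",
    "User shared vacation photos discussion.",
]
prompt = "Why does the borrow checker reject my Rust code?"
curated = curate_context(history, prompt, budget_tokens=16)
print(curated)
```

Under this sketch the generator's input stays bounded by `budget_tokens` regardless of history length, which is the property that lets inference cost stay roughly constant per turn.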

Files

Restricted

The record is publicly accessible, but its files are restricted to authorized users.

Additional details

Software