Published January 20, 2026 | Version v1
Preprint Open

Adaptive Repetition Controller (ARC): Decode-Time Behavioral Probes for Controllable and Efficient LLM Inference

Authors/Creators

Description

We present ARC (Adaptive Repetition Controller), a decode-time intervention framework for large language models that detects and suppresses RLHF-induced behavioral patterns—such as repetition, hedging, and verbosity—using lightweight hidden-state probes.
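The detect-then-suppress loop described above can be illustrated with a minimal sketch. It assumes a logistic probe over a hidden-state vector and a fixed logit penalty on recently emitted tokens. The names `probe_score` and `adjust_logits`, the threshold, and the penalty value are hypothetical and are not taken from the released code.

```python
import math

def probe_score(hidden, weights, bias):
    """Logistic probe: estimated probability that the state is repetition-prone."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def adjust_logits(logits, recent_tokens, score, threshold=0.5, penalty=2.0):
    """Subtract a fixed penalty from recently emitted tokens when the probe fires.

    The threshold and penalty are illustrative assumptions, not values from the paper.
    """
    if score < threshold:
        return list(logits)  # probe did not fire: leave logits untouched
    recent = set(recent_tokens)
    return [x - penalty if i in recent else x for i, x in enumerate(logits)]

# Toy example: a 3-dimensional "hidden state" and a 4-token vocabulary.
hidden = [0.9, -0.2, 1.4]
weights = [1.0, 0.0, 1.0]
score = probe_score(hidden, weights, bias=-1.0)
new_logits = adjust_logits([2.0, 1.0, 0.5, 0.1], recent_tokens=[0, 2], score=score)
```

Because the adjustment touches only the logits at decode time, the model weights stay unchanged, which matches the framing of ARC as an intervention rather than a fine-tuning method.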

Our key empirical finding is that repetition-prone states are linearly separable in low-dimensional projections of transformer activations, with a class separation ratio of 125× on an 8B-parameter model. This enables reliable, low-latency intervention without retraining model weights.
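The description does not define the class separation ratio. One common way to quantify linear separability along a single direction is a Fisher-style criterion: the squared gap between class means divided by the summed within-class variances. The sketch below assumes that definition, which may differ from the one used in the paper. The direction vector and the data are toy values.

```python
from statistics import mean, pvariance

def project(states, direction):
    """Project each hidden-state vector onto a single probe direction."""
    return [sum(s * d for s, d in zip(state, direction)) for state in states]

def separation_ratio(pos, neg):
    """Fisher-style ratio (assumed definition): squared mean gap over summed variances."""
    gap = mean(pos) - mean(neg)
    return gap * gap / (pvariance(pos) + pvariance(neg))

# Toy 2-dimensional "activations"; only the first coordinate separates the classes.
direction = [1.0, 0.0]
repetitive = [[4.0, 0.3], [5.0, -0.1], [6.0, 0.2]]
normal = [[-1.0, 0.1], [0.0, 0.4], [1.0, -0.2]]

ratio = separation_ratio(project(repetitive, direction), project(normal, direction))
```

Under this definition, a larger ratio means the two classes sit further apart relative to their spread, so a simple threshold on the projected value is enough to tell them apart.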

Beyond behavioral control, we demonstrate that the same probes can be repurposed as efficiency signals for adaptive inference, guiding speculative decoding, layer skipping, and early exit. This unified approach yields up to 4.2× throughput improvement and ~38% compute reduction with minimal quality degradation.
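How a probe score might act as an efficiency signal can be sketched with a simple early-exit policy. This is an illustration under stated assumptions, not the paper's mechanism: it treats a low probe score as a sign that a token is "easy" and can exit at an intermediate layer, and it counts compute only in units of layers. The names `exit_layer` and `compute_fraction_saved`, the threshold, and the layer counts are hypothetical.

```python
def exit_layer(score, total_layers, early_layer, threshold=0.1):
    """Hypothetical policy: exit early when the probe score is below the threshold."""
    return early_layer if score < threshold else total_layers

def compute_fraction_saved(scores, total_layers, early_layer, threshold=0.1):
    """Fraction of layer evaluations skipped across a sequence of tokens."""
    used = sum(exit_layer(s, total_layers, early_layer, threshold) for s in scores)
    return 1.0 - used / (total_layers * len(scores))

# Toy per-token probe scores for an 8-token sequence on a 32-layer model.
scores = [0.02, 0.05, 0.6, 0.01, 0.9, 0.03, 0.04, 0.3]
saved = compute_fraction_saved(scores, total_layers=32, early_layer=16)
```

In this toy setup, five of the eight tokens exit at layer 16, so the layer count drops from 256 to 176. The figures are illustrative only and are unrelated to the results reported above.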

This release includes a technical report, model artifacts, and inference code. The work is scoped as systems research on decoding, controllability, and efficient inference; it makes no claims about cognition or internal experience.

Files (216.8 kB)

ARC_ Adaptive Repetition Controller - Full Research Paper.pdf

Additional details