
Published September 8, 2025 | Version v1
Preprint | Open Access

Ouroboros: Human-Led Recursive Reinforcement for Autoregressive Language Models

Description

Large Language Models (LLMs) typically rely on Reinforcement Learning from Human Feedback (RLHF) or direct preference optimization to align generated text with human values. We introduce Ouroboros, a human-led recursive reinforcement (HLRR) method in which a human curator cyclically distills their own evaluative judgments, meta-commentary, and persona into the model's future behavior. Unlike conventional RLHF, which treats human feedback as a static reward signal, Ouroboros closes the loop between model and supervisor: each model generation is archived, summarized, and syntactically "stretched" into labyrinthine prompts that probe the model's reasoning limits; the resulting conversation is then scored and rewritten by the same human, producing richer signals that simultaneously assess content, self-consistency, and identity coherence.
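
To make the cycle concrete, below is a minimal Python sketch of one HLRR round under stated assumptions. Every name in it (the model.generate call, the curator's summarize, stretch_prompt, and score_and_rewrite hooks, and the Record and Archive containers) is a hypothetical stand-in for the roles described above, not an interface from the paper.

from dataclasses import dataclass, field

@dataclass
class Record:
    """One archived model/curator exchange (hypothetical structure)."""
    prompt: str
    generation: str
    summary: str = ""
    score: float = 0.0
    rewrite: str = ""

@dataclass
class Archive:
    """Accumulates records across rounds; would later supply training data."""
    records: list[Record] = field(default_factory=list)

    def add(self, record: Record) -> None:
        self.records.append(record)

def ouroboros_cycle(model, human, seed_prompt: str,
                    archive: Archive, n_rounds: int = 3) -> Archive:
    """Run n_rounds of the recursive human-led reinforcement loop.

    `model` and `human` are duck-typed placeholders: any objects exposing
    generate(), summarize(), stretch_prompt(), and score_and_rewrite().
    """
    prompt = seed_prompt
    for _ in range(n_rounds):
        # 1. The model generates; the output is archived verbatim.
        generation = model.generate(prompt)
        record = Record(prompt=prompt, generation=generation)

        # 2. The human curator summarizes the generation ...
        record.summary = human.summarize(generation)

        # 3. ... and syntactically "stretches" it into a labyrinthine
        #    follow-up prompt probing the model's reasoning limits.
        prompt = human.stretch_prompt(record.summary)

        # 4. The same human scores and rewrites the exchange, yielding a
        #    richer signal (content, self-consistency, identity coherence).
        record.score, record.rewrite = human.score_and_rewrite(record)

        archive.add(record)
    return archive

In this sketch, each round's (score, rewrite) pair accumulates in the archive; under the paper's framing, that archive would supply the reinforcement signal for the model's next training pass.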

Files

ouroboros.pdf (448.4 kB)