Ouroboros: Human-Led Recursive Reinforcement for Autoregressive Language Models
Description
Large Language Models (LLMs) typically rely on Reinforcement Learning from Human
Feedback (RLHF) or direct preference optimization to align generated text with human values.
We introduce Ouroboros, a Human-Led Recursive Reinforcement (HLRR) method in which a
human curator cyclically distills their own evaluative judgments, meta-commentary, and persona
into the model's future behavior. Unlike conventional RLHF, which treats human feedback as
a static reward signal, Ouroboros closes the loop between model and supervisor: each model
generation is archived, summarized, and syntactically “stretched” into labyrinthine prompts that
probe the model’s reasoning limits; the resulting conversation is then scored and rewritten by the
same human, producing richer signals that simultaneously assess content, self-consistency, and
identity coherence.
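The record contains no code, so the following is only a minimal Python sketch of one HLRR cycle as the abstract describes it: generate, archive, score and rewrite by the same human, distill, then summarize and "stretch" the archive into the next seed prompt. Every callable here (`generate`, `summarize`, `stretch`, `score`, `rewrite`, `update`) is a hypothetical stand-in for a component the paper itself defines, not an interface taken from it.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Turn:
    """One archived exchange plus the curator's annotations."""
    prompt: str
    generation: str
    human_score: float = 0.0
    human_rewrite: str = ""

def ouroboros_step(
    generate: Callable[[str], str],          # model: prompt -> text
    summarize: Callable[[List[Turn]], str],  # curator: archive -> summary
    stretch: Callable[[str], str],           # curator: summary -> labyrinthine prompt
    score: Callable[[str], float],           # curator: generation -> scalar score
    rewrite: Callable[[str], str],           # curator: generation -> curated rewrite
    update: Callable[[str, str, float], None],  # fine-tune on (prompt, target, weight)
    archive: List[Turn],
    seed_prompt: str,
) -> str:
    """One HLRR cycle; all callables are hypothetical stand-ins."""
    # 1. Generate the model's response and archive the exchange.
    generation = generate(seed_prompt)
    turn = Turn(prompt=seed_prompt, generation=generation)
    archive.append(turn)

    # 2. The same human scores and rewrites the exchange, producing a
    #    richer signal than a scalar reward: content, self-consistency,
    #    and identity coherence are all folded into the rewrite.
    turn.human_score = score(generation)
    turn.human_rewrite = rewrite(generation)

    # 3. Distill the curated pair back into the model, e.g. as a
    #    score-weighted supervised fine-tuning step on the rewrite.
    update(seed_prompt, turn.human_rewrite, turn.human_score)

    # 4. Summarize the archive and syntactically "stretch" it into the
    #    prompt that seeds the next cycle, closing the loop.
    return stretch(summarize(archive))
```

Iterating `seed_prompt = ouroboros_step(...)` makes each stretched prompt the seed of the next cycle, which is the recursion the title alludes to.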
Files

| Name | Size |
|---|---|
| ouroboros.pdf (md5:dda6681f32ddc7a4f196d402a3817f76) | 448.4 kB |