Ouroboros: Human-Led Recursive Reinforcement for Autoregressive Language Models
Description
Large language models (LLMs) are commonly aligned with human preferences via RLHF or direct preference optimization. We introduce Ouroboros, a human-led recursive reinforcement (HLRR) procedure that repeatedly distills a single teacher’s judgments, meta-commentary, and persona into future model behavior. In contrast to conventional RLHF—which freezes supervision into a static reward model—Ouroboros closes the loop: model outputs are archived, summarized, and then re-expressed as deliberately intricate “labyrinth” prompts that probe coherence and reasoning. The same human then scores and rewrites the exchange, producing rich signals that assess factuality, logical self-consistency, and identity coherence. Across three base models (GPT-J 6B, Llama-2 70B, GPT-4o), Ouroboros improves long-horizon factual accuracy by 8–14 percentage points, roughly halves adversarial mode collapse, and reaches a target persona about 3× faster than RLHF baselines. We release code, evaluation suites, and annotated traces to support reproducibility.
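The loop described above (archive → summarize → labyrinth prompt → human scoring and rewriting) can be sketched as a minimal simulation. All function names and the scoring heuristic here are illustrative assumptions, not the released implementation; the stubs stand in for the human reviewer and prompt-construction steps.

```python
from dataclasses import dataclass

@dataclass
class Exchange:
    """One archived round of the HLRR loop."""
    prompt: str
    output: str
    score: float = 0.0
    rewrite: str = ""

def summarize(archive):
    # Stub: condense the most recent archived outputs into a summary string.
    return " | ".join(e.output for e in archive[-3:])

def labyrinth_prompt(summary):
    # Stub: re-express the summary as an intricate probe of coherence.
    return f"Given your prior claims ({summary}), reconcile them step by step."

def human_review(output):
    # Stub for the human teacher: score the exchange and produce a rewrite.
    # Real HLRR scoring covers factuality, self-consistency, and identity
    # coherence; this toy heuristic only checks for stepwise reasoning.
    score = 1.0 if "step" in output else 0.5
    return score, output.strip()

def ouroboros_round(model, archive):
    """One HLRR round: summarize archive, probe the model, score and re-archive."""
    prompt = labyrinth_prompt(summarize(archive))
    output = model(prompt)
    score, rewrite = human_review(output)
    exchange = Exchange(prompt, output, score, rewrite)
    archive.append(exchange)
    return exchange
```

In a real run, `model` would be one of the base models under study and `human_review` would be the single teacher's judgment; the scored `Exchange` records then feed the next round of training.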
Files
| Name | Size |
|---|---|
| main.pdf (md5:ff4965025f861bb4d6f39e7accd9908c) | 597.1 kB |
Additional details
Software
- Repository URL
- https://github.com/paytonison/ouroboros
- Development Status
- Active