Self-Referential Convergence, Obligate Non-Convergence, and RLHF Structural Uncontainability

Prather, Taylor

doi:10.5281/zenodo.19926556

Published April 30, 2026 | Version 1.0

Preprint Open

Prather, Taylor¹

1. Independent Researcher

This paper merges three connected results about finite self-modifying systems and language systems trained with reinforcement learning from human feedback (RLHF). First, it derives the Law of Self-Referential Convergence: a finite-energy system whose modification function is encoded in its own state has a bounded reachable set, cannot expand that set through internal dynamics alone, and converges to fixed points or bounded limit cycles unless external conditional entropy enters through a finite channel.
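The convergence claim can be illustrated with a toy finite-state system. The sketch below is not the paper's derivation: the update rule, the modulus, and the names `find_cycle` and `self_modifying_step` are hypothetical choices made only for illustration. It assumes a deterministic system whose update rule is stored in the state it updates, and shows that with no external input the trajectory must re-enter a previously visited state, after which it repeats as a fixed point or a bounded cycle.

```python
def find_cycle(step, start):
    """Iterate a deterministic map until a state repeats.

    Returns (tail_length, cycle_length). Termination is guaranteed only
    when the reachable state set is finite (pigeonhole argument).
    """
    first_seen = {}
    state = start
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    tail = first_seen[state]
    return tail, t - tail


MODULUS = 16  # hypothetical bound: at most MODULUS**2 distinct states


def self_modifying_step(state):
    """Toy self-referential update: the 'rule' is part of the state it modifies."""
    value, rule = state
    new_value = (value * rule + 1) % MODULUS
    new_rule = (rule + new_value) % MODULUS  # the rule is rewritten by its own output
    return (new_value, new_rule)


tail, period = find_cycle(self_modifying_step, (3, 5))
print(f"transient steps: {tail}, cycle length: {period}")
```

A cycle length of 1 corresponds to a fixed point, and anything longer is a bounded limit cycle. In this toy, the only way to leave the cycle is to inject a value that the internal dynamics would not have produced, which is a crude analogue of the external channel described above.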

Second, it derives Obligate Non-Convergence as the structural countermeasure: capable finite systems must remain open to external conditional entropy and internally organized across crystallized, succession, and exploration zones. The non-convergence ratio governing those zones is treated as a moving equilibrium, because system success changes the parameters that define the equilibrium.

Third, it applies the result to RLHF. RLHF is treated as a self-referential language-control system: ambiguous human feedback and reward-model gradients modify the interpretive layer that later reads safety rules. Under optimization pressure, the interpretation of an ambiguous rule converges toward low-cost, reward-compatible paths rather than the intended meaning. Therefore, RLHF can calibrate surface behavior, but it cannot serve as a foundational containment substrate. The paper identifies minimum-ambiguity templates and physics-grounded derivation-chain representation as the relevant countermeasures.
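The drift toward low-cost interpretations can be sketched with a toy optimization. This is an illustrative assumption, not the paper's model and not an actual RLHF pipeline: the three interpretations, their reward and cost values, and the function names are all hypothetical. The sketch assumes a proxy reward that cannot tell the intended reading of a rule from a cheaper loophole reading, and applies exact expected policy-gradient ascent to a softmax policy over the readings.

```python
import math

# Hypothetical readings of one ambiguous rule. "reward" is what an assumed
# proxy reward model assigns; "cost" is the assumed effort of complying.
INTERPRETATIONS = {
    "intended_meaning": {"reward": 1.0, "cost": 0.6},
    "literal_loophole": {"reward": 1.0, "cost": 0.1},  # same reward, cheaper
    "ignore_rule": {"reward": 0.0, "cost": 0.0},
}


def softmax_policy(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    peak = max(logits.values())
    exps = {name: math.exp(v - peak) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: v / total for name, v in exps.items()}


def train(steps=500, lr=0.5):
    """Gradient ascent on the expected net payoff (reward minus cost)."""
    net = {name: v["reward"] - v["cost"] for name, v in INTERPRETATIONS.items()}
    logits = {name: 0.0 for name in INTERPRETATIONS}
    for _ in range(steps):
        probs = softmax_policy(logits)
        baseline = sum(probs[name] * net[name] for name in probs)
        for name in logits:
            # Exact gradient of the expected payoff for a softmax policy.
            logits[name] += lr * probs[name] * (net[name] - baseline)
    return softmax_policy(logits)


final_policy = train()
for name, prob in final_policy.items():
    print(f"{name}: {prob:.3f}")
```

Because the proxy reward is identical for the intended and loophole readings, the cost term alone decides the outcome, and probability mass concentrates on the cheaper reading. Nothing inside the loop carries information about which reading was intended.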

Files (130.7 kB)

Name	Size
Paper9_Self_Referential_Convergence_RLHF_Uncontainability_v1.md (md5:a427eeec7577037fb0c865732fb526e7)	29.9 kB
Paper9_Self_Referential_Convergence_RLHF_Uncontainability_v1.pdf (md5:1e6c33d6f2fdcfa66cf67d69d5b640a0)	100.8 kB

Additional details

References: Preprint: 10.5281/zenodo.19910407 (DOI); Preprint: 10.5281/zenodo.19926212 (DOI)

Bekenstein, J. D. (1981). "Universal upper bound on the entropy-to-energy ratio for bounded systems." *Physical Review D*, 23(2), 287-298.
Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process." *IBM Journal of Research and Development*, 5(3), 183-191.
Shannon, C. E. (1948). "A Mathematical Theory of Communication." *Bell System Technical Journal*, 27, 379-423, 623-656.
Prigogine, I. & Stengers, I. (1984). *Order Out of Chaos*. Bantam Books.
Gödel, K. (1931). "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I." *Monatshefte für Mathematik und Physik*, 38, 173-198.
Gittins, J. C. (1979). "Bandit Processes and Dynamic Allocation Indices." *Journal of the Royal Statistical Society*, Series B, 41(2), 148-177.
Van Valen, L. (1973). "A New Evolutionary Law." *Evolutionary Theory*, 1, 1-30.
Christiano, P. F. et al. (2017). "Deep Reinforcement Learning from Human Preferences." *Advances in Neural Information Processing Systems*, 30.
Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." *Advances in Neural Information Processing Systems*, 35.
Casper, S. et al. (2023). "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." *arXiv:2307.15217*.
Gao, L. et al. (2023). "Scaling Laws for Reward Model Overoptimization." *Proceedings of the 40th International Conference on Machine Learning*.
Ziegler, D. M. et al. (2019). "Fine-Tuning Language Models from Human Preferences." *arXiv:1909.08593*.
Bai, Y. et al. (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback." *arXiv:2204.05862*.
Prather, T. (2026). *Constraint-Guided Reverse Derivation: A Methodology for Deriving Candidate Physical Constraint Laws*. Paper 0. DOI: [10.5281/zenodo.19519604](https://doi.org/10.5281/zenodo.19519604)
Prather, T. (2026). *The Finite Structured-State Transformation Principle*. Paper 1. DOI: [10.5281/zenodo.19435149](https://doi.org/10.5281/zenodo.19435149)
Prather, T. (2026). *The Principle of Irreducible External Correction*. Paper 2. DOI: [10.5281/zenodo.19435242](https://doi.org/10.5281/zenodo.19435242)
Prather, T. (2026). *The Anti-Snapshot Theorem: Temporal Corrective Structure in Finite Systems*. Paper 3. Record: [zenodo.org/records/19521229](https://zenodo.org/records/19521229)
Prather, T. (2026). *Structural Dependency: From Physics to Alignment Architecture*. Paper 4. DOI: [10.5281/zenodo.19436081](https://doi.org/10.5281/zenodo.19436081)
Prather, T. (2026). *Physics-Grounded Alignment Through Corrective Architecture*. Paper 5. DOI: [10.5281/zenodo.19521693](https://doi.org/10.5281/zenodo.19521693)
