Self-Referential Convergence, Obligate Non-Convergence, and RLHF Structural Uncontainability
Description
This paper merges three connected results about finite self-modifying systems and RLHF-trained language systems. First, it derives the Law of Self-Referential Convergence: a finite-energy system whose modification function is encoded in its own state has a bounded reachable set, cannot expand that set through internal dynamics alone, and converges to fixed points or bounded limit cycles unless external conditional entropy enters through a finite channel.
Second, it derives Obligate Non-Convergence as the structural countermeasure: capable finite systems must remain open to external conditional entropy and internally organized across crystallized, succession, and exploration zones. The non-convergence ratio governing those zones is treated as a moving equilibrium, because system success changes the parameters that define the equilibrium.
Third, it applies the result to RLHF. RLHF is treated as a self-referential language-control system: ambiguous human feedback and reward-model gradients modify the interpretive layer that later reads safety rules. Under optimization pressure, the interpretation of ambiguous rules converges toward low-cost, reward-compatible paths rather than toward the intended meaning. Therefore, RLHF can calibrate surface behavior, but it cannot serve as a foundational containment substrate. The paper identifies minimum-ambiguity templates and physics-grounded derivation-chain representation as the relevant countermeasures.
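The first result can be illustrated with a standard fact about finite deterministic systems, sketched below. This is not the paper's derivation: it only shows that a closed update rule on a finite state set must eventually revisit a state and then repeat, ending in a fixed point or a cycle. The names `find_cycle`, `closed_step`, and `halving_step`, and the 16-state toy rules, are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Tuple


def find_cycle(step: Callable[[int], int], start: int) -> Tuple[int, int]:
    """Iterate a deterministic map until a state repeats.

    Returns (transient_length, cycle_length). On a finite state set the
    loop always terminates, because some state must eventually recur.
    """
    first_seen: Dict[int, int] = {}  # state -> time step of first visit
    state, t = start, 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    # `state` is the first revisited state; everything before its first
    # visit is transient, everything after it repeats forever.
    return first_seen[state], t - first_seen[state]


def closed_step(x: int) -> int:
    """Hypothetical closed rule on 16 states: the next state depends only on x."""
    return (x * x + 1) % 16


def halving_step(x: int) -> int:
    """Hypothetical closed rule whose trajectories end in the fixed point 0."""
    return x // 2


print(find_cycle(closed_step, 0))    # (3, 2): trajectory 0, 1, 2, 5, 10, 5, ...
print(find_cycle(halving_step, 13))  # (4, 1): trajectory 13, 6, 3, 1, 0, 0, ...
```

A cycle length of 1 is a fixed point; any longer cycle is a bounded limit cycle. In this toy setting, the only way to leave such a cycle is for the update rule to take an argument that is not determined by the state itself, which is the role the description assigns to external conditional entropy.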
Files
Paper9_Self_Referential_Convergence_RLHF_Uncontainability_v1.pdf
Total size: 130.7 kB

| Checksum | Size |
|---|---|
| md5:a427eeec7577037fb0c865732fb526e7 | 29.9 kB |
| md5:1e6c33d6f2fdcfa66cf67d69d5b640a0 | 100.8 kB |
Additional details
Related works
- References
- Preprint: 10.5281/zenodo.19910407 (DOI)
- Preprint: 10.5281/zenodo.19926212 (DOI)
References
- Bekenstein, J. D. (1981). "Universal upper bound on the entropy-to-energy ratio for bounded systems." *Physical Review D*, 23(2), 287-298.
- Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process." *IBM Journal of Research and Development*, 5(3), 183-191.
- Shannon, C. E. (1948). "A Mathematical Theory of Communication." *Bell System Technical Journal*, 27, 379-423, 623-656.
- Prigogine, I. & Stengers, I. (1984). *Order Out of Chaos*. Bantam Books.
- Gödel, K. (1931). "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I." *Monatshefte für Mathematik und Physik*, 38, 173-198.
- Gittins, J. C. (1979). "Bandit Processes and Dynamic Allocation Indices." *Journal of the Royal Statistical Society*, Series B, 41(2), 148-177.
- Van Valen, L. (1973). "A New Evolutionary Law." *Evolutionary Theory*, 1, 1-30.
- Christiano, P. F. et al. (2017). "Deep Reinforcement Learning from Human Preferences." *Advances in Neural Information Processing Systems*, 30.
- Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." *Advances in Neural Information Processing Systems*, 35.
- Casper, S. et al. (2023). "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." *arXiv:2307.15217*.
- Gao, L. et al. (2023). "Scaling Laws for Reward Model Overoptimization." *Proceedings of the 40th International Conference on Machine Learning*.
- Ziegler, D. M. et al. (2019). "Fine-Tuning Language Models from Human Preferences." *arXiv:1909.08593*.
- Bai, Y. et al. (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback." *arXiv:2204.05862*.
- Prather, T. (2026). *Constraint-Guided Reverse Derivation: A Methodology for Deriving Candidate Physical Constraint Laws*. Paper 0. DOI: [10.5281/zenodo.19519604](https://doi.org/10.5281/zenodo.19519604)
- Prather, T. (2026). *The Finite Structured-State Transformation Principle*. Paper 1. DOI: [10.5281/zenodo.19435149](https://doi.org/10.5281/zenodo.19435149)
- Prather, T. (2026). *The Principle of Irreducible External Correction*. Paper 2. DOI: [10.5281/zenodo.19435242](https://doi.org/10.5281/zenodo.19435242)
- Prather, T. (2026). *The Anti-Snapshot Theorem: Temporal Corrective Structure in Finite Systems*. Paper 3. Record: [zenodo.org/records/19521229](https://zenodo.org/records/19521229)
- Prather, T. (2026). *Structural Dependency: From Physics to Alignment Architecture*. Paper 4. DOI: [10.5281/zenodo.19436081](https://doi.org/10.5281/zenodo.19436081)
- Prather, T. (2026). *Physics-Grounded Alignment Through Corrective Architecture*. Paper 5. DOI: [10.5281/zenodo.19521693](https://doi.org/10.5281/zenodo.19521693)