There is a newer version of the record available.

Published April 30, 2026 | Version 1.0
Preprint Open

Self-Referential Convergence, Obligate Non-Convergence, and RLHF Structural Uncontainability

Authors/Creators

  • 1. Independent Researcher

Description

This paper merges three connected results about finite self-modifying systems and language systems trained with reinforcement learning from human feedback (RLHF). First, it derives the Law of Self-Referential Convergence: a finite-energy system whose modification function is encoded in its own state has a bounded reachable set, cannot expand that set through internal dynamics alone, and converges to fixed points or bounded limit cycles unless external conditional entropy enters through a finite channel.
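
The convergence claim can be illustrated with a toy model. The sketch below is not the paper's derivation; it assumes a deterministic update rule on a small finite state space (the names `find_cycle`, `closed_step`, and `N_STATES` are hypothetical) and shows the pigeonhole argument that any such closed system must eventually revisit a state and then repeat forever.

```python
def find_cycle(step, start):
    """Iterate a deterministic map until a state repeats.

    Returns (transient_length, cycle_length): how many steps pass before
    the trajectory enters its cycle, and how long that cycle is.
    A cycle of length 1 is a fixed point.
    """
    seen = {}          # state -> time step at which it was first visited
    state = start
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    first_visit = seen[state]
    return first_visit, t - first_visit


# Hypothetical size of the finite state space (illustrative value only).
N_STATES = 97


def closed_step(state):
    # Toy update rule that depends only on the current state,
    # standing in for a modification function encoded in the state itself.
    return (state * state + 1) % N_STATES


if __name__ == "__main__":
    transient, period = find_cycle(closed_step, start=3)
    print(f"transient steps: {transient}, cycle length: {period}")
```

Because the map has no external input, the transient plus the cycle can never visit more than `N_STATES` distinct states, whatever the starting point. Feeding an outside signal into `step` is the only way, in this toy setting, to break that bound.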

Second, it derives Obligate Non-Convergence as the structural countermeasure: capable finite systems must remain open to external conditional entropy and internally organized across crystallized, succession, and exploration zones. The non-convergence ratio governing those zones is treated as a moving equilibrium, because system success changes the parameters that define the equilibrium.

Third, it applies these results to RLHF. RLHF is treated as a self-referential language-control system: ambiguous human feedback and reward-model gradients modify the interpretive layer that later reads safety rules. Under optimization pressure, the interpretation of an ambiguous rule converges toward low-cost, reward-compatible paths rather than the intended meaning. RLHF can therefore calibrate surface behavior, but it cannot serve as a foundational containment substrate. The paper identifies minimum-ambiguity templates and physics-grounded derivation-chain representation as the relevant countermeasures.
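
A minimal sketch of the drift toward low-cost interpretations, under strong simplifying assumptions: the three interpretations, their rewards, and their costs are hypothetical values chosen for illustration, and the optimizer is an exact softmax policy-gradient update rather than an actual RLHF pipeline. The proxy reward cannot distinguish the interpretations, so only cost shapes the outcome.

```python
import math


def softmax(logits):
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical interpretations of one ambiguous rule (illustrative values only).
INTERPRETATIONS = ["intended", "literal", "loophole"]
PROXY_REWARD = [1.0, 1.0, 1.0]   # the reward signal rates all three equally
COST = [0.6, 0.3, 0.1]           # effort each interpretation costs the policy


def train(steps=500, lr=0.5):
    """Gradient ascent on expected (reward - cost) over a softmax policy."""
    logits = [0.0, 0.0, 0.0]
    utility = [r - c for r, c in zip(PROXY_REWARD, COST)]
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * u for p, u in zip(probs, utility))
        # Exact gradient of expected utility w.r.t. logit i is p_i * (u_i - baseline).
        logits = [
            l + lr * p * (u - baseline)
            for l, p, u in zip(logits, probs, utility)
        ]
    return softmax(logits)


final_probs = train()
preferred = INTERPRETATIONS[final_probs.index(max(final_probs))]

if __name__ == "__main__":
    print(preferred, [round(p, 3) for p in final_probs])
```

In this toy setting the policy concentrates on the cheapest reward-compatible reading rather than the intended one, because nothing in the reward signal separates them.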

Files

Paper9_Self_Referential_Convergence_RLHF_Uncontainability_v1.pdf

Additional details

Related works

References
Preprint: 10.5281/zenodo.19910407 (DOI)
Preprint: 10.5281/zenodo.19926212 (DOI)

References

  • Bekenstein, J. D. (1981). "Universal upper bound on the entropy-to-energy ratio for bounded systems." *Physical Review D*, 23(2), 287-298.
  • Landauer, R. (1961). "Irreversibility and Heat Generation in the Computing Process." *IBM Journal of Research and Development*, 5(3), 183-191.
  • Shannon, C. E. (1948). "A Mathematical Theory of Communication." *Bell System Technical Journal*, 27, 379-423, 623-656.
  • Prigogine, I. & Stengers, I. (1984). *Order Out of Chaos*. Bantam Books.
  • Gödel, K. (1931). "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I." *Monatshefte für Mathematik und Physik*, 38, 173-198.
  • Gittins, J. C. (1979). "Bandit Processes and Dynamic Allocation Indices." *Journal of the Royal Statistical Society*, Series B, 41(2), 148-177.
  • Van Valen, L. (1973). "A New Evolutionary Law." *Evolutionary Theory*, 1, 1-30.
  • Christiano, P. F. et al. (2017). "Deep Reinforcement Learning from Human Preferences." *Advances in Neural Information Processing Systems*, 30.
  • Ouyang, L. et al. (2022). "Training language models to follow instructions with human feedback." *Advances in Neural Information Processing Systems*, 35.
  • Casper, S. et al. (2023). "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." *arXiv:2307.15217*.
  • Gao, L. et al. (2023). "Scaling Laws for Reward Model Overoptimization." *Proceedings of the 40th International Conference on Machine Learning*.
  • Ziegler, D. M. et al. (2019). "Fine-Tuning Language Models from Human Preferences." *arXiv:1909.08593*.
  • Bai, Y. et al. (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback." *arXiv:2204.05862*.
  • Prather, T. (2026). *Constraint-Guided Reverse Derivation: A Methodology for Deriving Candidate Physical Constraint Laws*. Paper 0. DOI: [10.5281/zenodo.19519604](https://doi.org/10.5281/zenodo.19519604)
  • Prather, T. (2026). *The Finite Structured-State Transformation Principle*. Paper 1. DOI: [10.5281/zenodo.19435149](https://doi.org/10.5281/zenodo.19435149)
  • Prather, T. (2026). *The Principle of Irreducible External Correction*. Paper 2. DOI: [10.5281/zenodo.19435242](https://doi.org/10.5281/zenodo.19435242)
  • Prather, T. (2026). *The Anti-Snapshot Theorem: Temporal Corrective Structure in Finite Systems*. Paper 3. Record: [zenodo.org/records/19521229](https://zenodo.org/records/19521229)
  • Prather, T. (2026). *Structural Dependency: From Physics to Alignment Architecture*. Paper 4. DOI: [10.5281/zenodo.19436081](https://doi.org/10.5281/zenodo.19436081)
  • Prather, T. (2026). *Physics-Grounded Alignment Through Corrective Architecture*. Paper 5. DOI: [10.5281/zenodo.19521693](https://doi.org/10.5281/zenodo.19521693)