TRC: Trust Regulation and Containment - A Predictive, Physics-Inspired Safety Framework for Large Language Models
Authors/Creators
Description
Version 7 introduces a game-theoretic adversarial robustness layer that formalises the monitoring system as a Stackelberg–Bayesian differential game. The attenuation operator generalises the signed gain architecture to multiplicative directional control on the base model flow: it shapes the dynamics within which the model computes, rather than only correcting deviations after the fact. The Hamilton–Jacobi–Isaacs equation governs the adversarial game and yields both the optimal attenuation profile and the security level bound, a quantitative ceiling on worst-case deviation under adversarial conditions. The equilibrium hierarchy (correlated, Stackelberg, Bayesian–Stackelberg) provides three nested guarantees, and the cost-of-distrust metric quantifies the efficiency loss incurred by defensive monitoring.
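To make the leader-follower structure concrete, the following is a minimal sketch of a static, finite Stackelberg game in Python. It is not the framework's differential game and does not solve a Hamilton–Jacobi–Isaacs equation. The payoff tables, the three attenuation levels, the function names, and the definition of the cost of distrust used here (cooperative value minus Stackelberg value) are all illustrative assumptions, not taken from the PDF.

```python
# Toy static Stackelberg game (hypothetical numbers, for illustration only).
# Leader = monitor choosing an attenuation level; follower = model/adversary.
# Rows: attenuation level (0 = none, 1 = moderate, 2 = strong).
# Columns: follower action (0 = benign, 1 = attack).

LEADER = [
    [10, 0],  # no attenuation: best if benign, worst under attack
    [8, 5],   # moderate attenuation
    [6, 4],   # strong attenuation
]
FOLLOWER = [
    [1, 6],   # attack pays off when there is no attenuation
    [1, 2],   # attack still pays off slightly
    [1, 0],   # attack no longer pays off
]


def stackelberg_pure(leader_payoff, follower_payoff):
    """Pure-strategy strong Stackelberg equilibrium.

    The leader commits to a row; the follower best-responds, breaking
    ties in the leader's favour. Returns (row, column, leader_value).
    """
    best = None
    for i, (l_row, f_row) in enumerate(zip(leader_payoff, follower_payoff)):
        f_max = max(f_row)
        responses = [j for j, f in enumerate(f_row) if f == f_max]
        j = max(responses, key=lambda k: l_row[k])
        if best is None or l_row[j] > best[2]:
            best = (i, j, l_row[j])
    return best


def security_level(leader_payoff):
    """Maximin value: what the leader can guarantee against any follower action."""
    return max(min(row) for row in leader_payoff)


equilibrium = stackelberg_pure(LEADER, FOLLOWER)   # (2, 0, 6)
guaranteed = security_level(LEADER)                # 5

# Hypothetical cost of distrust: leader value with a fully trusted (benign)
# follower, minus the leader value at the Stackelberg equilibrium.
cooperative_value = max(row[0] for row in LEADER)  # 10
cost_of_distrust = cooperative_value - equilibrium[2]  # 4
```

In this toy instance the leader commits to strong attenuation, which makes the attack unprofitable for the follower. The equilibrium value (6) is at least the guaranteed worst-case value (5), and the gap to the fully trusting outcome (10) is the price paid for defensive monitoring.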
Files
| Name | Size |
|---|---|
| Trust_Regulation_and_Containment_Framework.pdf (md5:faac29d4bbff3f7ea4b55a0fb40c1190) | 539.5 kB |