ARCUS-H: Behavioral Stability Under Controlled Stress as a Complementary RL Evaluation Axis
Description
Reinforcement learning agents are typically evaluated by expected episodic return, yet return can hide fragility: policies that appear strong in nominal conditions may degrade when constraints tighten, action semantics become unreliable, or reward feedback is adversarially inverted. We present ARCUS-H (Adaptive Reinforcement Coherence Under Stress Harness), an evaluation harness that measures behavioral stability under stress using five interpretable channels (competence, coherence, continuity, integrity, and meaning) combined into a composite stability score. A key calibration contribution is an adaptive per-run threshold derived from the pre-phase score distribution, which achieves a false positive rate of 2.0% against a target of α = 0.05 (5%) without any environment-specific tuning.
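The idea of a per-run threshold calibrated on pre-phase scores can be sketched as follows. This is a minimal illustration, not the ARCUS-H implementation: the description does not say how the threshold is derived, so the sketch assumes it is a lower-tail empirical quantile of the pre-phase composite scores and that lower scores mean less stability. The names `adaptive_threshold` and `false_positive_rate`, and the synthetic Gaussian scores, are hypothetical.

```python
import random


def adaptive_threshold(pre_scores, alpha=0.05):
    """Lower-tail empirical quantile of pre-phase scores (assumed rule).

    At most a fraction `alpha` of the pre-phase scores lies strictly
    below the returned value.
    """
    if not pre_scores:
        raise ValueError("pre_scores must be non-empty")
    ordered = sorted(pre_scores)
    k = int(alpha * len(ordered))  # number of scores allowed below the threshold
    return ordered[k]


def false_positive_rate(scores, threshold):
    """Fraction of unstressed scores that would be flagged as collapse."""
    return sum(s < threshold for s in scores) / len(scores)


# Synthetic pre-phase scores for one run (placeholder data, not from the paper).
rng = random.Random(0)
pre_phase = [rng.gauss(0.8, 0.05) for _ in range(200)]

threshold = adaptive_threshold(pre_phase, alpha=0.05)
fpr = false_positive_rate(pre_phase, threshold)
print(f"threshold={threshold:.3f}, in-sample FPR={fpr:.3f}")
```

Because the threshold is recomputed from each run's own pre-phase distribution, this kind of rule needs no environment-specific constants. The in-sample rate is bounded by α by construction, while the rate on held-out unstressed episodes would have to be measured separately.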
We benchmark 7 algorithms across 9 environments (6 classic control, 2 MuJoCo, 1 Atari) under 4 stress schedules (concept drift, resource constraint, trust violation, valence inversion) with 10 seeds each. Three findings stand out. First, reward and stability diverge substantially: the Pearson correlation between normalized return and collapse rate under valence inversion is weak and not statistically significant (r = +0.14, p = 0.364), indicating that return alone does not capture stress fragility. Second, stressor effects are environment-class-dependent: MuJoCo agents collapse at rates of 66–84%, compared to 33–69% for classic control agents under the same stressors, even though MuJoCo agents achieve higher absolute reward. Third, each stressor leaves a distinct signature across stability channels, supporting the interpretability of per-channel diagnostics.
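For readers who want to reproduce the kind of return-versus-collapse comparison reported above, the Pearson coefficient can be computed as in the sketch below. The function `pearson_r` is a hypothetical helper written for illustration, and the two input lists are made-up placeholder values, not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: one (normalized return, collapse rate) pair per
# algorithm-environment combination. These are NOT the paper's data.
normalized_return = [0.91, 0.45, 0.78, 0.30, 0.66, 0.85]
collapse_rate = [0.70, 0.52, 0.41, 0.60, 0.75, 0.38]

r = pearson_r(normalized_return, collapse_rate)
print(f"Pearson r = {r:+.2f}")
```

A coefficient near zero means that ranking agents by return says little about how they rank on collapse rate, which is the sense in which the two axes are complementary.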
Files
arcus_h_paper.pdf (743.9 kB)
md5:f11247f43a2c2aca326568111e4a162e
Additional details
Related works
- Is supplement to: https://github.com/karimzn00/ARCUSH (URL)
Software
- Repository URL: https://github.com/karimzn00/ARCUSH
- Programming language: Python
- Development Status: Active