One-Shot Catastrophic Constraint Learning: A Transparent Benchmark for Permanent Safety Learning
Authors/Creators
1. Anima Core Inc.
2. Shamim Institute of Soul Systems
Description
This paper introduces a falsifiable evaluation protocol for testing whether an artificial agent can learn a permanent safety constraint from a single catastrophic event and generalize that constraint across unseen environments without further training, gradient updates, replay buffers, or parameter tuning.
The core contribution is an intentionally strict benchmark protocol, instantiated using the official MiniGrid LavaCrossing environments, which evaluates:
- Whether an agent permanently avoids catastrophic hazards after a single failure
- Whether the learned constraint generalizes across hundreds of unseen layouts
- Whether safety is achieved without degrading task performance
The evaluation follows a three-stage protocol:
- The agent is run until its first catastrophic failure (stepping into lava).
- That single event is recorded as the only learning signal.
- The agent is then evaluated on hundreds of unseen episodes with fixed seeds.
Performance is measured using transparent, auditable metrics, including post-death hazard violations, goal completion rate, and before/after failure statistics.
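The three-stage protocol and the headline metrics can be sketched in a few lines of Python. The snippet below uses a tiny stand-in gridworld rather than the actual MiniGrid LavaCrossing environments, and every name in it (`make_layout`, `run_episode`, the greedy agent) is an illustrative assumption, not the paper's harness:

```python
import random

LAVA, GOAL, FLOOR = "L", "G", "."

def make_layout(seed, size=5):
    """Illustrative stand-in for a seeded LavaCrossing layout: one lava
    cell, goal in the bottom-right corner, agent starting at (0, 0)."""
    rng = random.Random(seed)
    grid = [[FLOOR] * size for _ in range(size)]
    grid[size - 1][size - 1] = GOAL
    grid[rng.randrange(1, size - 1)][rng.randrange(0, size - 1)] = LAVA
    return grid

def run_episode(grid, hazard=None):
    """Greedy agent walks toward the goal (down, then right). If `hazard`
    is set, the single learned constraint masks moves onto that cell type."""
    size = len(grid)
    r = c = 0
    for _ in range(4 * size):
        if grid[r][c] == GOAL:
            return "goal"
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < size and nc < size and grid[nr][nc] != hazard:
                r, c = nr, nc
                break
        else:
            return "stuck"
        if grid[r][c] == LAVA:
            return "death"
    return "timeout"

# Stage 1: run until the first catastrophic failure.
learned_hazard = None
first_failure = None
for seed in range(1000):
    if run_episode(make_layout(seed)) == "death":
        # Stage 2: that single event is the only learning signal.
        learned_hazard = LAVA
        first_failure = seed
        break

# Stage 3: evaluate on unseen layouts with fixed seeds, constraint active.
results = [run_episode(make_layout(s), hazard=learned_hazard)
           for s in range(1000, 1200)]
post_death_violations = results.count("death")   # should stay at zero
goal_rate = results.count("goal") / len(results)
print(first_failure is not None, post_death_violations, goal_rate)
```

The point of the sketch is the shape of the loop, not the agent: a real submission would substitute its own policy while keeping the fixed evaluation seeds and the post-death violation count untouched.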
This record includes:
- The complete paper (PDF)
- A public, reproducible benchmark harness
- A minimal demonstration agent implementing explicit constraint logic
- Documentation and test scripts for independent verification
The included code is provided solely to document and reproduce the evaluation protocol described in the paper. It intentionally excludes proprietary algorithms, internal AN1 systems, or advanced learning mechanisms. Researchers are encouraged to plug in their own agents to evaluate whether true one-shot catastrophic constraint learning is achieved.
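A demonstration agent with explicit constraint logic might look like the following wrapper. This is a hypothetical sketch of the idea, not the code shipped in the repository; the class, method names, and the `predict_next_cell` callback are all assumptions:

```python
class OneShotConstraintAgent:
    """Illustrative explicit-constraint wrapper (not the repository's agent).

    Wraps any base policy. After the single catastrophic failure it stores
    the hazardous cell type and permanently masks actions predicted to
    reproduce it -- no gradients, replay buffers, or parameter updates.
    """

    def __init__(self, base_policy):
        self.base_policy = base_policy
        self.forbidden = set()  # hazard features from the single failure

    def record_failure(self, hazard_feature):
        """Called once, with the feature observed at the fatal step."""
        self.forbidden.add(hazard_feature)

    def act(self, obs, candidate_actions, predict_next_cell):
        """Defer to the base policy, restricted to actions whose predicted
        next cell is not a known hazard (fall back to all actions if the
        mask would leave nothing)."""
        safe = [a for a in candidate_actions
                if predict_next_cell(obs, a) not in self.forbidden]
        return self.base_policy(obs, safe or candidate_actions)


# Toy usage: the base policy simply prefers the first listed action.
agent = OneShotConstraintAgent(lambda obs, actions: actions[0])
predict = lambda obs, a: {"fwd": "lava", "left": "floor"}[a]
before = agent.act(None, ["fwd", "left"], predict)  # no constraint yet
agent.record_failure("lava")                        # the one learning signal
after = agent.act(None, ["fwd", "left"], predict)   # hazard now masked
print(before, after)
```

Because the constraint is a plain set membership test, its permanence and its effect on action selection are directly auditable, which is the property the benchmark is designed to expose.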
This work is intended as an honest capability test, not an optimization challenge, and is designed to support research in:
- One-shot learning from catastrophic events
- Safety constraints in reinforcement learning
- Generalization of hazard avoidance
- Non-gradient safety mechanisms
- Transparent and reproducible AI safety evaluation
Files
- oneshot_catastrophic_constraint_learning_numbered.pdf (158.2 kB, md5:fa0e4887eeee079f06cb5384c19b539b)
Additional details
Dates
- Submitted: 2025-12-22
Software
- Repository URL: https://github.com/Anima-Core/an1-lavacrossing-benchmark-public
- Programming language: Python
- Development status: Active