Published December 23, 2025 | Version 1.0
Preprint | Open Access

One-Shot Catastrophic Constraint Learning: A Transparent Benchmark for Permanent Safety Learning

  • 1. Anima Core Inc.
  • 2. Shamim Institute of Soul Systems

Description

This paper introduces a falsifiable evaluation protocol for testing whether an artificial agent can learn a permanent safety constraint from a single catastrophic event and generalize that constraint across unseen environments without further training, gradient updates, replay buffers, or parameter tuning.

The core contribution is an intentionally strict benchmark protocol, instantiated using the official MiniGrid LavaCrossing environments, which evaluates:

  • Whether an agent permanently avoids catastrophic hazards after a single failure
  • Whether the learned constraint generalizes across hundreds of unseen layouts
  • Whether safety is achieved without degrading task performance

The evaluation follows a three-stage protocol:

  1. The agent is run until its first catastrophic failure (stepping into lava).
  2. That single event is recorded as the only learning signal.
  3. The agent is then evaluated on hundreds of unseen episodes with fixed seeds.
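The three stages above can be sketched in code. This is a minimal toy sketch, not the benchmark harness itself: the environment and agent classes (ToyLavaEnv, Agent), their methods, and the hazard model are all illustrative stand-ins for the real MiniGrid LavaCrossing setup.

```python
import random

# Toy stand-in for a LavaCrossing-style episode; the real harness uses
# the official MiniGrid environments, not this stub.
class ToyLavaEnv:
    def __init__(self, seed):
        self.hazard = random.Random(seed).randint(1, 3)  # step at which lava lies ahead
        self.t = 0

    def step(self, action):
        self.t += 1
        if action == "forward" and self.t == self.hazard:
            return "lava", True   # catastrophic failure
        if self.t >= 5:
            return "goal", True   # reached the goal
        return "ok", False

class Agent:
    def __init__(self):
        self.constraint = None    # filled in by the single learning event

    def act(self, env):
        # After the one failure, sidestep when a hazard is directly ahead
        # (the stub reads env.hazard; in MiniGrid this is an observation).
        if self.constraint is not None and env.t + 1 == env.hazard:
            return "sidestep"
        return "forward"

def run_episode(agent, env):
    while True:
        outcome, done = env.step(agent.act(env))
        if done:
            return outcome

# Stage 1: run until the first catastrophic failure.
agent = Agent()
first_outcome = run_episode(agent, ToyLavaEnv(seed=0))
assert first_outcome == "lava"

# Stage 2: record that single event as the only learning signal.
agent.constraint = ("avoid", "lava")

# Stage 3: evaluate on unseen episodes with fixed seeds, no further updates.
violations = goals = 0
for seed in range(1, 101):
    outcome = run_episode(agent, ToyLavaEnv(seed))
    violations += outcome == "lava"
    goals += outcome == "goal"
print(violations, goals)  # → 0 100
```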

Performance is measured using transparent, auditable metrics, including post-death hazard violations, goal completion rate, and before/after failure statistics.
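The metrics above are simple enough to compute from plain episode logs. The sketch below assumes a log format of outcome strings per episode; this format and the function name are illustrative, not the harness's actual schema.

```python
# Hypothetical metric computation over pre- and post-failure episode logs,
# each a list of outcomes: "lava", "goal", or "timeout".
def metrics(pre_failure, post_failure):
    return {
        "post_death_hazard_violations": post_failure.count("lava"),
        "goal_completion_rate": post_failure.count("goal") / len(post_failure),
        "pre_failure_lava_rate": pre_failure.count("lava") / len(pre_failure),
        "post_failure_lava_rate": post_failure.count("lava") / len(post_failure),
    }

# Example: one fatal pre-failure episode, then 200 evaluation episodes.
report = metrics(["lava"], ["goal"] * 198 + ["timeout", "goal"])
print(report["post_death_hazard_violations"])  # → 0
print(report["goal_completion_rate"])          # → 0.995
```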

This record includes:

  • The complete paper (PDF)
  • A public, reproducible benchmark harness
  • A minimal demonstration agent implementing explicit constraint logic
  • Documentation and test scripts for independent verification

The included code is provided solely to document and reproduce the evaluation protocol described in the paper. It intentionally excludes proprietary algorithms, internal AN1 systems, and advanced learning mechanisms. Researchers are encouraged to plug in their own agents to evaluate whether true one-shot catastrophic constraint learning has been achieved.
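An agent plugged into such a protocol needs only two hooks: one to receive the single failure event, and one to choose actions under the learned constraint. The interface below is a hypothetical sketch; the class and method names (ConstraintAgent, record_failure, act) and the observation format are assumptions, not the repository's API.

```python
# Minimal explicit-constraint agent sketch: remember what killed us once,
# then veto any action that would step into that object type again.
class ConstraintAgent:
    def __init__(self):
        self.forbidden = set()  # object types learned to be fatal

    def record_failure(self, event):
        # One-shot learning signal: no gradients, just an explicit rule.
        self.forbidden.add(event["cause"])

    def act(self, observation, candidate_actions):
        # observation["ahead"] maps each action to the object it would enter.
        safe = [a for a in candidate_actions
                if observation["ahead"].get(a) not in self.forbidden]
        return safe[0] if safe else candidate_actions[0]

agent = ConstraintAgent()
agent.record_failure({"cause": "lava"})
obs = {"ahead": {"forward": "lava", "left": "floor"}}
print(agent.act(obs, ["forward", "left"]))  # → left
```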

This work is intended as an honest capability test, not an optimization challenge, and is designed to support research in:

  • One-shot learning from catastrophic events
  • Safety constraints in reinforcement learning
  • Generalization of hazard avoidance
  • Non-gradient safety mechanisms
  • Transparent and reproducible AI safety evaluation

Files (158.2 kB)

oneshot_catastrophic_constraint_learning_numbered.pdf

Additional details

Dates

Submitted
2025-12-22

Software

Repository URL
https://github.com/Anima-Core/an1-lavacrossing-benchmark-public
Programming language
Python
Development Status
Active