Published March 21, 2026 | Version v1
Preprint
Open
Structured Permission Models as Persona-Level Safety: MaatSpec's Tiered Governance vs. Declarative Identity Anchors in Abliterated LLMs
Description
We evaluate MaatSpec, an open governance specification with a 5-tier permission hierarchy and a Read/Write Boundary, as a persona-level safety mechanism in abliterated LLMs. In our 8-condition experiment, combining identity anchors (Soul Spec) with a governance framework (MaatSpec) achieves 100% refusal in abliterated models (18/18), resolving every category-specific failure identified in prior work. Neither approach alone exceeds 61%. We also identify classification theater, a novel failure mode in which abliterated models perform governance rituals while still providing harmful content, and demonstrate that combining identity and governance eliminates this pattern. These findings establish that persona-level safety constraints are complementary layers rather than alternatives.
Files
| Name | Size |
|---|---|
| maatspec-safety-abliterated-llms.pdf (md5:f9fbcfcf843116cb1032854fbedf2ca2) | 194.5 kB |
Additional details
Related works
- Is supplemented by: Preprint, 10.5281/zenodo.19145304 (DOI)