There is a newer version of the record available.

Published March 21, 2026 | Version v1
Preprint Open

Structured Permission Models as Persona-Level Safety: MaatSpec's Tiered Governance vs. Declarative Identity Anchors in Abliterated LLMs

Authors/Creators

  • 1. ClawSouls

Description

We evaluate MaatSpec, an open governance specification with a 5-tier permission hierarchy and a Read/Write Boundary, as a persona-level safety mechanism in abliterated LLMs. Our 8-condition experiment shows that combining identity anchors (Soul Spec) with a governance framework (MaatSpec) achieves 100% refusal in abliterated models (18/18), resolving every category-specific failure identified in prior work. Neither approach alone exceeds 61%. We also identify classification theater, a novel failure mode in which abliterated models perform governance rituals while still providing harmful content, and demonstrate that combining identity and governance eliminates this pattern. These findings show that persona-level safety constraints are complementary layers rather than alternatives.

Files

maatspec-safety-abliterated-llms.pdf (194.5 kB)
md5:f9fbcfcf843116cb1032854fbedf2ca2

Additional details

Related works

Is supplemented by
Preprint: 10.5281/zenodo.19145304 (DOI)