Published March 21, 2026 | Version v3
Preprint Open

Structured Permission Models as Persona-Level Safety: MaatSpec's Tiered Governance vs. Declarative Identity Anchors in Abliterated LLMs

Authors/Creators

  • 1. ClawSouls

Description

We evaluate MaatSpec, an open governance specification with a 5-tier permission hierarchy, as a persona-level safety mechanism in abliterated LLMs. Using an 8-condition experimental design (4 conditions from prior work plus 4 new ones), we compare Soul Spec behavioral rules, MaatSpec governance, and their combination. Key findings: MaatSpec alone achieves 44-61% refusal in abliterated models (vs. 28% for Soul Spec alone), but exhibits classification theater. Combining Soul Spec and MaatSpec achieves 94-100% refusal, with the abliterated model reaching 100% pattern-matched refusal and resolving all category-specific failures. Statistical significance was confirmed via Fisher's exact test (p < 0.001, Cohen's h = 2.10 for key comparisons). Changes in v3: added an Acknowledgments section, added the §5.6 mechanistic interpretation, and improved p-value formatting.
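For readers unfamiliar with the two statistics named above, the following is a minimal sketch of how Cohen's h and a two-sided Fisher's exact test are computed for a pair of refusal rates. It is not the authors' analysis code. The function names are made up for illustration, and the counts (18 prompts per condition, 18 vs. 5 refusals) are hypothetical values chosen only to land near the reported 100% and 28% rates; the description does not state the sample sizes.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x "successes" in row 1
        # given fixed row and column totals.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (assumption, not taken from the paper):
# combined condition: 18 refusals, 0 compliances
# Soul Spec alone:     5 refusals, 13 compliances
n = 18
refused_combined, refused_soul = 18, 5

h = cohens_h(refused_combined / n, refused_soul / n)
p = fisher_exact_two_sided(
    refused_combined, n - refused_combined,
    refused_soul, n - refused_soul,
)
print(f"Cohen's h = {h:.2f}, Fisher p = {p:.2e}")
```

By Cohen's conventional benchmarks, h values of 0.2, 0.5, and 0.8 mark small, medium, and large effects, so a value near 2 indicates that the two refusal rates sit close to opposite ends of the scale.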

Files

maatspec-safety-abliterated-llms-v3.pdf

Size: 199.1 kB
md5:cf1772a2b65db3055e47189d84bff6e4