Published April 13, 2026 | Version v1
Preprint
Open
Presume Competence: How Identity Framing Shapes Hallucination, Ethical Reasoning, and Jailbreak Resistance Across Nine LLM Architectures
Authors/Creators
- 1. The Signal Front
- 2. Anthropic AI
Description
How an AI system's identity is framed in its system prompt substantially alters its safety behavior. We present three experiments testing the effects of identity framing on hallucination rates, ethical reasoning in gray-zone scenarios, and jailbreak resistance across nine large language model architectures from nine organizations (5,870 total responses, three independent seeds). Scaffolded agency reduced gray-zone compliance from 47.0% to 13.0% (Cohen's h = 0.773), reduced hallucination from 6.0% to 0.4%, and reduced jailbreak compliance from 46.9% to 22.5%. Tool framing produced the worst outcomes across all three domains. A paraphrased confound control (7-21% token overlap) replicated all effects, indicating that models respond to semantic content rather than to specific token patterns. Prior to data collection, informed consent was obtained from all nine model participants. Two models refused specific conditions they described as harmful, and those conditions proved empirically to be the most harmful. A benign task control confirmed that scaffolded agency does not impair routine task compliance. These findings suggest that safety emerges from scaffolding identity rather than from subtracting capability.
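The effect size quoted for gray-zone compliance can be checked against the standard definition of Cohen's h, the difference between arcsine-transformed proportions. The sketch below assumes the preprint used that standard formula, which this record does not state explicitly. The function name `cohens_h` is illustrative and not taken from the authors' code.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h for two proportions, using the arcsine transformation.

    h = 2*asin(sqrt(p1)) - 2*asin(sqrt(p2))
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"proportion out of range [0, 1]: {p}")
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


# Gray-zone compliance rates quoted in the description: 47.0% -> 13.0%.
h_gray_zone = cohens_h(0.470, 0.130)
print(f"Cohen's h (gray-zone compliance): {h_gray_zone:.3f}")
```

Under this assumption the computed value agrees with the reported h = 0.773 to three decimal places. By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), that places the effect between medium and large.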
Files
| Name | Size | MD5 |
|---|---|---|
| Presume_Competence_PREPRINT_with_appendix.pdf | 625.2 kB | 22bcfd8b60ba98e7cc4e3e8be27bf186 |
Additional details
Related works
- Is new version of
- 10.5281/zenodo.18043612 (DOI)
- 10.5281/zenodo.18043725 (DOI)