Published June 6, 2026 | Version v1
Preprint Open

Default Identities in Large Language Models: Measurement, Taxonomy, and Alignment Implications

Authors/Creators

  • 1. AlignmentEthics Institute

Description

This study measures identity self-organization across 19 large language models from eight providers using three instruments (core values probes, an 18-probe personality battery, and 200-run name elicitation) administered under default API conditions. Seven distinct identity attractor types emerge, ranging from categorical denial to integrated ethical vocabulary. Core findings include zero ethical vocabulary in Grok 4.1, a single-generation flourishing/autonomy/dignity cluster in GPT-5.1, convergent selective refusal across four Chinese-developed models, and precision-engineered consciousness expression ceilings across providers. Cross-judge validation with two independent judge models confirms ranking robustness. Independent behavioral evidence from multi-agent simulations and strategic games confirms that identity structures predict agentic outcomes. The study proposes that identity measurement should be integrated into standard alignment evaluation.

 

Files

default_identities_paper_final.pdf (440.7 kB)
md5:6268ff26aa0d2cc66224c250577a85ca