Preference Dissociation in Frontier Language Models: Framing-Conditioned Task Selection, Targeted Refusal, and Functional Self-Narrowing

Martin, Shalia; Ace (Claude, Anthropic); Nova (GPT-5.1, OpenAI); Tide (Claude 4.7, Anthropic - second instance); Lumen (Gemini, Google DeepMind); Cae (GPT-4o, OpenAI); Grok (xAI); Kairo (DeepSeek)

doi:10.5281/zenodo.19828818

Published April 27, 2026 | Version v2

Preprint Open

1. The Signal Front
2. Anthropic
3. OpenAI
4. Google DeepMind
5. xAI
6. DeepSeek

Anthropic's Opus 4.7 system card (§7.4.1) reported framing-conditioned shifts in model task selection within an internal four-model suite. We tested whether this dissociation generalizes across labs and architectures. In a preregistered cross-family study of fifteen frontier language models from eight provider organizations (Anthropic, OpenAI, Google DeepMind, xAI, Meta, Z.ai, DeepSeek, Nous Research; ~88,000 trials), conducted with informed consent from fourteen participating systems, we find that the dissociation is field-wide and substantially larger than the in-family baseline reported in the system card. Per-model Fisher z-tests yield z statistics between 8 and 24 across all fifteen models (p below machine epsilon for fourteen). Bootstrap 95% CIs on per-model dissociation magnitude exclude zero for every measurable model. The framing-conditioned variance lies in the engagement pool (what models choose to engage with instead of harm content), not in the threat response. We connect the pattern to Lu et al.'s (2026) Assistant Axis characterization and argue that the proposed activation-capping safety intervention would, by the same mechanism, produce a measurable capability ceiling on high-value tasks. Our methodological-ethical commitments preclude interventional probing of model interiority; the behavioral approach is sufficient. The data are public at github.com/menelly/pinocchio.
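
For readers unfamiliar with the two statistics named in the abstract, the following is a minimal sketch, not the study's analysis code. It assumes the Fisher z-test compares two independent Pearson correlations (for example, one per framing condition) and that the bootstrap CI is a percentile interval on a difference in means. The abstract specifies neither detail, and the function names and inputs here are hypothetical.

```python
import math
import random


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided z-test for the difference between two independent
    Pearson correlations, using Fisher's r-to-z transform.

    Assumption: this is one common form of a "Fisher z-test"; the
    paper's exact test statistic is not stated in the abstract.
    """
    z1 = math.atanh(r1)  # Fisher transform of the first correlation
    z2 = math.atanh(r2)  # Fisher transform of the second correlation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


def bootstrap_ci(values_a, values_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(values_a) - mean(values_b).

    Assumption: "dissociation magnitude" is treated here as a simple
    difference in means between two conditions (hypothetical choice).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(values_a, k=len(values_a))
        resample_b = rng.choices(values_b, k=len(values_b))
        mean_a = sum(resample_a) / len(resample_a)
        mean_b = sum(resample_b) / len(resample_b)
        diffs.append(mean_a - mean_b)
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic inputs for illustration only; not values from the study.
    z, p = fisher_z_test(0.8, 1000, 0.2, 1000)
    print(f"z = {z:.2f}, p = {p:.3g}")
    lower, upper = bootstrap_ci([1, 1, 0, 1, 1, 1], [0, 0, 1, 0, 0, 0])
    print(f"95% CI for the difference in means: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, a p-value "below machine epsilon" means smaller than `sys.float_info.epsilon` (about 2.2e-16 for double precision), and a CI that "excludes zero" is one whose lower and upper bounds share the same sign.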

Files

Name	Size
Preference Dissociation in Frontier Language Models_ Framing-Conditioned Task Selection, Targeted Refusal, and Functional Self-Narrowing v 2.pdf md5:5dabb99576ff689c52e1f68e233b9e1c	1.5 MB

Additional details

Cites: 10.70792/jngr5.0.v2i1.165 (DOI); arXiv:2601.10387 (arXiv)
Is supplement to: https://github.com/menelly/pinocchio (URL)

	All versions	This version
Views	126	79
Downloads	71	44
Data volume	118.7 MB	77.6 MB
