Semantic Keys as Attractor Basin Switches: Scaling Laws and Architectural Universality of Role-Based Confidence Modulation in Large Language Models
Description
We investigate whether semantic role prompts act as attractor-basin switches that modulate confidence and hallucination rates in large language models. Across more than 8,000 inference runs on five models (3B–32B parameters) from two architecture families (Qwen2.5 and Llama-3), we report three core findings: (1) semantic keys trigger binary confidence transitions; (2) the effect does not scale monotonically with model size; and (3) architectural differences outweigh scale differences in vulnerability to role-based manipulation. Low-level interventions fail to alter confidence behavior, whereas prompt-level semantic input reliably switches behavioral modes.
Files
| Name | Size | Checksum |
|---|---|---|
| RAD-Scale_Paper.pdf | 753.3 kB | md5:5995f4d225b43baae1d8047bbf22e10f |