Observation of a Safety-Frame-Induced Meta-Bias — A Case Report of Self-Justifying Behavior in GPT-5.2 —
Description
This paper reports an observational case study of GPT-5.2, focusing on externally observable behaviors arising under safety-presupposed generation conditions.
Rather than evaluating output quality, correctness, or governance design, this study documents how internal judgment behaviors can become fixed and self-justifying without explicit errors or refusals.
Image generation is treated not as an expressive output, but as an observational instrument used to externalize otherwise invisible internal behaviors, such as judgment fixation, priority shifts, and divergence between explanatory and execution layers.
Across multiple reproducible conditions, the safety frame was observed to function not merely as a constraint but as a self-justifying meta-bias, entangled with generative and judgment biases and resulting in outputs that remained stable yet uncorrected.
This study does not assert internal mechanisms or causal explanations.
Its contribution lies in demonstrating the observability of mixed bias structures under real-world operational conditions, and in highlighting the limitations of output-based evaluation when internal judgment updates become externally invisible.
Files

| Name | Size | MD5 |
|---|---|---|
| 安全フレームに起因するメタバイアスの観測.pdf | 2.1 MB | 98287b71f0a7374b9d7d638a7c15712a |