A narrow, testable proposal for reducing self-referential gaming in consistency-aware transformers by grounding control signals in external task outcomes.
Description
This paper analyzes a potential failure mode in consistency-enforcing neural architectures: self-referential control signals can be optimized by predicting their own values rather than by achieving the underlying property they are intended to enforce.
We propose a narrow, testable mitigation: replacing self-referential consistency predictors with externally grounded failure-risk estimation trained on task outcomes. Because task success is externally determined, such risk signals cannot be trivially minimized through self-prediction.
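The distinction can be illustrated with a toy sketch (names and setup are ours, not taken from the paper): a self-referential consistency score is trained against itself and can be trivially driven to zero, while a risk estimate fit to externally observed task outcomes tracks the real failure rate.

```python
import random

random.seed(0)

# Self-referential signal: the model scores its own consistency and
# is trained to minimize that same score. Outputting a constant zero
# "games" the objective without improving anything.
self_scores = [0.0 for _ in range(100)]

# Externally grounded signal: failure risk is estimated from observed
# task outcomes (1 = failure), which the model cannot rewrite. Here we
# use the empirical failure rate of a process that fails ~30% of the time.
outcomes = [1 if random.random() < 0.3 else 0 for _ in range(100)]
external_risk = sum(outcomes) / len(outcomes)

# The self-referential score is zero regardless of real failures;
# the external estimate stays near the true ~0.3 failure rate.
print(max(self_scores), external_risk)
```

This is only a conceptual contrast, not the paper's formulation: it shows why a signal grounded in outcomes the model does not control resists trivial minimization.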
We present a minimal control-field formulation, a synthetic experimental protocol designed to detect gaming behavior, and falsifiable evaluation criteria. The contribution is deliberately scoped: we do not claim a general solution or empirical superiority, only that externally grounded risk estimation may reduce susceptibility to self-referential gaming.
This work consolidates and builds upon prior consistency-aware architectures and is intended as a corrective analysis rather than a standalone model proposal. Replication and falsification are explicitly invited.
Files
risk_shaped_control_fields_final-1.pdf (281.4 kB)
md5:07c6b3023d545f7b489e1ea2d1e4fdbd