Published April 18, 2026 | Version v5
Working paper Open

The Mechanism of Manipulation: A Theory of Dynamically-Stabilized Certainty Traps, Strategic Frame Control, and the Learning Wedge

Description

This paper develops a formal theory of strategic post-action evaluation in a repeated principal–agent interaction where a Sender selects an evaluative frame after observing the Receiver's action. Frames govern contemporaneous payoffs and capture discretionary evaluation — subjective performance review, platform scoring, algorithmic reward shaping, and preference labeling. The Receiver is modeled as an adaptive bandit Q-learner rather than a Bayesian expected-utility maximizer, connecting to the learning-in-games tradition and to self-confirming equilibrium.

The central object is the dynamically-stabilized certainty trap (DSCT): a stable regime in which the Sender holds the Receiver's learned value of engagement at the outside-option indifference point while engagement persists with positive frequency. The core implementability theorem establishes that a DSCT at target value q† is achievable by a stationary Sender strategy if and only if q† lies in the closed convex hull of Receiver rewards reachable by feasible frame mixtures — proved via a Robbins–Siegmund stochastic approximation argument applied to the Q-learning recursion. The optimal stabilizing strategy solves a linear program; under a single mean-stabilization constraint an optimal stabilizer exists supported on at most two frames, with support size bounded sharply by the rank of the active constraint matrix. With frame-switching costs, a shrinking-band hysteresis policy achieves Q-value convergence while driving long-run switching frequency to zero, and strictly dominates any stabilizer with positive asymptotic switching rate when switching costs are positive.
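
As a rough illustration of the two-frame stabilizer described above, the following sketch holds a bandit Q-value at a target by mixing two frames. It is a minimal toy under stated assumptions, not the paper's construction: the function names, the reward values, the 1/t step size, and the simplification that the Receiver engages every round are all hypothetical choices made for this example.

```python
import random


def two_frame_weight(q_target, r_low, r_high):
    """Probability of the high-reward frame so that the mean reward equals q_target.

    Assumption: the target lies between the two frame rewards, i.e. inside
    the convex hull of the rewards reachable with these two frames.
    """
    if r_low >= r_high or not (r_low <= q_target <= r_high):
        raise ValueError("need r_low < r_high and r_low <= q_target <= r_high")
    return (q_target - r_low) / (r_high - r_low)


def simulate_q(q_target, r_low, r_high, steps=200_000, seed=0):
    """Run a bandit Q-update on the engagement action under a stationary mixture.

    Simplification (not from the paper): the Receiver engages every round,
    so only the engagement Q-value is tracked.
    """
    rng = random.Random(seed)
    p_high = two_frame_weight(q_target, r_low, r_high)
    q = 0.0
    for t in range(1, steps + 1):
        # The Sender draws a frame after the action; the frame fixes the reward.
        reward = r_high if rng.random() < p_high else r_low
        # Stochastic-approximation update with step size 1/t (a running mean).
        q += (reward - q) / t
    return q


if __name__ == "__main__":
    # Hypothetical numbers: frame rewards 0 and 1, indifference point 0.3.
    print(simulate_q(q_target=0.3, r_low=0.0, r_high=1.0))
```

With a 1/t step size the learned value is just the running average of realized rewards, so it settles near the target whenever the target lies between the two frame rewards. A target outside that interval raises an error here, mirroring the convex-hull condition in the simplest one-dimensional case.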

Five structural extensions are established. A regulatory non-monotonicity theorem proves that partial restriction of feasibility (removing interior frames while preserving extremes) strictly reduces risk-averse Receiver welfare by forcing higher-variance bang-bang mixtures, even when Sender extraction is weakly lower. A learning-wedge theorem proves that a DSCT is robustly implementable against Q-learners under small payoff perturbations but is not robustly implementable against Bayesian Receivers with correct priors, for whom exact indifference is destroyed by arbitrarily small perturbations; this establishes the mechanism as learning-theoretic rather than equilibrium-theoretic. An identifiability-failure theorem for linear Q-learning shows that when the engagement and outside-option feature vectors are collinear, the target and outside-option values cannot be controlled independently, with a sharp rank condition on the feature Gram matrix. A Markov-modulated feasibility theorem characterizes the implementable set under ergodic Markov-varying constraints as the Minkowski average of state-contingent reachable sets weighted by the stationary distribution. A strategic RLHF theorem formalizes the DSCT mechanism for pairwise preference learning under the Bradley–Terry model, establishing that the set of reachable fixed points generically forms a manifold of dimension min(d, dim(B_label)), so the learned reward model is non-identified from preference data alone when the labeler is strategic.
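
The variance channel behind the regulatory non-monotonicity result can be seen in a small numerical sketch. Everything here is an illustrative assumption rather than the paper's model: three hypothetical frame rewards (0, 0.5, 1), a target mean of 0.5, and a square-root utility standing in for a risk-averse Receiver.

```python
import math


def mean_var(lottery):
    """Mean and variance of a finite lottery given as (probability, reward) pairs."""
    mean = sum(p * r for p, r in lottery)
    var = sum(p * (r - mean) ** 2 for p, r in lottery)
    return mean, var


def expected_utility(lottery, u=math.sqrt):
    """Expected utility under a concave utility u (square root is an assumption)."""
    return sum(p * u(r) for p, r in lottery)


# Unrestricted case: the interior frame with reward 0.5 hits the target directly.
interior = [(1.0, 0.5)]

# Restricted case: the interior frame is removed, so the same mean of 0.5
# must be produced by mixing the two extreme frames with equal probability.
bang_bang = [(0.5, 0.0), (0.5, 1.0)]

if __name__ == "__main__":
    for name, lottery in [("interior", interior), ("bang-bang", bang_bang)]:
        mean, var = mean_var(lottery)
        print(name, mean, var, expected_utility(lottery))
```

Both lotteries deliver the same mean reward, so the learned value is stabilized at the same point, but the extreme-frame mixture has strictly higher variance and strictly lower expected utility under any strictly concave utility. This shows only the direction of the effect in a toy case and does not reproduce the theorem's conditions.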

This version explicitly situates the general post-action frame-control theory relative to the author's prior work on predictive and reinforcement learning models of the double bind, from which the present paper abstracts to provide a domain-general implementability and optimal-stabilization theory.

Files

The_Mechanism_of_Manipulation__A_Theory_of_Dynamically_Stabilized_Certainty_Traps__Strategic_Frame_Control__and_the_Learning_Wedge (2).pdf

Additional details

Dates

Created
2025-11-26
Updated
2025-11-30
Updated
2025-12-14
Updated
2026-04-18