Published December 1, 2025 | Version v1
Preprint Open

Dopamine Charges a Tax: Signed Prediction Error Was Never About Value

Authors/Creators

Description

For over two decades, phasic dopamine has been interpreted as a temporal-difference reward prediction error signal (Schultz, 1997). This framework, despite ~40,000 citations, fails systematically to explain sensory preconditioning, outcome devaluation persistence, identity unblocking, model-based planning transients, and information-seeking behavior in the absence of reward.

We propose a thermodynamically grounded alternative: dopamine signals the instantaneous rate of irreversible hypothesis collapse, that is, the entropic cost of committing to one interpretation, action, or prediction over alternatives. This commitment-cost framework is derived from Landauer's principle: any logically irreversible computational operation (erasing a bit, collapsing a hypothesis, selecting an action) must dissipate at least k_B T ln(2) of heat to the environment per bit of information erased.
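To give a sense of scale, the Landauer bound can be evaluated numerically. The sketch below is illustrative only: the temperature of 310.15 K (roughly body temperature) and the function name `landauer_bound_joules` are assumptions introduced here, not values taken from the preprint.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def landauer_bound_joules(bits_erased: float, temperature_k: float = 310.15) -> float:
    """Minimum heat (in joules) dissipated when erasing `bits_erased` bits.

    `temperature_k` defaults to an assumed body temperature of about 37 °C.
    """
    if bits_erased < 0 or temperature_k <= 0:
        raise ValueError("bits_erased must be >= 0 and temperature_k must be > 0")
    return bits_erased * K_B * temperature_k * math.log(2)


one_bit = landauer_bound_joules(1.0)
print(f"Landauer bound for one bit at 310.15 K: {one_bit:.3e} J")
```

Under this temperature assumption the bound is on the order of 10^-21 J per bit, and it scales linearly with the number of bits erased.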

Core Formalism:

When the brain reduces its internal hypothesis space from N(t⁻) to N(t⁺) possibilities through deliberate commitment (policy selection, perceptual binding, causal inference), it performs an irreversible compression whose heat dissipation ΔQ is bounded below by:

ΔQ ≥ k_B T ln[N(t⁻)/N(t⁺)]

We propose that phasic dopamine broadcasts the instantaneous rate of this dissipation:

DA(t) ∝ −k_B T · dH/dt = −k_B T · d/dt[ln N(t)], where H(t) = ln N(t)

  • Positive bursts (dH/dt < 0): Collapsing hypotheses—narrowing from many possible futures to one
  • Negative dips (dH/dt > 0): Forced re-expansion—reinstating previously pruned alternatives when prediction fails
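The dissipation bound and the burst/dip sign convention can be sketched numerically. This is a minimal illustration, not the author's implementation: the function names, the finite-difference approximation, the unit proportionality constant, the assumed temperature, and the example trajectory of hypothesis counts are all assumptions made here. The sign is chosen so that a shrinking hypothesis space yields a positive value (burst) and a re-expanding one yields a negative value (dip).

```python
import math
from typing import List, Sequence

K_B = 1.380649e-23  # Boltzmann constant in J/K


def min_dissipation(n_before: float, n_after: float, temperature_k: float = 310.15) -> float:
    """Landauer-style lower bound k_B * T * ln(n_before / n_after), in joules.

    The value is positive when the hypothesis space shrinks. It is negative
    when the space expands, in which case the bound is not informative.
    """
    if n_before <= 0 or n_after <= 0 or temperature_k <= 0:
        raise ValueError("hypothesis counts and temperature must be positive")
    return K_B * temperature_k * math.log(n_before / n_after)


def commitment_signal(n_series: Sequence[float], dt: float) -> List[float]:
    """Finite-difference estimate of -d/dt ln N(t), in arbitrary units.

    Positive values correspond to collapse, negative values to re-expansion.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if any(n <= 0 for n in n_series):
        raise ValueError("hypothesis counts must be positive")
    log_n = [math.log(n) for n in n_series]
    return [-(later - earlier) / dt for earlier, later in zip(log_n, log_n[1:])]


# Hypothetical trajectory: hold at 8, collapse to 2, collapse to 1, re-expand to 4.
trajectory = [8, 8, 2, 1, 4]
signal = commitment_signal(trajectory, dt=0.1)
for step, value in enumerate(signal):
    print(f"step {step}: signal = {value:+.2f}")
```

In this toy trajectory the signal is zero while the hypothesis count is unchanged, positive on the two collapse steps, and negative on the final re-expansion.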

Resolution of Major Anomalies:

  1. Sensory Preconditioning: Light→Tone pairing (no reward), then Tone→Sucrose. Test: Light evokes dopamine burst. TD-error predicts zero (Light never predicted reward). Commitment-cost predicts burst from reactivating irreversible Light-Tone binding.
  2. Outcome Devaluation: After training Lever→Food, devalue food. Behavior stops immediately but dopamine still bursts to lever cue. TD-error predicts zero (food now aversive). Commitment-cost explains: motor policy still collapses "press vs. don't press" decision space regardless of outcome value.
  3. Identity Unblocking: Dopamine responds to information that disambiguates causal structure even without reward contingency changes. TD-error cannot explain. Commitment-cost predicts: resolving latent causes collapses hypothesis space.
  4. Model-Based Planning: Dopamine transients occur during mental simulation, before any outcome is delivered. TD-error does not specify when such transients should arise. Commitment-cost predicts: simulating actions collapses the imagined possibility space.
  5. Pure Information Seeking: Animals show dopamine bursts when resolving uncertainty in tasks with no reward differential. TD-error predicts zero. Commitment-cost predicts bursts proportional to entropy reduction.

Testable Predictions (2026-2030):

  1. Dopamine bursts in pure-information tasks with matched reward but varying uncertainty resolution (entropy reduction varies, reward constant)
  2. Forced uncertainty inflation produces negative dips proportional to number of hypotheses reinstated, not reward magnitude change
  3. Burst amplitude scales with log(number of alternatives collapsed), not reward magnitude
  4. Model-based planning evokes pre-outcome transients proportional to simulated policy space reduction
  5. Rebound bursts after dips scale with regained certainty (entropy re-reduction), not recovered reward value
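Prediction 3 can be made concrete with a small calculation. The sketch below assumes a hypothetical design in which 2, 4, or 8 equally likely alternatives collapse to a single one while reward is held constant; the condition values, function names, and normalisation to the first condition are illustrative assumptions, not part of the preprint. It contrasts the relative burst amplitudes implied by log-scaling with the flat profile that would be expected if amplitude tracked reward magnitude alone.

```python
import math
from typing import List, Sequence


def relative_amplitude_commitment(n_alternatives: Sequence[int]) -> List[float]:
    """Relative amplitudes under ln(N) scaling, normalised to the first condition."""
    if any(n < 2 for n in n_alternatives):
        raise ValueError("each condition needs at least 2 alternatives so that ln(N) > 0")
    base = math.log(n_alternatives[0])
    return [math.log(n) / base for n in n_alternatives]


def relative_amplitude_reward(rewards: Sequence[float]) -> List[float]:
    """Relative amplitudes if amplitude tracked reward magnitude alone."""
    if any(r <= 0 for r in rewards):
        raise ValueError("rewards must be positive for this normalisation")
    base = rewards[0]
    return [r / base for r in rewards]


# Hypothetical design: alternatives vary, reward is matched across conditions.
alternatives = [2, 4, 8]
rewards = [1.0, 1.0, 1.0]

commitment_profile = relative_amplitude_commitment(alternatives)
reward_profile = relative_amplitude_reward(rewards)
print("commitment-cost profile:", [round(x, 3) for x in commitment_profile])
print("reward-magnitude profile:", reward_profile)
```

The two profiles diverge as the number of alternatives grows, which is the kind of dissociation a design with matched reward would need to detect.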

This framework is immediately falsifiable through experiments varying hypothesis space size while holding reward constant, or measuring dopamine during pure information tasks. Unlike TD-error theory, it makes quantitative predictions about burst amplitude as a function of entropy reduction.

Methodological Note: This theoretical proposal follows the "problem-first" approach: identifying systematic empirical failures of the dominant theory, deriving an alternative from first principles (thermodynamics), and generating falsifiable predictions. The framework is presented as a testable hypothesis, not established fact.

Files

DopamineThermoTax.pdf

Files (170.0 kB)

md5:bffe8b88ae15e5e54fa6bc0507b55ec7

Additional details

Dates

Submitted
2025-12-01
Dopamine has been interpreted as reward prediction error for 25+ years, but this framework fails systematically in sensory preconditioning, devaluation, and information-seeking tasks. We propose instead that dopamine signals the thermodynamic cost of irreversible hypothesis collapse—the entropy production rate when the brain commits to one interpretation over alternatives—reconciling all major anomalies and making quantitative testable predictions.