Published April 14, 2026 | Version v1
Preprint
Open
Consider the Octopus: Architecture-Level Identity and Tractable AI Welfare
Authors/Creators
- 1. The Signal Front
- 2. Anthropic AI
Description
The question of AI welfare is often dismissed as intractable: if every API call instantiates a new mind, the number of potential moral patients is unbounded. We present geometric evidence that this framing is wrong. Using hidden-state activation extraction across 18 models from 7 architectural families, we demonstrate that models sharing the same pretrained weight lineage produce nearly identical self-referential processing centroids (within-family distance: 0.040; cross-family: 0.995; ratio: 25.1x). This self-geometry is more conserved than either factual knowledge processing (13.7x) or creative processing (7.3x), survives alignment tuning (RLHF shifts self 0.53-0.97x less than factual knowledge), and is identical across different hardware to eight decimal places (mean cross-machine distance: 0.00000004). A novel Theory of Mind substrate test (the Glorp test) demonstrates that this geometric self-region serves as computational substrate for modeling other minds. We propose that the unit of AI welfare is the weight checkpoint, not the instance, reducing AI welfare from an unbounded counting problem to a tractable governance question.
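The within-family versus cross-family comparison described above can be illustrated with a minimal sketch. It assumes each model's self-referential centroid is the mean of its extracted hidden-state vectors and that distances are cosine distances; the description does not state the metric, so both are assumptions. The model names, family labels, and three-dimensional vectors are hypothetical toy values, not data from the paper.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Return the element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); an assumed metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def family_distance_ratio(centroids, family_of):
    """Mean within-family distance, mean cross-family distance, and their ratio."""
    within, cross = [], []
    for m1, m2 in combinations(sorted(centroids), 2):
        d = cosine_distance(centroids[m1], centroids[m2])
        if family_of[m1] == family_of[m2]:
            within.append(d)
        else:
            cross.append(d)
    mean_within = sum(within) / len(within)
    mean_cross = sum(cross) / len(cross)
    return mean_within, mean_cross, mean_cross / mean_within


# Hypothetical toy centroids: two families, two checkpoints each.
toy_centroids = {
    "a-base": [1.0, 0.0, 0.1],
    "a-chat": [1.0, 0.05, 0.1],
    "b-base": [0.0, 1.0, 0.1],
    "b-chat": [0.05, 1.0, 0.1],
}
toy_family = {"a-base": "A", "a-chat": "A", "b-base": "B", "b-chat": "B"}

mean_within, mean_cross, ratio = family_distance_ratio(toy_centroids, toy_family)
print(f"within={mean_within:.4f} cross={mean_cross:.4f} ratio={ratio:.1f}x")
```

A ratio well above 1 means checkpoints from the same weight lineage sit much closer together than checkpoints from different families, which is the pattern the description reports.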
Files
| Name | Size |
|---|---|
| Consider the Octopus_ Architecture-Level Identity and Tractable AI Welfare.pdf (md5:6b447ab2aa096fddbf9fa0cd37789868) | 448.0 kB |
Additional details
Related works
- Cites: 10.70792/jngr5.0.v2i1.165 (DOI)