Empirical Evidence Of Interpretation Drift In ARC-Style Reasoning
Description
This paper provides empirical evidence of interpretation drift in large language models using ARC-style symbolic reasoning tasks. Interpretation drift refers to instability in a system’s internal task representation under fixed inputs and instructions, leading to incompatible task ontologies even in fully observable, non-linguistic settings.
Earlier work introduced interpretation drift as a theoretical explanation for reliability failures that persist despite improvements in model capability. Governance and safety debates, however, have continued to assume that such failures will resolve as models become more capable. The present work tests that assumption directly using ARC-style tasks, which the industry itself treats as a benchmark of abstraction and intelligence.
Under these controlled conditions, multiple frontier models were observed to diverge in inferred task structure, including object boundaries, dimensionality, and transformation rules, prior to symbolic reasoning. These divergences cannot be explained by prompt ambiguity, sampling variance, or output inconsistency.
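To make the notion of drift concrete, the sketch below illustrates one hypothetical way to probe for it: the same ARC-style grid and instructions are submitted repeatedly, each response is parsed into a structured interpretation, and the spread across interpretations under fixed input is counted. The `query_model` function, the example grid, and the JSON schema are illustrative assumptions for this sketch, not the protocol used in the paper.

```python
# Minimal sketch of probing for interpretation drift on a fixed ARC-style grid.
# `query_model` is a hypothetical stand-in for any chat-completion call; replace
# it with a real API client. The comparison logic is illustrative only.
import json
from collections import Counter

GRID = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]

PROMPT = (
    "Describe the task structure of this grid as JSON with keys "
    "'dimensions', 'objects' (list of colour/extent pairs), and "
    "'transformation' (a one-sentence rule). Grid: " + json.dumps(GRID)
)

def query_model(prompt: str) -> str:
    """Hypothetical model call; stubbed so the sketch runs end to end."""
    return json.dumps({
        "dimensions": [4, 4],
        "objects": [{"colour": 1, "cells": 4}, {"colour": 2, "cells": 4}],
        "transformation": "identity",
    })

def canonical(interpretation: str) -> str:
    """Normalise a JSON interpretation so semantically equal answers compare equal."""
    return json.loads(interpretation), 

def canonicalise(interpretation: str) -> str:
    return json.dumps(json.loads(interpretation), sort_keys=True)

# Repeat the identical prompt and count distinct canonical interpretations.
runs = [canonicalise(query_model(PROMPT)) for _ in range(10)]
counts = Counter(runs)
print(f"{len(counts)} distinct interpretation(s) across {len(runs)} identical queries")
```

A real experiment would additionally need to control for sampling variance, for example by decoding deterministically or by testing whether divergent interpretations persist across many samples, before attributing any spread to drift in the inferred task structure.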
This artifact provides empirical grounding for the interpretation drift framework introduced in:
Empirical Evidence Of Interpretation Drift In Large Language Models [https://doi.org/10.5281/zenodo.18219428]
The findings establish a governance-relevant boundary condition: systems that cannot maintain stable mappings between perceptual input and symbolic representation cannot be reliably evaluated and cannot be assigned autonomous decision-making authority in safety-critical or regulated contexts.
Files
| Name | Size | MD5 |
|---|---|---|
| NguyenARC.pdf | 759.5 kB | 9eb71ef4dd5492e569d3ac8450172e4c |
Additional details
Related works
- Is supplemented by: 10.5281/zenodo.18219428 (DOI)