Published January 9, 2026 | Version v4

Empirical Evidence Of Interpretation Drift In ARC-Style Reasoning

Description

This paper provides empirical evidence of interpretation drift in large language models using ARC-style symbolic reasoning tasks. Interpretation drift refers to instability in a system’s internal task representation under fixed inputs and instructions, leading to incompatible task ontologies even in fully observable, non-linguistic settings.

Earlier work introduced interpretation drift as a theoretical explanation for reliability failures that persist despite improvements in model capability. However, governance and safety debates have continued to assume that such failures would resolve as models became more intelligent. The present work tests that assumption directly using ARC-style tasks, which the industry itself treats as a benchmark for abstraction and intelligence.

Under controlled conditions, multiple frontier models were observed to diverge in their inferred task structure, including object boundaries, dimensionality, and transformation rules, before any symbolic reasoning took place. These divergences cannot be explained by prompt ambiguity, sampling variance, or output inconsistency.
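
The deposit does not spell out the measurement protocol, but the kind of pre-reasoning comparison it describes can be illustrated with a minimal sketch (Python; the class, field names, and example values below are hypothetical assumptions, not the paper's actual method): given the same grid and instructions, do two models report the same dimensionality, object boundaries, and transformation rule?

    # Illustrative sketch only; not the deposit's actual protocol.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class InferredStructure:
        grid_shape: tuple        # dimensionality the model reports, e.g. (3, 3)
        object_cells: frozenset  # cells the model groups into foreground objects
        rule: str                # transformation rule stated before solving

    def structures_agree(a: InferredStructure, b: InferredStructure) -> dict:
        """Compare two models' pre-reasoning interpretations of one fixed task."""
        return {
            "same_dimensionality": a.grid_shape == b.grid_shape,
            "same_object_boundaries": a.object_cells == b.object_cells,
            "same_rule": a.rule == b.rule,
        }

    # Hypothetical example: identical input, incompatible interpretations.
    m1 = InferredStructure((3, 3), frozenset({(0, 0), (0, 1)}), "reflect horizontally")
    m2 = InferredStructure((9,),   frozenset({(0, 0)}),         "rotate 90 degrees")
    print(structures_agree(m1, m2))  # all False: divergence at the interpretation level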

This artifact provides empirical grounding for the interpretation drift framework introduced in:

Empirical Evidence Of Interpretation Drift In Large Language Models (https://doi.org/10.5281/zenodo.18219428)

The findings establish a governance-relevant boundary condition: systems that cannot maintain stable mappings between perceptual input and symbolic representation are not reliably evaluable and cannot be assigned autonomous decision-making authority in safety-critical or regulated contexts.

Files

NguyenARC.pdf (759.5 kB)
md5:9eb71ef4dd5492e569d3ac8450172e4c

Additional details

Related works

Is supplemented by
Other: 10.5281/zenodo.18219428 (DOI)