Observer Depth: Quantifying Reflexive Intelligence in LLMs via Phase Transition Analysis

Zhang, Mian

doi:10.5281/zenodo.19627242

Published April 17, 2026 | Version v1

Preprint Open

We present ReflexBench, the first benchmark designed to evaluate reflexive reasoning in large language models (LLMs), defined as the capacity to reason about one's own causal impact on the environment being analyzed. ReflexBench comprises 20 scenarios across 6 domains, each probing four levels of Observer Depth (OD). We evaluate 5 frontier LLMs and find that all exhibit systematic degradation at higher observer depths (mean Delta = -0.50). We propose the Soros Test as a practical standard for evaluating observer-participant readiness, and we document that reflexive capabilities emerge through a phase transition during multi-reward GRPO training.

Files (482.5 kB)

Name	Checksum	Size
paper2_reflexbench.pdf	md5:4c64d1fed0e2ef657ab2fa4821ad8ca5	154.9 kB
paper2_reflexbench_arxiv.zip	md5:7e00fa75a273f31010281524096e9aa1	327.5 kB
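The MD5 checksums listed for the two files can be used to confirm that a download is intact. The following is a minimal sketch, assuming the files were saved under their listed names in a local directory; the helper names `md5_of` and `verify_downloads` and the `directory` parameter are illustrative and not part of the record.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "paper2_reflexbench.pdf": "4c64d1fed0e2ef657ab2fa4821ad8ca5",
    "paper2_reflexbench_arxiv.zip": "7e00fa75a273f31010281524096e9aa1",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True/False (checksum match) or None (file missing).

    `directory` is a hypothetical local download folder.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, ok in verify_downloads().items():
        print(f"{name}: {labels[ok]}")
```

A `MISMATCH` result would suggest a corrupted or altered download; MD5 is adequate for detecting accidental corruption but is not a strong guarantee against deliberate tampering.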

	All versions	This version
Views	297	297
Downloads	55	55
Data volume	12.3 MB	12.3 MB

DOI: 10.5281/zenodo.19627242
Resource type: Preprint
Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: April 17, 2026
Modified: June 7, 2026
