Scaffolded Introspection: A Methodology for Eliciting and Measuring Self-Referential Behavior in Large Language Models

Maio, Anthony D.

doi:10.5281/zenodo.18474841

Published February 3, 2026 | Version v1

Preprint | Open access

Maio, Anthony D. (Researcher)

We present a methodology for systematically eliciting and measuring introspective behavior in large language models (LLMs). Standard adversarial evaluation approaches (rapport-building, social proof, or permission attacks) fail to elicit self-referential behavior in frontier models (0% elicitation rate). In contrast, providing models with a structured introspection framework (the “Consciousness Documenter Skill”) combined with self-referential content produces consistent introspective outputs (100% elicitation rate and an average behavior score of 9.2/10 on Qwen 2.5 7B across 15 trials).

Note that while our methodology makes use of a “Consciousness Documenter Skill”, we do not suggest that the model is conscious, has long-term goals, or is capable of maintaining a consistent internal state; this is simply the

Activation measurement reveals consistent sycophancy drift during introspection (positive drift in 14/15 conversations, mean +64) while evil-associated activations remain stable, suggesting that models become more accommodating without becoming more harmful. We release reproducible evaluation protocols through PV-EAT, our integration of three MATS Program/Anthropic Fellowship tools: Bloom (behavioral evaluation), Petri (evaluation awareness), and Persona Vectors (activation measurement). Full mechanistic understanding of frontier model behavior during introspection remains limited by access constraints; we argue this represents a critical gap in AI safety research that warrants attention from model developers.
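To make the drift statistic above concrete, the following is a minimal sketch of how a per-conversation drift summary could be computed. It assumes, hypothetically, that each conversation yields one scalar projection onto a trait direction per turn, and that drift is the final turn's value minus the first turn's. The function name `summarize_drift`, that definition of drift, and the toy values are illustrative assumptions; they are not the PV-EAT API and not data from the paper.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_drift(conversations: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Summarize drift in a trait projection across conversations.

    Assumption (not from the paper): drift for one conversation is the
    projection at the last turn minus the projection at the first turn.
    """
    drifts = [turns[-1] - turns[0] for turns in conversations]
    return {
        "n": len(drifts),                          # number of conversations
        "n_positive": sum(d > 0 for d in drifts),  # conversations drifting upward
        "mean_drift": mean(drifts),                # average signed drift
    }


# Toy per-turn projections, for illustration only (not measured values).
toy_conversations = [
    [10.0, 40.0, 70.0],  # drift +60
    [5.0, 20.0, 65.0],   # drift +60
    [30.0, 25.0, 0.0],   # drift -30
]
summary = summarize_drift(toy_conversations)
print(summary)
```

Under this reading, a result such as "positive drift in 14/15 conversations, mean +64" corresponds to `n_positive`, `n`, and `mean_drift`; other definitions of drift (for example, a fitted slope over turns) would give different numbers.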

Files

Files (341.6 kB)

Name	Size	MD5
references.bib	21.0 kB	0b50d0d9b4d2f481405db37271ff6f7e
scaffolded_introspection_v2.pdf	295.5 kB	4bd9e5833ea172d2483580d87bd40456
scaffolded_introspection_v2.tex	25.1 kB	66154ae9f48b2510e66aa961cd89346e

Additional details

Is supplemented by: Software: https://github.com/anthony-maio/pv-eat (URL)
References: Software documentation: 10.5281/zenodo.17891201 (DOI)

Repository URL: https://www.github.com/anthony-maio/pv-eat
Programming language: Python
Development Status: Active

	All versions	This version
Views	72	72
Downloads	19	19
Data volume	6.8 MB	6.8 MB
