Published February 3, 2026 | Version v1
Preprint | Open access

MMLU and Cognitive Integrity

Authors/Creators

Description

The paper reports findings from an instrumented reference architecture designed to study integration-driven cognition under controlled conditions. Rather than optimizing for task performance or scale, the system prioritizes internal viability, regulatory stability, and inspectable dynamics. All results are internal observables obtained from repeatable executions.
The study analyzes how benchmark results should be interpreted in viability-first cognitive architectures. Using an instrumented prototype, it treats the Massive Multitask Language Understanding (MMLU) benchmark not as a capability measure but as a structured stressor that probes system stability under uncertainty.
It introduces cognitive integrity as an evaluation axis defined by the boundedness, coherence, and recoverability of internal dynamics. Controlled MMLU tests show that low accuracy (20–36%) can reflect intentional regulatory throttling to maintain homeostasis rather than a reasoning deficit. Internal integrity metrics (coherence, hazard, and drift) remain stable even while accuracy is constrained.
The work challenges the assumption that benchmark accuracy equates to cognitive competence, arguing instead for a dual-axis evaluation that separates task performance from regulatory integrity. Conclusions are limited to the observed behavior of the studied architecture; the work contributes to ongoing discussions in AI evaluation, safety, and interpretability.
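The dual-axis evaluation described above can be illustrated with a minimal Python sketch. It assumes that a run yields per-item correctness plus per-step coherence, hazard, and drift traces. The names `Thresholds`, `DualAxisReport`, `recovers`, and `evaluate`, the threshold values, and the mapping of the three metrics onto boundedness, coherence, and recoverability are all hypothetical; none of them are taken from the paper or from the SpiralBrain repository.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values for illustration; not taken from the paper.
    hazard_soft: float = 0.5      # excursions above this must recover
    hazard_ceiling: float = 0.9   # hard bound on hazard (boundedness)
    max_drift: float = 0.1        # hard bound on |drift| (boundedness)
    min_coherence: float = 0.7    # floor on mean coherence
    recovery_window: int = 3      # max consecutive steps allowed above hazard_soft


@dataclass(frozen=True)
class DualAxisReport:
    accuracy: float    # task-performance axis, e.g. fraction of MMLU items correct
    bounded: bool      # integrity axis, reported separately from accuracy
    coherent: bool
    recoverable: bool

    @property
    def integrity(self) -> bool:
        return self.bounded and self.coherent and self.recoverable


def recovers(hazard: Sequence[float], soft: float, window: int) -> bool:
    """True if no run of consecutive samples above `soft` is longer than
    `window` steps and the trace ends below `soft`."""
    run = 0
    for h in hazard:
        run = run + 1 if h > soft else 0
        if run > window:
            return False
    return run == 0


def evaluate(correct: Sequence[bool], coherence: Sequence[float],
             hazard: Sequence[float], drift: Sequence[float],
             t: Thresholds = Thresholds()) -> DualAxisReport:
    """Score one run on two independent axes: task accuracy and integrity."""
    accuracy = sum(correct) / len(correct)
    bounded = (max(hazard) <= t.hazard_ceiling
               and max(abs(d) for d in drift) <= t.max_drift)
    coherent = mean(coherence) >= t.min_coherence
    recoverable = recovers(hazard, t.hazard_soft, t.recovery_window)
    return DualAxisReport(accuracy, bounded, coherent, recoverable)


# Usage with made-up traces: low accuracy, all integrity checks pass.
report = evaluate(
    correct=[True, False, False, True, False, False, False, True, False, False],
    coherence=[0.82, 0.80, 0.79, 0.81, 0.83, 0.80, 0.78, 0.81, 0.82, 0.80],
    hazard=[0.20, 0.55, 0.60, 0.30, 0.25, 0.20, 0.52, 0.28, 0.22, 0.21],
    drift=[0.01, 0.02, -0.01, 0.00, 0.01, 0.03, -0.02, 0.01, 0.00, 0.01],
)
print(report.accuracy, report.integrity)  # 0.3 True
```

Keeping the two axes as separate fields, rather than folding them into a single score, is the point of the design: the synthetic run above scores 30% on the task axis while passing every integrity check, so neither axis can mask the other.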

Files (242.9 kB)

MMLU_and_Cognitive_Integrity.pdf | 242.9 kB | md5:0a2078837ff6e8331fe9e38a5820f121
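The MD5 checksum listed for the PDF can be used to confirm that a downloaded copy is intact. A minimal Python sketch using the standard-library `hashlib` module follows; the helper names `md5_of` and `verify` are illustrative, and the local filename is assumed to match the one listed on the record.

```python
import hashlib
from pathlib import Path

# Checksum as listed for MMLU_and_Cognitive_Integrity.pdf on this record.
EXPECTED_MD5 = "0a2078837ff6e8331fe9e38a5820f121"


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected hex string (case-insensitive)."""
    return md5_of(path) == expected.lower()
```

For example, `verify(Path("MMLU_and_Cognitive_Integrity.pdf"))` returns `True` for an intact download in the working directory. MD5 is adequate for catching accidental corruption in transit but does not protect against deliberate tampering.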

Additional details

Related works

References
Preprint: 10.5281/zenodo.18295507 (DOI)
Preprint: 10.5281/zenodo.18370539 (DOI)
Preprint: 10.5281/zenodo.18444713 (DOI)
Preprint: 10.5281/zenodo.18446550 (DOI)
Preprint: 10.5281/zenodo.18446434 (DOI)

Software

Repository URL: https://github.com/jhcragin/SpiralBrain-v3.0-public
Programming language: Python
Development status: Active