MMLU and Cognitive Integrity
Description
The paper reports findings from an instrumented reference architecture designed to study integration-driven cognition under controlled conditions. Rather than optimizing for task performance or scale, the system prioritizes internal viability, regulatory stability, and inspectable dynamics. All reported results are internal observables recorded from repeatable executions.
The study analyzes how benchmark results should be interpreted in viability-first cognitive architectures. Using an instrumented prototype, it treats the Massive Multitask Language Understanding (MMLU) benchmark not as a capability measure, but as a structured stressor probing system stability under uncertainty.
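As an illustration of this stressor framing, the sketch below presents benchmark items to a system while logging internal observables alongside the answers. It is a minimal sketch under assumed names: the `system` object and its `step`/`snapshot` methods are hypothetical stand-ins, not the repository's actual API.

```python
# Minimal sketch of MMLU-as-stressor instrumentation. The `system` object,
# its `step`/`snapshot` methods, and the observable names are hypothetical
# placeholders; they are not taken from the SpiralBrain repository.
from dataclasses import dataclass, field

@dataclass
class ProbeRecord:
    question: str
    answer: str | None                      # None if output was throttled
    observables: dict[str, float] = field(default_factory=dict)

def run_stress_probe(system, items):
    """Present each MMLU item and record internal observables regardless
    of whether (or how well) the item is answered."""
    records = []
    for item in items:
        answer = system.step(item["question"], item["choices"])
        records.append(ProbeRecord(
            question=item["question"],
            answer=answer,
            observables=system.snapshot(),  # e.g. coherence, hazard, drift
        ))
    return records
```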
It introduces cognitive integrity as an evaluation axis defined by boundedness, coherence, and recoverability of internal dynamics. Controlled MMLU tests show that low scores (20–36%) can indicate intentional regulatory throttling to maintain homeostasis rather than reasoning deficits. Internal integrity metrics—coherence, hazard, and drift—remain stable even as accuracy is constrained.
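One way to make this integrity axis concrete is to compute such quantities from a logged trajectory of internal state vectors. The definitions below are assumptions made for this sketch, with arbitrary bounds; the paper's actual formulas may differ.

```python
# Illustrative integrity metrics over a (T, d) trajectory of internal
# state vectors. These definitions are assumptions made for the sketch,
# not the paper's exact formulas.
import numpy as np

def integrity_metrics(states: np.ndarray, jump_bound: float = 1.0) -> dict[str, float]:
    diffs = np.diff(states, axis=0)                  # step-to-step changes
    step_sizes = np.linalg.norm(diffs, axis=1)
    norms = np.linalg.norm(states, axis=1)

    # Drift: net displacement relative to total path length (0 = closed loop).
    path_length = step_sizes.sum()
    drift = float(np.linalg.norm(states[-1] - states[0]) / (path_length + 1e-12))

    # Coherence: mean cosine similarity between consecutive states.
    cos = (states[:-1] * states[1:]).sum(axis=1) / (norms[:-1] * norms[1:] + 1e-12)
    coherence = float(cos.mean())

    # Hazard: fraction of steps whose jump exceeds a fixed bound.
    hazard = float((step_sizes > jump_bound).mean())

    # Boundedness: the trajectory stays inside a fixed norm ball
    # (the factor of 10 is an arbitrary choice for this sketch).
    bounded = bool(norms.max() <= 10.0 * norms[0] + 1e-12)

    return {"coherence": coherence, "hazard": hazard,
            "drift": drift, "bounded": float(bounded)}

def recovered(states: np.ndarray, baseline: np.ndarray, eps: float = 0.1) -> bool:
    """Recoverability: the final state returns to within eps of a
    pre-stressor baseline state."""
    return bool(np.linalg.norm(states[-1] - baseline) <= eps)
```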
The work challenges the assumption that benchmark accuracy equates to cognitive competence, arguing for a dual-axis evaluation that separates task performance from regulatory integrity. Conclusions are limited to the observed behavior of the studied architecture; the work contributes to ongoing discussions in AI evaluation, safety, and interpretability.
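A dual-axis report then keeps accuracy and integrity as separate scores rather than collapsing them into one number. The thresholds and the sample run below are illustrative only, chosen to fall inside the 20–36% band reported above.

```python
# Hedged sketch of dual-axis reporting: accuracy and integrity stay
# separate, so a low score with intact integrity can be read as
# regulatory throttling rather than a reasoning deficit.
def dual_axis_report(correct: int, attempted: int,
                     metrics: dict[str, float],
                     hazard_ceiling: float = 0.05,
                     coherence_floor: float = 0.8) -> dict:
    accuracy = correct / attempted if attempted else 0.0
    integrity_ok = (metrics["hazard"] <= hazard_ceiling
                    and metrics["coherence"] >= coherence_floor)
    return {"accuracy": accuracy,
            "integrity": metrics,
            "integrity_intact": integrity_ok}

# Hypothetical numbers inside the abstract's reported 20-36% band:
print(dual_axis_report(
    correct=31, attempted=100,
    metrics={"coherence": 0.92, "hazard": 0.01, "drift": 0.08},
))
```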
Files
- MMLU_and_Cognitive_Integrity.pdf (242.9 kB, md5:0a2078837ff6e8331fe9e38a5820f121)
Additional details
Related works
References
- Preprint: 10.5281/zenodo.18295507 (DOI)
- Preprint: 10.5281/zenodo.18370539 (DOI)
- Preprint: 10.5281/zenodo.18444713 (DOI)
- Preprint: 10.5281/zenodo.18446550 (DOI)
- Preprint: 10.5281/zenodo.18446434 (DOI)
Software
- Repository URL: https://github.com/jhcragin/SpiralBrain-v3.0-public
- Programming language: Python
- Development Status: Active