Correct Intermediate Concepts: A Non-Circular Theory of Recognition, Stability, Certification, and Learning
Description
This manuscript develops a theoretical framework in which intermediate concepts are treated as structurally real and scientifically meaningful objects rather than as incidental by-products of a particular architecture. Its central goal is to replace weak post hoc interpretability with a non-circular account of when intermediate modules are correct, how they can be compared across systems, and why they matter computationally. The framework begins by defining problem families, real-use contexts, concept libraries, realizations, and admissible assembly. It then introduces a multi-axis carrying-cost regime that prices representation, call, assembly, leakage, and over-refinement burdens. On this basis, the manuscript establishes a first flagship theorem package: recognition of correct concept classes across near-optimal systems, stability under perturbation and recombination, and weak cheap-substitution impossibility. It then extends the theory through intrinsic cohesion, assembly geometry, obstruction and enrichment structure, endogenous certification, and a learning-theoretic program for recovering correct intermediate modules. The final part translates the theory into an illustrative architecture and a new evaluation protocol. It argues that intelligence should be studied not only through end-task performance but also through the discovery, certification, reuse, and structural testing of correct intermediate concepts.
Files
| Name | Size | Checksum |
|---|---|---|
| intermediate_concepts_theory.md | 317.4 kB | md5:39895d64a61bda516a642e0388962cca |
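The record lists an MD5 checksum for the manuscript file. The following minimal sketch is not part of the manuscript. It checks a downloaded copy against that checksum using Python's standard `hashlib` module. The function names `md5_hexdigest` and `verify_file`, and the assumption that the file is saved locally under its listed name, are illustrative choices rather than anything the record specifies.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO, Union

# Checksum listed on the record for intermediate_concepts_theory.md.
EXPECTED_MD5 = "39895d64a61bda516a642e0388962cca"


def md5_hexdigest(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks so the whole file need not fit in memory."""
    digest = hashlib.md5()
    # iter(callable, sentinel) keeps reading until read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Union[str, Path], expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 matches `expected` (hex, compared case-insensitively)."""
    with open(path, "rb") as fh:
        return md5_hexdigest(fh) == expected.strip().lower()
```

For example, `verify_file("intermediate_concepts_theory.md")` should return `True` when the local copy matches the listed checksum. MD5 is adequate for detecting accidental corruption in transfer, but it does not protect against deliberate tampering.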