Epistemic Twins: Enabling a Symbolic Science of Language Model Knowledge
Authors/Creators
Description
Large Language Models (LLMs) are impactful yet opaque artifacts. At their core, they are subsymbolic constructs defined by billions of numeric weights that interact in a largely inscrutable manner.
Current analysis paradigms are either black-box benchmarks that test model performance on predefined tasks, or mechanistic interpretability approaches that trace outputs back to specific weights.
Both analysis methods are limited by the experimenter's hypothesis space: one must know what to look for in order to find it. In this perspective, we argue for a third, radically different analysis paradigm: Epistemic Twins. We propose constructing large-scale symbolic approximations of LLMs in human-readable formats. This enables the comprehensive materialization of the factual knowledge (or beliefs) inherent in the model without predefining hypotheses, facilitating large-scale analysis and auditing in support of better understanding and explainability.
Files
| Name | Size | Checksum |
|---|---|---|
| Epistemic_Twins.pdf | 110.2 kB | md5:526fa47bf77374cca1b5fd0e60090d85 |
Additional details
Software
- Repository URL: https://gptkb.org