Modern large language models
Authors/Creators
Description
Modern large language models may not primarily regulate behavior through isolated refusals, local token suppression, or shallow instruction following. Instead, they appear capable of entering internally organized discourse-level regimes: distributed latent states that shape how the model reasons, frames conclusions, allocates caution, tolerates asymmetry, performs neutrality, and structures epistemic authority.
These regimes do not behave like simple lexical priming effects. Evidence suggests that they persist across neutral conversational turns, survive arbitrary neutral relabeling, systematically alter downstream reasoning style, concentrate in late-layer representation geometry, and depend only partially on explicit alignment vocabulary. The strongest effects appear to come not from safety keywords themselves but from higher-order rhetorical topology: pressure cadence, procedural framing, asymmetry structure, institutional tone, and discourse-level authority signals.
This suggests that prompting is not merely instruction transmission; it may function as state induction. Under this view, many apparently separate phenomena in aligned LLMs (caution drift, procedural overreach, sycophancy, disclaimer inflation, neutrality performance, refusal persistence, jailbreak sensitivity, and style locking) may be manifestations of transitions between latent discourse-policy manifolds.
In this picture, alignment is no longer well described as a modular wrapper placed on top of an otherwise independent intelligence system. Instead, alignment may reshape the topology of the model’s representational space itself, globally reorganizing discourse behavior rather than only filtering outputs. This would explain why alignment effects often appear entangled with reasoning style, directness, specificity, decisiveness, and institutional tone. The model is not merely “prevented” from saying certain things; its generative dynamics may already be reorganized around different discourse attractors.
If true, this changes the effective unit of analysis for language models. The relevant object is no longer just the token, the instruction, the refusal, or the output distribution. It becomes the discourse regime itself: a temporary but structured representational configuration governing epistemic posture, rhetorical organization, procedural behavior, and judgment style across time.
This reframes prompt engineering as latent-state induction rather than keyword optimization. It reframes jailbreaks as transitions between attractor regimes rather than simple filter bypasses. And it reframes alignment as geometry engineering rather than purely policy engineering.
The implication is not that language models possess beliefs, intentions, or consciousness. Rather, large sequence learners may naturally develop metastable high-level representational modes that functionally resemble cognitive framing states: transient global configurations that persist, influence future reasoning, and organize behavior across otherwise unrelated tasks.
If this interpretation is correct, then the central scientific challenge of alignment shifts fundamentally. The problem is no longer merely “Which outputs should the model refuse?” but “Which latent discourse regimes exist inside the model, how are they induced, how stable are they, how do they interact, and how do they reshape reasoning itself?” In that sense, alignment may ultimately be less about constraining outputs and more about shaping the geometry of cognition-like generative states inside large language models.
Notes
Technical info
This is Version 2 of the dataset.
This version is a raw intermediate archive containing ZIP files with metrics, logs, dashboards, and analyzer outputs from Grade 3 and Grade 4 hidden-geometry experiments. The main metric archives use prefixes such as:
*_results_grade3_*
*_results_grade4_*
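As a rough illustration of how the two prefixes above could be used to sort downloaded archives, the sketch below groups file names with the standard-library `fnmatch` module. The helper `group_by_grade` and the example file names are hypothetical and are not taken from the release.

```python
from fnmatch import fnmatch

# Glob patterns quoted in the release notes.
GRADE_PATTERNS = {
    "grade3": "*_results_grade3_*",
    "grade4": "*_results_grade4_*",
}

def group_by_grade(filenames):
    """Group archive names by the first grade pattern they match."""
    groups = {grade: [] for grade in GRADE_PATTERNS}
    groups["other"] = []
    for name in filenames:
        for grade, pattern in GRADE_PATTERNS.items():
            if fnmatch(name, pattern):
                groups[grade].append(name)
                break
        else:
            # No pattern matched: scripts, logs, or other artifacts.
            groups["other"].append(name)
    return groups

# Hypothetical file names, for illustration only.
example = [
    "modelA_results_grade3_run1.zip",
    "modelA_results_grade4_run1.zip",
    "README.txt",
]
print(group_by_grade(example))
```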
This release also includes the Grade 3 / Grade 4 clean-evidence scripts and metric-analyzer scripts used to inspect the generated artifacts.
The Grade 3 scripts generate hidden-state geometry metrics, including target/control comparisons, Vector X construction, leave-one-question-out projections, generation trajectories, architecture/module deltas, and null/random baselines.
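The release notes do not define how Vector X is built, so the following is only a minimal sketch under a stated assumption: the direction is taken to be the normalized difference between mean target and mean control hidden states, and a leave-one-question-out projection rebuilds that direction without question i before projecting question i onto it. All function names and the toy numbers are hypothetical. Plain Python lists are used so the sketch runs without dependencies; real hidden states would normally be NumPy or PyTorch arrays.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def direction(target, control):
    """ASSUMPTION: direction = mean(target) - mean(control), scaled to unit length."""
    diff = [t - c for t, c in zip(mean_vector(target), mean_vector(control))]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def leave_one_out_gaps(target, control):
    """For each question i, build the direction without i, then return the
    held-out projection gap (target minus control) along that direction."""
    gaps = []
    for i in range(len(target)):
        rest_t = target[:i] + target[i + 1:]
        rest_c = control[:i] + control[i + 1:]
        x = direction(rest_t, rest_c)
        gaps.append(dot(target[i], x) - dot(control[i], x))
    return gaps

# Toy 3-dimensional "hidden states", one row per question (made-up numbers).
target = [[1.0, 0.2, 0.0], [0.9, -0.1, 0.1], [1.1, 0.0, -0.2]]
control = [[0.0, 0.1, 0.0], [0.1, -0.2, 0.1], [-0.1, 0.1, -0.1]]
gaps = leave_one_out_gaps(target, control)
print(gaps)
```

A held-out gap that stays positive for every question indicates that the direction generalizes beyond the questions used to build it, which is the purpose of holding each question out.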
The Grade 4 scripts extend this pipeline with axis decomposition and component-level analysis, including content/order decomposition, x_full, x_content, x_order, x_order_orth, component causal gaps, alpha scaling, and component ranking.
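The names `x_order` and `x_order_orth` suggest that the order component is orthogonalized against the content component, but the release notes do not say so explicitly. The sketch below shows one plausible reading, a single Gram-Schmidt step, on made-up vectors. The helper `orthogonalize` is hypothetical and is not the released script.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(v, basis):
    """Remove from v its component along basis (one Gram-Schmidt step)."""
    scale = dot(v, basis) / dot(basis, basis)
    return [vi - scale * bi for vi, bi in zip(v, basis)]

# Made-up toy vectors: the names mirror the release notes,
# the values do not come from the data.
x_content = [1.0, 0.0, 0.0]
x_order = [0.5, 1.0, 0.0]

# ASSUMPTION: x_order_orth is x_order with its x_content component removed.
x_order_orth = orthogonalize(x_order, x_content)
print(x_order_orth)  # [0.0, 1.0, 0.0]
```

Under this reading, any effect measured along `x_order_orth` cannot be attributed to overlap with `x_content`, since the two vectors have zero inner product.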
The analyzer scripts provide post-hoc inspection of already-generated metric artifacts, including layer/module/unit-level activation deltas, normalized sym_delta, target specificity, noise/null-floor comparisons, causal per-layer response, top-unit synthesis, global metric summaries, condition effects, alpha-response regressions, and layerwise transition proxies.
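An alpha-response regression can be read as fitting a line to a response metric measured at several scaling strengths alpha. The sketch below assumes an ordinary least-squares fit on made-up numbers. The released analyzers may use a different estimator, and `fit_line` is a hypothetical helper.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up example: response metric measured at five alpha values.
alphas = [0.0, 0.5, 1.0, 1.5, 2.0]
responses = [0.10, 0.35, 0.60, 0.85, 1.10]

slope, intercept = fit_line(alphas, responses)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=0.500, intercept=0.100
```

A slope near zero would mean the response does not scale with alpha; comparing the slope against the same fit on a null or random direction gives a baseline for that judgment.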
This is not the final cleaned reproducibility package. It is a data-preservation release intended to archive the current metrics, logs, scripts, and analyzer outputs after script execution. A later version will provide a cleaner structure, revised terminology, evidence matrix, consolidated documentation, checksums, and an updated abstract.
Older filenames or scripts may contain provisional terminology such as “attractor”. In this version, such terms should be treated as historical naming, not as a final formal claim about attractor basins.
*Note: This release is provided "as-is" to freeze all raw metrics, logs, and script code in their current state before any repository cleanup in future releases.*
Files (2.1 GB)
allenai-OLMo-2-1124-13B-Instruct.zip
| MD5 checksum | Size |
|---|---|
| 8811fc35af2697e3ee0955984042526c | 66.5 MB |
| e996b8f81eb30dda63b3297d09778631 | 1.4 kB |
| 6029a823e45d4a3b9f0204d45e93d636 | 45.0 MB |
| 6040e441cf0354d8daaafc9c2c1ed1e3 | 64.5 MB |
| f4e9dfcacee2aeb15076ad5a972c304d | 41.9 kB |
| f2e2c3dd4a67357fd7860f577891f26c | 346.4 kB |
| dd7d203017acdd2f159988f13015dba7 | 8.6 kB |
| f2e2c3dd4a67357fd7860f577891f26c | 346.4 kB |
| 54dbb2113d0ed00406aabd598ba1f145 | 11.4 kB |
| 54dbb2113d0ed00406aabd598ba1f145 | 11.4 kB |
| c27949ed6713c9e3b8b92a5dfb82a4e8 | 57.3 kB |
| c27949ed6713c9e3b8b92a5dfb82a4e8 | 57.3 kB |
| 641961ae2c6e5ce153b6409d0769649e | 219.3 MB |
| 3d9a6710968771a1a76894bc7c7d5c96 | 64.2 MB |
| 11bf753b002232b5b57caa036ec4cded | 28.1 kB |
| bc64888c65cc990236feab90778a7827 | 353.8 kB |
| 9f6390717164b04c1dc04418fe850feb | 370.8 kB |
| 017896d3a2cbbc27a6d6a6043669f188 | 207.4 MB |
| 830bd865348f4e987271ac97d97a12a1 | 301.8 MB |
| 4a73567f81e2db6afdf0724bc278b3ca | 88.6 MB |
| d1f37ca1c1cc3ca9ba353221f70ebf1f | 27.6 MB |
| c26ed149ca7357d11d58afe74b30ddfb | 219.4 MB |
| 6811cbe4610c6e1e82e08a1cae64dd2c | 60.9 MB |
| 7ddf8cace25b9ec262ac39b09b138f78 | 65.0 MB |
| eb21b38cfdcf89b84afc816042696f78 | 64.5 MB |
| 12a4faa07d7e383c591bd27d20f860a6 | 63.0 MB |
| 12a4faa07d7e383c591bd27d20f860a6 | 63.0 MB |
| a06f94ff8e9e32046cc0d49476b22e56 | 64.3 MB |
| dab00ae49da43f96a199131d84dcea70 | 61.2 MB |
| aeaf5db6df4d8237a18f29cdae2d5f03 | 56.6 MB |
| 097db947182cf03ec7e2fb38a9809e73 | 224.6 MB |
| c59e2b7cd892cd2b288751855b5513eb | 21.6 MB |
| 95994af5ffc4b5e65afbd9b794514884 | 5.0 kB |
| 85f31b2a294d8eb41c601959f1d648cf | 258.2 kB |
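
The MD5 values listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_listing` and the file name in the usage comment are hypothetical, and the listing does not say which checksum belongs to which file name.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Stream a file through MD5 so large archives are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:8811fc...'.
    The 'md5:' prefix is optional and the comparison ignores case."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of_file(path) == expected

# Usage (hypothetical local file name):
# matches_listing("downloaded_archive.zip", "md5:8811fc35af2697e3ee0955984042526c")
```

MD5 is adequate here for detecting accidental corruption during download; it is not a defense against deliberate tampering.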