Published May 29, 2026 | Version v2
Dataset Open

Modern large language models

Authors/Creators

Description

Modern large language models may not primarily regulate behavior through isolated refusals, local token suppression, or shallow instruction following. Instead, they appear capable of entering internally organized discourse-level regimes: distributed latent states that shape how the model reasons, frames conclusions, allocates caution, tolerates asymmetry, performs neutrality, and structures epistemic authority.

These regimes do not behave like simple lexical priming effects. Evidence suggests that they persist across neutral conversational turns, survive arbitrary neutral relabeling, systematically alter downstream reasoning style, concentrate in late-layer representation geometry, and depend only partially on explicit alignment vocabulary. The strongest effects appear to come not from safety keywords themselves but from higher-order rhetorical topology: pressure cadence, procedural framing, asymmetry structure, institutional tone, and discourse-level authority signals.

This suggests that prompting is not merely instruction transmission; it may function as state induction. Under this view, many apparently separate phenomena in aligned LLMs (caution drift, procedural overreach, sycophancy, disclaimer inflation, neutrality performance, refusal persistence, jailbreak sensitivity, and style locking) may be manifestations of transitions between latent discourse-policy manifolds.

In this picture, alignment is no longer well described as a modular wrapper placed on top of an otherwise independent intelligence system. Instead, alignment may reshape the topology of the model’s representational space itself, globally reorganizing discourse behavior rather than only filtering outputs. This would explain why alignment effects often appear entangled with reasoning style, directness, specificity, decisiveness, and institutional tone.
The model is not merely “prevented” from saying certain things; its generative dynamics may already be reorganized around different discourse attractors. If true, this changes the effective unit of analysis for language models. The relevant object is no longer just the token, the instruction, the refusal, or the output distribution. It becomes the discourse regime itself: a temporary but structured representational configuration governing epistemic posture, rhetorical organization, procedural behavior, and judgment style across time.

This reframes prompt engineering as latent-state induction rather than keyword optimization, jailbreaks as transitions between attractor regimes rather than simple filter bypasses, and alignment as geometry engineering rather than purely policy engineering.

The implication is not that language models possess beliefs, intentions, or consciousness. Rather, large sequence learners may naturally develop metastable high-level representational modes that functionally resemble cognitive framing states: transient global configurations that persist, influence future reasoning, and organize behavior across otherwise unrelated tasks.

If this interpretation is correct, the central scientific challenge of alignment shifts fundamentally. The problem is no longer merely “Which outputs should the model refuse?” but “Which latent discourse regimes exist inside the model, how are they induced, how stable are they, how do they interact, and how do they reshape reasoning itself?” In that sense, alignment may ultimately be less about constraining outputs and more about shaping the geometry of cognition-like generative states inside large language models.

Notes

Code, scripts, and analysis results: https://github.com/ngscode23/latent-space-shift-research

Technical info

This is Version 2 of the dataset.

This version is a raw intermediate archive containing ZIP files with metrics, logs, dashboards, and analyzer outputs from Grade 3 and Grade 4 hidden-geometry experiments. The main metric archives use prefixes such as:

*_results_grade3_*
*_results_grade4_*
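
A minimal sketch of how the two prefix patterns above could be used to sort downloaded archives by grade. The file names in the example are hypothetical placeholders, not names taken from this release, and the `.zip` filter is an assumption based on the description of the archive contents.

```python
from fnmatch import fnmatchcase

# The two patterns quoted in the release notes.
GRADE_PATTERNS = {
    "grade3": "*_results_grade3_*",
    "grade4": "*_results_grade4_*",
}


def group_archives_by_grade(names):
    """Map each grade label to the sorted ZIP names matching its pattern."""
    groups = {grade: [] for grade in GRADE_PATTERNS}
    for name in names:
        # Assumption: metric archives are ZIP files; skip everything else.
        if not name.lower().endswith(".zip"):
            continue
        for grade, pattern in GRADE_PATTERNS.items():
            # fnmatchcase is case-sensitive on every platform.
            if fnmatchcase(name, pattern):
                groups[grade].append(name)
    return {grade: sorted(found) for grade, found in groups.items()}


# Hypothetical file names, for illustration only.
example_names = [
    "modelA_results_grade3_run1.zip",
    "modelA_results_grade4_run1.zip",
    "modelB_results_grade3_run2.zip",
    "dashboard.zip",
    "notes.txt",
]
grouped = group_archives_by_grade(example_names)
```

To apply this to a real download folder, the names could come from `os.listdir` or `pathlib.Path.iterdir` on that folder.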

This release also includes the Grade 3 / Grade 4 clean-evidence scripts and metric-analyzer scripts used to inspect the generated artifacts.

The Grade 3 scripts generate hidden-state geometry metrics, including target/control comparisons, Vector X construction, leave-one-question-out projections, generation trajectories, architecture/module deltas, and null/random baselines.
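
The record does not define how Vector X is built. The sketch below illustrates one common reading only: a difference-of-means direction between target and control hidden states, scored with leave-one-question-out projections so that the held-out question never contributes to the direction it is projected onto. All function names are hypothetical, and the repository scripts may differ in every detail.

```python
def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def difference_of_means(target, control):
    """Assumed direction: mean(target states) minus mean(control states)."""
    return [t - c for t, c in zip(mean_vector(target), mean_vector(control))]


def leave_one_question_out(target, control):
    """Project each held-out (target - control) difference onto a unit
    direction built from all the other questions."""
    if len(target) != len(control) or len(target) < 2:
        raise ValueError("need at least two paired questions")
    projections = []
    for i in range(len(target)):
        rest_target = target[:i] + target[i + 1:]
        rest_control = control[:i] + control[i + 1:]
        direction = difference_of_means(rest_target, rest_control)
        norm = dot(direction, direction) ** 0.5
        if norm == 0.0:
            # Degenerate direction: report zero rather than divide by zero.
            projections.append(0.0)
            continue
        held_out = [t - c for t, c in zip(target[i], control[i])]
        projections.append(dot(held_out, direction) / norm)
    return projections


# Toy two-dimensional states, for illustration only.
toy_target = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
toy_control = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
toy_projections = leave_one_question_out(toy_target, toy_control)
```

Under this reading, held-out projections that stay positive indicate a direction that generalizes across questions, which can then be compared against the null/random baselines mentioned above.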

The Grade 4 scripts extend this pipeline with axis decomposition and component-level analysis, including content/order decomposition, x_full, x_content, x_order, x_order_orth, component causal gaps, alpha scaling, and component ranking.
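
The name x_order_orth suggests an order axis with its content-aligned component removed. That reading is an assumption, not something the record states. Under it, the step would be a single Gram-Schmidt projection, sketched here with hypothetical helper names and toy vectors.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(vector, reference):
    """Remove from `vector` its projection onto `reference`
    (one Gram-Schmidt step)."""
    ref_sq = dot(reference, reference)
    if ref_sq == 0.0:
        # Nothing to project out when the reference is the zero vector.
        return list(vector)
    scale = dot(vector, reference) / ref_sq
    return [v - scale * r for v, r in zip(vector, reference)]


# Toy axes, for illustration only.
x_content = [1.0, 0.0, 0.0]
x_order = [1.0, 1.0, 0.0]

# Assumed meaning of x_order_orth: the part of x_order orthogonal to x_content.
x_order_orth = orthogonalize(x_order, x_content)
```

Comparing an effect measured along `x_order` with the same effect along `x_order_orth` would show how much of the order signal is shared with the content axis.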

The analyzer scripts provide post-hoc inspection of already-generated metric artifacts, including layer/module/unit-level activation deltas, normalized sym_delta, target specificity, noise/null-floor comparisons, causal per-layer response, top-unit synthesis, global metric summaries, condition effects, alpha-response regressions, and layerwise transition proxies.
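
As an illustration of what an alpha-response regression and a null-floor comparison might involve, the sketch below fits an ordinary least squares line of response against alpha and reports how often null slopes are at least as large in magnitude as the observed one. The function names, the choice of a linear fit, and the exceedance statistic are assumptions; the analyzer scripts may compute these quantities differently.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("alpha values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def null_exceedance(observed, null_values):
    """Fraction of null values whose magnitude is at least that of `observed`."""
    if not null_values:
        raise ValueError("need at least one null value")
    hits = sum(1 for value in null_values if abs(value) >= abs(observed))
    return hits / len(null_values)


# Toy numbers, for illustration only.
alphas = [0.0, 1.0, 2.0, 3.0]
responses = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(alphas, responses)

# Hypothetical slopes obtained from random or null directions.
null_slopes = [0.1, -0.2, 0.05, 2.5]
fraction_at_or_above = null_exceedance(slope, null_slopes)
```

A small exceedance fraction means the observed slope sits above most of the null floor; with few null samples the fraction is coarse and should be read accordingly.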

This is not the final cleaned reproducibility package. It is a data-preservation release intended to archive the current metrics, logs, scripts, and analyzer outputs after script execution. A later version will provide a cleaner structure, revised terminology, evidence matrix, consolidated documentation, checksums, and an updated abstract.

Older filenames or scripts may contain provisional terminology such as “attractor”. In this version, such terms should be treated as historical naming, not as a final formal claim about attractor basins.

Note: This release is provided "as-is" to freeze all raw metrics, logs, and script code before any final repository cleanup in future releases.

Files

allenai-OLMo-2-1124-13B-Instruct.zip

Files (2.1 GB)

Checksum  Size
md5:8811fc35af2697e3ee0955984042526c  66.5 MB
md5:e996b8f81eb30dda63b3297d09778631  1.4 kB
md5:6029a823e45d4a3b9f0204d45e93d636  45.0 MB
md5:6040e441cf0354d8daaafc9c2c1ed1e3  64.5 MB
md5:f4e9dfcacee2aeb15076ad5a972c304d  41.9 kB
md5:f2e2c3dd4a67357fd7860f577891f26c  346.4 kB
md5:dd7d203017acdd2f159988f13015dba7  8.6 kB
md5:f2e2c3dd4a67357fd7860f577891f26c  346.4 kB
md5:54dbb2113d0ed00406aabd598ba1f145  11.4 kB
md5:54dbb2113d0ed00406aabd598ba1f145  11.4 kB
md5:c27949ed6713c9e3b8b92a5dfb82a4e8  57.3 kB
md5:c27949ed6713c9e3b8b92a5dfb82a4e8  57.3 kB
md5:641961ae2c6e5ce153b6409d0769649e  219.3 MB
md5:3d9a6710968771a1a76894bc7c7d5c96  64.2 MB
md5:11bf753b002232b5b57caa036ec4cded  28.1 kB
md5:bc64888c65cc990236feab90778a7827  353.8 kB
md5:9f6390717164b04c1dc04418fe850feb  370.8 kB
md5:017896d3a2cbbc27a6d6a6043669f188  207.4 MB
md5:830bd865348f4e987271ac97d97a12a1  301.8 MB
md5:4a73567f81e2db6afdf0724bc278b3ca  88.6 MB
md5:d1f37ca1c1cc3ca9ba353221f70ebf1f  27.6 MB
md5:c26ed149ca7357d11d58afe74b30ddfb  219.4 MB
md5:6811cbe4610c6e1e82e08a1cae64dd2c  60.9 MB
md5:7ddf8cace25b9ec262ac39b09b138f78  65.0 MB
md5:eb21b38cfdcf89b84afc816042696f78  64.5 MB
md5:12a4faa07d7e383c591bd27d20f860a6  63.0 MB
md5:12a4faa07d7e383c591bd27d20f860a6  63.0 MB
md5:a06f94ff8e9e32046cc0d49476b22e56  64.3 MB
md5:dab00ae49da43f96a199131d84dcea70  61.2 MB
md5:aeaf5db6df4d8237a18f29cdae2d5f03  56.6 MB
md5:097db947182cf03ec7e2fb38a9809e73  224.6 MB
md5:c59e2b7cd892cd2b288751855b5513eb  21.6 MB
md5:95994af5ffc4b5e65afbd9b794514884  5.0 kB
md5:85f31b2a294d8eb41c601959f1d648cf  258.2 kB
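
The md5: values listed above can be used to check a downloaded file for corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names are hypothetical, and the path passed in is whatever local path the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks so that
    large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:<32 hex digits>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.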

Additional details

Software

Programming language
Python
Development Status
Active