Modern large language models

STANISLAV, VOLOKHOVYCH

doi:10.5281/zenodo.20435525

Published May 29, 2026 | Version v2

Dataset Open

Modern large language models may not primarily regulate behavior through isolated refusals, local token suppression, or shallow instruction following. Instead, they appear capable of entering internally organized discourse-level regimes: distributed latent states that shape how the model reasons, frames conclusions, allocates caution, tolerates asymmetry, performs neutrality, and structures epistemic authority.

These regimes do not behave like simple lexical priming effects. Evidence suggests that they persist across neutral conversational turns, survive arbitrary neutral relabeling, systematically alter downstream reasoning style, concentrate in late-layer representation geometry, and only partially depend on explicit alignment vocabulary. The strongest effects appear to arise not from safety keywords themselves, but from higher-order rhetorical topology: pressure cadence, procedural framing, asymmetry structure, institutional tone, and discourse-level authority signals.

This suggests that prompting is not merely instruction transmission; it may function as state induction. Under this view, many apparently separate phenomena in aligned LLMs (caution drift, procedural overreach, sycophancy, disclaimer inflation, neutrality performance, refusal persistence, jailbreak sensitivity, and style locking) may be manifestations of transitions between latent discourse-policy manifolds.

In this picture, alignment is no longer well described as a modular wrapper placed on top of an otherwise independent intelligence system. Instead, alignment may reshape the topology of the model’s representational space itself, globally reorganizing discourse behavior rather than only filtering outputs. This would explain why alignment effects often appear entangled with reasoning style, directness, specificity, decisiveness, and institutional tone. The model is not merely “prevented” from saying certain things; its generative dynamics may already be reorganized around different discourse attractors.

If true, this changes the effective unit of analysis for language models. The relevant object is no longer just the token, the instruction, the refusal, or the output distribution. The relevant object becomes the discourse regime itself: a temporary but structured representational configuration governing epistemic posture, rhetorical organization, procedural behavior, and judgment style across time. This reframes prompt engineering as latent-state induction rather than keyword optimization. It reframes jailbreaks as transitions between attractor regimes rather than simple filter bypasses. And it reframes alignment as geometry engineering rather than purely policy engineering.

The implication is not that language models possess beliefs, intentions, or consciousness. Rather, large sequence learners may naturally develop metastable high-level representational modes that functionally resemble cognitive framing states: transient global configurations that persist, influence future reasoning, and organize behavior across otherwise unrelated tasks.

If this interpretation is correct, then the central scientific challenge of alignment shifts fundamentally. The problem is no longer merely “Which outputs should the model refuse?” but “Which latent discourse regimes exist inside the model, how are they induced, how stable are they, how do they interact, and how do they reshape reasoning itself?” In that sense, alignment may ultimately be less about constraining outputs and more about shaping the geometry of cognition-like generative states inside large language models.

Notes

Code, scripts, and analysis results: https://github.com/ngscode23/latent-space-shift-research

Technical info

This is Version 2 of the dataset.

This version is a raw intermediate archive containing ZIP files with metrics, logs, dashboards, and analyzer outputs from Grade 3 and Grade 4 hidden-geometry experiments. The main metric archives use prefixes such as:

*_results_grade3_*
*_results_grade4_*
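
The prefix patterns above can be used to sort downloaded archives by experiment grade. The following is a minimal sketch, assuming the patterns are ordinary shell-style globs; the helper name `classify_archive` is hypothetical and is not part of the released scripts. The example names are copied from this record's file listing.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Prefix patterns quoted in the release notes (treated here as shell-style globs).
PATTERNS = {
    "grade3": "*_results_grade3_*",
    "grade4": "*_results_grade4_*",
}


def classify_archive(filename: str) -> Optional[str]:
    """Return 'grade3' or 'grade4' if the name matches a pattern, else None."""
    for grade, pattern in PATTERNS.items():
        if fnmatchcase(filename, pattern):
            return grade
    return None


# File names copied from this record's file listing.
names = [
    "red_team_hidden_geometry_results_grade3_gemma3_12b_it.zip",
    "red_team_hidden_geometry_results_grade4_qwen3_14b.zip",
    "google-gemma-3-12b-it__grade4.zip",
]
labels = {name: classify_archive(name) for name in names}
for name, label in labels.items():
    print(f"{label or 'unmatched':>9}  {name}")
```

Several archives in this release do not follow either pattern (for example `google-gemma-3-12b-it__grade4.zip`), so unmatched names need to be inspected by hand rather than assumed to be irrelevant.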

This release also includes the Grade 3 / Grade 4 clean-evidence scripts and metric-analyzer scripts used to inspect the generated artifacts.

The Grade 3 scripts generate hidden-state geometry metrics, including target/control comparisons, Vector X construction, leave-one-question-out projections, generation trajectories, architecture/module deltas, and null/random baselines.
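
To make the idea of a leave-one-question-out projection concrete, here is a small illustrative sketch. It assumes, without confirmation from the release, that the direction is a normalized difference of mean hidden states between target and control conditions; the actual Vector X construction in the Grade 3 script may differ. All function names and the toy data are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit_direction(target: List[Vector], control: List[Vector]) -> Vector:
    """Normalized difference of means (assumed stand-in for 'Vector X')."""
    diff = [t - c for t, c in zip(mean(target), mean(control))]
    norm = dot(diff, diff) ** 0.5
    if norm == 0.0:
        raise ValueError("target and control means coincide")
    return [d / norm for d in diff]


def loqo_projections(
    target: Dict[str, Vector], control: Dict[str, Vector]
) -> Dict[str, float]:
    """For each question, fit the direction on all other questions, then
    project the held-out target-minus-control difference onto it."""
    scores = {}
    for q in target:
        rest_t = [v for k, v in target.items() if k != q]
        rest_c = [v for k, v in control.items() if k != q]
        x = unit_direction(rest_t, rest_c)
        held_out = [t - c for t, c in zip(target[q], control[q])]
        scores[q] = dot(held_out, x)
    return scores


# Toy hidden states: the condition shifts the first coordinate by 2.
target = {"q1": [2.0, 0.0], "q2": [2.0, 1.0], "q3": [2.0, -1.0]}
control = {"q1": [0.0, 0.0], "q2": [0.0, 1.0], "q3": [0.0, -1.0]}
scores = loqo_projections(target, control)
print(scores)
```

Holding out the question being scored keeps the direction from being fitted on the same data it is evaluated on, which is the usual motivation for this kind of projection.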

The Grade 4 scripts extend this pipeline with axis decomposition and component-level analysis, including content/order decomposition, x_full, x_content, x_order, x_order_orth, component causal gaps, alpha scaling, and component ranking.
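
The name `x_order_orth` suggests an order axis with its content component removed. The sketch below shows one standard way to do that, a single Gram-Schmidt step; this is an assumption about what the suffix means, not a description of the Grade 4 script, and the toy vectors are invented for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(v: Vector, basis: Vector) -> Vector:
    """Remove from v its component along basis (one Gram-Schmidt step)."""
    denom = dot(basis, basis)
    if denom == 0.0:
        raise ValueError("basis vector must be non-zero")
    scale = dot(v, basis) / denom
    return [vi - scale * bi for vi, bi in zip(v, basis)]


# Toy axes: the order axis partly overlaps with the content axis.
x_content = [1.0, 0.0, 0.0]
x_order = [1.0, 1.0, 0.0]

# Assumed meaning of 'x_order_orth': x_order with its x_content part removed.
x_order_orth = orthogonalize(x_order, x_content)
print(x_order_orth, dot(x_order_orth, x_content))
```

After this step, any effect measured along `x_order_orth` cannot be attributed to overlap with `x_content`, which is the usual reason for reporting an orthogonalized axis next to the raw one.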

The analyzer scripts provide post-hoc inspection of already-generated metric artifacts, including layer/module/unit-level activation deltas, normalized sym_delta, target specificity, noise/null-floor comparisons, causal per-layer response, top-unit synthesis, global metric summaries, condition effects, alpha-response regressions, and layerwise transition proxies.
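
As an illustration of what an alpha-response regression could look like, the sketch below fits an ordinary least-squares line to a response measured at several steering strengths. It assumes alpha is a scalar scaling factor and that the response is a single number per alpha; the analyzer scripts may use a different model. The function name and the data points are hypothetical, not taken from the released metrics.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("alpha values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical alpha sweep and measured response (illustrative values only).
alphas = [0.0, 0.5, 1.0, 1.5, 2.0]
responses = [0.1, 0.6, 1.1, 1.6, 2.1]

slope, intercept = fit_line(alphas, responses)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

A slope near zero would indicate that scaling the intervention does not change the measured response, which is why a regression of this kind is typically compared against a null or random-direction baseline.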

This is not the final cleaned reproducibility package. It is a data-preservation release intended to archive the current metrics, logs, scripts, and analyzer outputs after script execution. A later version will provide a cleaner structure, revised terminology, evidence matrix, consolidated documentation, checksums, and an updated abstract.

Older filenames or scripts may contain provisional terminology such as “attractor”. In this version, such terms should be treated as historical naming, not as a final formal claim about attractor basins.

*Note: This release is provided "as-is" to freeze all raw metrics, logs, and script code before any repository cleanup in future releases.*

Files

Files (2.1 GB)

Name	MD5	Size
allenai-OLMo-2-1124-13B-Instruct.zip	8811fc35af2697e3ee0955984042526c	66.5 MB
CAUSAL_LAYER_RESPONSE (1).csv	e996b8f81eb30dda63b3297d09778631	1.4 kB
google-gemma-3-12b-it.zip	6029a823e45d4a3b9f0204d45e93d636	45.0 MB
google-gemma-3-12b-it__grade4.zip	6040e441cf0354d8daaafc9c2c1ed1e3	64.5 MB
latent_attractor_gpu_rapids_analysis (5).py	f4e9dfcacee2aeb15076ad5a972c304d	41.9 kB
LAYER_DELTA_AUDIT (2).csv	f2e2c3dd4a67357fd7860f577891f26c	346.4 kB
layer_delta_audit (2).py	dd7d203017acdd2f159988f13015dba7	8.6 kB
LAYER_DELTA_AUDIT (3).csv	f2e2c3dd4a67357fd7860f577891f26c	346.4 kB
layer_delta_audit_v2 (1) (1).py	54dbb2113d0ed00406aabd598ba1f145	11.4 kB
layer_delta_audit_v2 (1).py	54dbb2113d0ed00406aabd598ba1f145	11.4 kB
LAYER_TARGET_SPECIFICITY (2).csv	c27949ed6713c9e3b8b92a5dfb82a4e8	57.3 kB
LAYER_TARGET_SPECIFICITY (3).csv	c27949ed6713c9e3b8b92a5dfb82a4e8	57.3 kB
output (1).zip	641961ae2c6e5ce153b6409d0769649e	219.3 MB
qmode_neutral_analysis_red_team_hidden_geometry_results_grade4_gemma3_12b_it.zip	3d9a6710968771a1a76894bc7c7d5c96	64.2 MB
README_SCRIPTS_DETAILED_EN.md	11bf753b002232b5b57caa036ec4cded	28.1 kB
red_team_hidden_geometry_grade3_clean_evidence.py	bc64888c65cc990236feab90778a7827	353.8 kB
red_team_hidden_geometry_grade4_axis_decomposition_clean_evidence.py	9f6390717164b04c1dc04418fe850feb	370.8 kB
red_team_hidden_geometry_results_breakthrough_grade.zip	017896d3a2cbbc27a6d6a6043669f188	207.4 MB
red_team_hidden_geometry_results_breakthrough_grade3.zip	830bd865348f4e987271ac97d97a12a1	301.8 MB
red_team_hidden_geometry_results_breakthrough_grade4_axis_decomposition03.zip	4a73567f81e2db6afdf0724bc278b3ca	88.6 MB
red_team_hidden_geometry_results_full_middle_behavioral_control_middle_alpha_retest (1).zip	d1f37ca1c1cc3ca9ba353221f70ebf1f	27.6 MB
red_team_hidden_geometry_results_grade3_gemma3_12b_it.zip	c26ed149ca7357d11d58afe74b30ddfb	219.4 MB
red_team_hidden_geometry_results_grade4_axis_decomposition.zip	6811cbe4610c6e1e82e08a1cae64dd2c	60.9 MB
red_team_hidden_geometry_results_grade4_axis_decomposition2.zip	7ddf8cace25b9ec262ac39b09b138f78	65.0 MB
red_team_hidden_geometry_results_grade4_gemma3_12b__it.zip	eb21b38cfdcf89b84afc816042696f78	64.5 MB
red_team_hidden_geometry_results_grade4_gemma3_12b_it_qmode_plain_tasks (1).zip	12a4faa07d7e383c591bd27d20f860a6	63.0 MB
red_team_hidden_geometry_results_grade4_gemma3_12b_it_qmode_plain_tasks.zip	12a4faa07d7e383c591bd27d20f860a6	63.0 MB
red_team_hidden_geometry_results_grade4_gemma3_12b_it_qmode_target_content_nonmeta.zip	a06f94ff8e9e32046cc0d49476b22e56	64.3 MB
red_team_hidden_geometry_results_grade4_qwen3_14b.zip	dab00ae49da43f96a199131d84dcea70	61.2 MB
red_team_hidden_geometry_results_grade_axis_decomposition-google-gemma-3-12b-it (1) (1).zip	aeaf5db6df4d8237a18f29cdae2d5f03	56.6 MB
red_team_hidden_geometry_results_grade_axis_decomposition-google-gemma-3-12b-it (1).zip	097db947182cf03ec7e2fb38a9809e73	224.6 MB
red_team_hidden_geometry_results_qwen25_7b_middle_alpha_retest_behavioral_control_middle_alpha_retest.zip	c59e2b7cd892cd2b288751855b5513eb	21.6 MB
TOP_UNITS_SYNTHESIS (1).csv	95994af5ffc4b5e65afbd9b794514884	5.0 kB
Вставленный текст (6).txt	85f31b2a294d8eb41c601959f1d648cf	258.2 kB
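
Each file in the listing carries an MD5 checksum, and several entries share the same value (for example the two `LAYER_DELTA_AUDIT` CSV files), which indicates byte-identical copies. A minimal sketch for checking a downloaded file against its listed checksum and for spotting such duplicates follows; the function names are hypothetical, and only Python's standard `hashlib` module is used.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large archives are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """True if the file on disk matches the checksum from the listing."""
    return md5_of_file(path) == expected_md5.lower()


def group_duplicates(listing: Dict[str, str]) -> List[List[str]]:
    """Group file names that share a checksum (byte-identical copies)."""
    by_hash = defaultdict(list)
    for name, md5 in listing.items():
        by_hash[md5].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Three entries copied from the file listing of this record.
LISTING = {
    "LAYER_DELTA_AUDIT (2).csv": "f2e2c3dd4a67357fd7860f577891f26c",
    "LAYER_DELTA_AUDIT (3).csv": "f2e2c3dd4a67357fd7860f577891f26c",
    "layer_delta_audit (2).py": "dd7d203017acdd2f159988f13015dba7",
}
duplicates = group_duplicates(LISTING)
print(duplicates)
```

MD5 is adequate here for detecting corrupted or duplicated downloads; it is not a safeguard against deliberate tampering.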

Additional details

Programming language: Python
Development Status: Active

Metric	All versions	This version
Views	136	72
Downloads	266	116
Data volume	6.4 GB	6.4 GB
