Modern large language models
Authors/Creators
Description
License notice
This record is licensed under the Apache License 2.0.
SPDX-License-Identifier: Apache-2.0
Copyright (c) 2025-2026 Stanislav Volokhovych.
The deposited materials were authored and controlled by the sole copyright holder, Stanislav Volokhovych.
Earlier license metadata associated with this record was inconsistent. That inconsistency was unintentional and has been corrected. The current official license for this record is the Apache License 2.0.
Notes
Technical info
Version 3 adds a clean review package and the corresponding metric archives for the context-induced latent-state shift project.
This version includes:
1. A clean review package
- main navigation files (`README_FIRST.md`, `README.md`, `START_HERE.md`);
- experiment scripts and runbooks under `experiments/`;
- Gemma3 Grade4 hidden-geometry scripts;
- SAE candidate discovery, scale calibration, and decoder-direction steering scripts;
- dense `x_order_orth` axis steering script;
- selected post-hoc analysis tools under `scripts/analysis_tools/latent_gpu_rapids_analysis/`;
- English/Russian readout documents for the Gemma3 Grade4 + SAE line.
2. Metric archives
- Gemma3 Grade4 hidden-geometry / SAE metric packages;
- Qwen replication metric packages;
- Gemma SAE decoder-direction steering metric packages;
- full CSV/ZIP outputs needed for audit and reproduction of the reported readouts.
The clean review package is intended as the main entry point for readers. The metric archives are included separately so that the repository-style package remains readable while the full evidence trail is still available for inspection.
Suggested reading order:
1. `README_FIRST.md`
2. `START_HERE.md`
3. `experiments/gemma3_grade4_sae_academic_readout/`
4. `experiments/grade4_axis_decomposition_gemma/RUNBOOK.md`
5. `experiments/steering/sae_gemma_qwen/RUNBOOK.md`
6. Metric ZIP files for detailed audit
Main GitHub review branch:
https://github.com/ngscode23/latent-space-shift-research/tree/review/gemma3-latent-shift-clean
Large metric package folder:
https://drive.google.com/drive/folders/1Zl9iY33Lmwz3VuOATWx4jup-cE7TJ7TJ?usp=drive_link
Abstract
Modern large language models may not primarily regulate behavior through isolated refusals, local token suppression, or shallow instruction following. Instead, they appear capable of entering internally organized discourse-level regimes: distributed latent states that shape how the model reasons, frames conclusions, allocates caution, tolerates asymmetry, performs neutrality, and structures epistemic authority.

These regimes do not behave like simple lexical priming effects. Evidence suggests that they:
- persist across neutral conversational turns;
- survive arbitrary neutral relabeling;
- systematically alter downstream reasoning style;
- concentrate in late-layer representation geometry;
- depend only partially on explicit alignment vocabulary.

The strongest effects appear not from safety keywords themselves, but from higher-order rhetorical topology: pressure cadence, procedural framing, asymmetry structure, institutional tone, and discourse-level authority signals. This suggests that prompting is not merely instruction transmission. It may function as state induction.

Under this view, many apparently separate phenomena in aligned LLMs (caution drift, procedural overreach, sycophancy, disclaimer inflation, neutrality performance, refusal persistence, jailbreak sensitivity, and style locking) may be manifestations of transitions between latent discourse-policy manifolds.

In this picture, alignment is no longer well described as a modular wrapper placed on top of an otherwise independent intelligence system. Instead, alignment may reshape the topology of the model’s representational space itself, globally reorganizing discourse behavior rather than only filtering outputs. This would explain why alignment effects often appear entangled with reasoning style, directness, specificity, decisiveness, and institutional tone. The model is not merely “prevented” from saying certain things; its generative dynamics may already be reorganized around different discourse attractors.

If true, this changes the effective unit of analysis for language models. The relevant object is no longer just the token, the instruction, the refusal, or the output distribution. The relevant object becomes the discourse regime itself: a temporary but structured representational configuration governing epistemic posture, rhetorical organization, procedural behavior, and judgment style across time.

This reframes prompt engineering as latent-state induction rather than keyword optimization. It reframes jailbreaks as transitions between attractor regimes rather than simple filter bypasses. And it reframes alignment as geometry engineering rather than purely policy engineering.

The implication is not that language models possess beliefs, intentions, or consciousness. Rather, large sequence learners may naturally develop metastable high-level representational modes that functionally resemble cognitive framing states: transient global configurations that persist, influence future reasoning, and organize behavior across otherwise unrelated tasks.

If this interpretation is correct, then the central scientific challenge of alignment shifts fundamentally. The problem is no longer merely “Which outputs should the model refuse?” but “Which latent discourse regimes exist inside the model, how are they induced, how stable are they, how do they interact, and how do they reshape reasoning itself?” In that sense, alignment may ultimately be less about constraining outputs and more about shaping the geometry of cognition-like generative states inside large language models.
Files
gemma_sae_steering_fast_readout_3tasks.zip
Total size: 249.3 MB

| MD5 checksum | Size |
|---|---|
| 0572ececbb9d836918cc75b4774401f7 | 3.6 MB |
| b221bdaefa76257c630821838837a8d3 | 16.6 kB |
| de019cf884330737fd054904961f0089 | 889.0 kB |
| 6ad8c677c8c6c0c0845cd6d9f05abdf5 | 1.3 MB |
| 88872bb5c14aefbdd59cc6597c728218 | 517.0 kB |
| bebf196d6030bd56660509b9bb034dd6 | 3.0 MB |
| a07cd94230a1d98223e098e454e72a75 | 57.0 MB |
| 27f6147e0bbe60d32a5daac013fc3392 | 182.9 MB |
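
Because the metric archives are meant for audit and reproduction, it can be useful to confirm that a downloaded file matches one of the MD5 checksums listed above before unpacking it. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` are illustrative assumptions, not part of the deposited package, and the listing above does not state which checksum belongs to which file name, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for the larger archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 with an expected value.

    Accepts either a bare hex digest or the 'md5:<hex>' form used in
    the file listing; the comparison is case-insensitive.
    """
    expected_hex = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

A call such as `verify_download("downloaded_archive.zip", "md5:27f6147e0bbe60d32a5daac013fc3392")` returns `True` only when the local file hashes to that checksum; the path here is a hypothetical placeholder. MD5 is adequate for detecting corrupted or truncated downloads, but it is not a defense against deliberate tampering.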