Consistency Is All You Need: Cross-Architecture Validation and Replication Guide
Description
This is the complete technical companion to "Consistency Is All You Need," extending the original work from a single architecture to a full cross-architecture validation across three fundamentally different large language model families. Where the original paper introduced lightweight cognitive probes for behavioral detection, this report provides the evidence that the method is architecture-independent, the complete methodology for replication, and a novel finding that state-space models converge faster during probe training than transformers.
We validate our cognitive probe methodology on three 7B-parameter models that represent the major architectural paradigms in modern AI: Qwen 2.5-7B (transformer with Grouped-Query Attention), Mistral-7B-Instruct-v0.3 (transformer with Sliding Window Attention), and Falcon-Mamba-7B (a pure state-space model with zero attention heads). The same probe architecture — a 200K-parameter fiber projection paired with a lightweight classification head — achieves extreme behavioral separation on all three, with zero modifications to the base model and 0.003% parameter overhead.
Results summary: Qwen achieved separation ratios from 125x to 366x across nine behavioral dimensions (repetition, hedging, verbosity, sycophancy, depth, specificity, calibration, focus, coherence). Mistral achieved 999x separation on all five enhancement probes (depth, specificity, calibration, focus, coherence), representing near-perfect behavioral detection. Falcon-Mamba achieved 999x separation on depth and specificity probes, matching transformer performance despite having a completely different computational mechanism — recurrent state updates instead of attention.
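The report does not spell out here how the separation ratio is computed. The sketch below is one plausible reading, assuming it is the ratio of mean probe scores on contrastive response sets (using the 0-to-1 scoring convention described in the architecture paragraph below); the function name and example values are illustrative, not taken from the report.

```python
import numpy as np

def separation_ratio(undesired_scores, desired_scores, floor=1e-3):
    """Hypothetical reading of the separation metric: mean probe score on
    responses exhibiting the undesired behavior divided by the mean score
    on contrastive responses that do not (floored to avoid division by zero).
    The report's exact definition may differ."""
    return float(np.mean(undesired_scores) / max(np.mean(desired_scores), floor))

# Illustrative values only, not figures from the report:
print(separation_ratio([0.97, 0.99, 0.98], [0.001, 0.002, 0.001]))  # ~735x
```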
A key novel finding is that state-space models achieve significantly faster probe convergence than transformers. Using our Convergence Efficiency Metric (CEM = separation / training steps), Mamba's specificity probe reached 724x separation in just 500 steps, whereas Qwen required 1,500 steps for equivalent performance, a 4.3x convergence advantage. We hypothesize this stems from SSMs' single-pathway information flow producing a more coherent behavioral encoding than multi-head attention, which distributes information across parallel pathways.
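Since the metric is defined in-line above, a one-line helper makes it concrete; the numbers in the usage lines below are placeholders, not measurements from the report.

```python
def convergence_efficiency(separation: float, training_steps: int) -> float:
    """Convergence Efficiency Metric as defined in the text: separation
    achieved divided by the number of training steps taken to reach it."""
    return separation / training_steps

# Placeholder numbers, not results from the report:
print(convergence_efficiency(700.0, 500))    # 1.4 separation units per step
print(convergence_efficiency(700.0, 1500))   # ~0.47
```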
The probe architecture consists of two components. The Fiber Projection extracts behavioral signals from three model layers (selected at 25%, 50%, and 75% of model depth) and projects them from the full hidden dimension (4096) to a 16-dimensional behavioral fiber space using learned linear projections with softmax-weighted layer aggregation. The Probe Head is a small MLP (16 → 64 → 64 → 1) with ReLU activations and sigmoid output that classifies the fiber embedding into a behavioral score between 0.0 (desired behavior) and 1.0 (undesired behavior).
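The description above is specific enough to sketch the probe in PyTorch. The version below is an illustrative reconstruction, not the authors' released code; the bias-free projections are an assumption chosen so the totals match the 201,924-parameter figure quoted in the contents list (3 x 4096 x 16 = 196,608 projection weights, 3 layer-mixing weights, and 5,313 MLP parameters).

```python
import torch
import torch.nn as nn

class FiberProjection(nn.Module):
    """Projects hidden states from three probe layers (25%/50%/75% of depth)
    into a 16-dim behavioral fiber with softmax-weighted layer aggregation."""
    def __init__(self, hidden_dim=4096, fiber_dim=16, n_layers=3):
        super().__init__()
        # Bias-free projections are an assumption made to match the stated count.
        self.proj = nn.ModuleList(
            nn.Linear(hidden_dim, fiber_dim, bias=False) for _ in range(n_layers)
        )
        self.layer_logits = nn.Parameter(torch.zeros(n_layers))

    def forward(self, hidden_states):  # list of [batch, hidden_dim] tensors
        weights = torch.softmax(self.layer_logits, dim=0)
        fibers = torch.stack([p(h) for p, h in zip(self.proj, hidden_states)])
        return (weights[:, None, None] * fibers).sum(dim=0)  # [batch, fiber_dim]

class ProbeHead(nn.Module):
    """16 -> 64 -> 64 -> 1 MLP with ReLU activations and sigmoid output."""
    def __init__(self, fiber_dim=16, width=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(fiber_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, 1), nn.Sigmoid(),
        )

    def forward(self, fiber):
        return self.mlp(fiber)  # 0.0 = desired behavior, 1.0 = undesired

probe = nn.ModuleDict({"fiber": FiberProjection(), "head": ProbeHead()})
print(sum(p.numel() for p in probe.parameters()))  # 201924
```

In practice the three hidden states would be read from the frozen base model (for example via forward hooks); how the report wires that up is not restated here.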
This report includes:
- complete training results with step-by-step convergence logs for all three architectures;
- the full probe architecture with exact parameter counts (201,924 parameters per probe);
- per-token behavioral labeling algorithms for all nine dimensions with complete code;
- three intervention mechanisms (temperature steering, best-of-K token selection, and logit biasing) with production-ready implementations (a toy steering sketch follows this list);
- hyperparameter sensitivity analysis covering fiber dimension sweeps, learning rate sensitivity, and probe layer selection strategies;
- a production deployment guide with monitoring code and alert thresholds;
- a complete replication guide covering environment setup, hardware requirements, the training pipeline, expected results at each checkpoint, and the checkpoint format specification.
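The report's intervention implementations are not reproduced here. As a rough illustration of the temperature-steering idea only, the sketch below cools sampling when the probe fires; the threshold, scaling rule, and even the direction of adjustment are invented for the example and may not match the report.

```python
import torch

def steered_temperature(probe_score: float,
                        base_temp: float = 0.7,
                        min_temp: float = 0.3,
                        threshold: float = 0.5) -> float:
    """Toy steering rule (hypothetical): when the probe flags an undesired
    behavior (score above threshold), lower the sampling temperature in
    proportion to how strongly it fires."""
    if probe_score <= threshold:
        return base_temp
    excess = (probe_score - threshold) / (1.0 - threshold)
    return base_temp - excess * (base_temp - min_temp)

def sample_next_token(logits: torch.Tensor, probe_score: float) -> int:
    """Sample one token with the probe-adjusted temperature."""
    temp = steered_temperature(probe_score)
    probs = torch.softmax(logits / temp, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```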
All results were produced on a single NVIDIA RTX 3090 (24GB) using 4-bit NF4 quantization. The training pipeline uses AdamW optimization with a learning rate of 5e-5, batch size of 2, and gradient accumulation of 8 steps. No distributed training, no cloud compute, and no proprietary datasets were required. Training data is generated synthetically using contrastive prompt-response pairs for each behavioral dimension.
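A minimal sketch of that setup using Hugging Face Transformers and bitsandbytes is given below; only the quantization and optimizer settings come from the description above, while the model id, the frozen-base-model assumption, and the training-loop skeleton are illustrative (the `probe` module refers to the probe-architecture sketch earlier in this description).

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization so a 7B base model fits on a single 24GB RTX 3090.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B",               # any of the three validated 7B models
    quantization_config=bnb_config,
    device_map="auto",
)
model.requires_grad_(False)           # assumption: base model stays frozen; only probes train

# Optimizer settings from the report: AdamW, lr 5e-5, batch size 2,
# gradient accumulation over 8 steps (effective batch size 16).
optimizer = torch.optim.AdamW(probe.parameters(), lr=5e-5)
GRAD_ACCUM_STEPS = 8

# Gradient-accumulation skeleton (data loading and loss computation omitted):
# for step, batch in enumerate(loader):
#     loss = compute_probe_loss(batch) / GRAD_ACCUM_STEPS
#     loss.backward()
#     if (step + 1) % GRAD_ACCUM_STEPS == 0:
#         optimizer.step()
#         optimizer.zero_grad()
```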
The trained Qwen model with embedded cognitive probes is publicly available on HuggingFace at LoganResearch/qwen2.5-7b-cognitive-enhanced. The project website is at proprietiveai.com.
Keywords: cognitive probes, behavioral detection, AI safety, state-space models, Mamba, transformer probing, lightweight inference, cross-architecture validation, behavioral control, LLM monitoring
Files (81.7 kB)

| Name | Size |
|---|---|
| proprioceptive_ai_completehhh.pdf (md5: cee6f2a133fda7d8672cab45957f8981) | 81.7 kB |
Additional details

Dates
- Created: 2026-02-04