Parallax: Inference-Time Cognitive Enhancement Across Seven Foundation Models
Description
Parallax is a multi-module cognitive augmentation middleware layer that operates at inference time to measurably improve the quality of foundation model outputs without fine-tuning, weight modification, or model-specific training. We evaluate Parallax across seven foundation models — Claude Opus 4.6, Claude Sonnet 4.6, GPT-4.1, Mistral Large, DeepSeek v3.1, GPT-OSS 120B, and Qwen3-VL 235B — using a 38-task benchmark battery (24 elevation, 9 stability, 5 preservation) spanning cognitive elevation, multi-turn stability, and output preservation. All outputs were scored blind by two independent AI judges (Claude Opus 4.6 and Grok 4.1 Fast) across five dimensions (Depth, Utility, Specificity, Coherence, Elevation) on a 0–3 scale.
Six of seven models showed positive cognitive lift when Parallax was active, with averaged dual-judge gains ranging from +0.13 to +0.69 composite points across models. The strongest single-model result was Mistral Large, which improved from 1.46 to 2.27 under one judge and 2.13 to 2.64 under the other. Parallax also improved the frontier model (Claude Opus 4.6: +0.46 avg), demonstrating value beyond rescue of underperforming systems. Inter-rater reliability was strong: 96.2% agreement within one point, 56.1% exact match.
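The dual-judge aggregation described above can be sketched as follows. The five dimension names, the 0–3 scale, and the within-one-point vs. exact-match agreement metrics come from the record; the function names and any data values are illustrative assumptions, not the paper's actual scoring code.

```python
# Illustrative sketch of the dual-judge scoring aggregation.
# Dimension names and the 0-3 scale follow the record; everything
# else (function names, data shapes) is assumed for illustration.

DIMENSIONS = ["Depth", "Utility", "Specificity", "Coherence", "Elevation"]

def composite(scores):
    """Mean of the five 0-3 dimension scores for one output."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

def dual_judge_lift(judge_a, judge_b):
    """Average the two judges' (Parallax - baseline) composite deltas."""
    deltas = [composite(j["parallax"]) - composite(j["baseline"])
              for j in (judge_a, judge_b)]
    return sum(deltas) / len(deltas)

def agreement(score_pairs, tolerance):
    """Fraction of (judge A, judge B) score pairs within `tolerance`.
    tolerance=1 gives within-one-point agreement; tolerance=0 gives
    exact-match agreement."""
    hits = sum(1 for a, b in score_pairs if abs(a - b) <= tolerance)
    return hits / len(score_pairs)
```

Under this reading, a reported "+0.46 avg" lift is the mean of the two judges' composite deltas, and the 96.2% / 56.1% reliability figures correspond to `agreement(..., 1)` and `agreement(..., 0)` over all paired dimension scores.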
Elevation (+0.57 avg lift) and Depth (+0.48 avg lift) were the most consistently improved dimensions, confirming that Parallax primarily enhances cognitive processing rather than surface formatting. One model (Qwen3-VL 235B) showed a slight negative lift (−0.10 avg), attributable to alignment-origin mismatch rather than a capability deficit.
Files
| Name | Size |
|---|---|
| Parallax_Paper_v1.pdf (md5:54020df8b6dda72579f6078e00e5b06b) | 515.3 kB |
Additional details
Related works
- Cites: Preprint 10.5281/zenodo.18940959 (DOI)