Quatrix: An Empirical Evaluation of Q-Compass and SAVO on Multimodal Sequence Modeling
Description
We evaluate Q-Compass — a value-projection-free attention primitive grounded in the reinforcement-learning Q-function — at three parameter scales (57M, 121M, 179M) across four modalities (WikiText-103, MS-COCO captions, LibriSpeech clean-100, MiniGrid 3D navigation). We also evaluate the SAVO four-projection variant, in which the Q projection acts on the state-action product instead of the raw input, and multi-head Q-Compass (MH-QC), which we report as a null result.

Text-LM parity (controlled ablation): SAVO sits +12.33 ± 0.87 perplexity above the rank-matched transformer at 60m (paired-difference, 4 seeds, p = 7.6 × 10⁻⁴). Full-rank standard 8-head MHA on the same training recipe (2 seeds) reaches 257.96 ± 2.12 val ppl — worse than the rank-matched controlled ablation at this 10,000-step budget, despite having 8× more attention-block parameters. SAVO is +5.79 ppl above full-rank MHA on val. The same ∼12-ppl gap to the rank-matched baseline holds at 120m and 180m (single seed each).

Cross-modal non-interference: at matched 229M-text-token compute, joint four-modality 60m training reaches the same per-text-token loss as text-only training, and the property holds through 180m.

Out-of-distribution: the 60m Q-free SAO has a small OOD edge on arxiv and pubmed; the effect does not replicate at 120m or 180m. We report this as a null result on the Q-free OOD-generalisation hypothesis at the scales tested.

Cross-field demonstration: the same SAVO block class runs on four computational-oncology tasks — signature decomposition (cosine 0.975 vs NNLS 0.987; NNLS higher), 27-class pan-cancer classification (top-1 0.517 vs majority-class 0.087), GDSC2 drug-response (Pearson 0.903; drug-only baseline 0.864), and TCGA 5-year survival (C-index 0.701; clinical-only ablation 0.708, clinical-only higher within seed noise).
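The "value-projection-free" idea can be illustrated with a toy numpy sketch. This is our own illustrative assumption, not the record's implementation — the exact Q-Compass/SAVO block is defined in the paper, and all shapes, weight names, and the single-head form below are hypothetical. Standard attention applies its weights to a learned value projection `V`; a value-projection-free variant routes the raw input directly:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, d = 5, 8                                  # toy sequence length and width
x = rng.normal(size=(T, d))                  # token states

# Hypothetical query/key projections (names are ours, not the paper's).
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)

q, k = x @ Wq, x @ Wk
w = softmax(q @ k.T / np.sqrt(d))            # (T, T) routing weights

# Value-projection-free step: apply the weights to x itself,
# with no learned value matrix in the path.
out = w @ x
print(out.shape)                             # (5, 8)
```

The contrast with standard attention is only the last line: `w @ (x @ Wv)` becomes `w @ x`, which is what removes the value-projection parameters the description refers to.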
World-model branch: the world objective trains concurrently with text/vision/audio without breaking joint training; world MSE drops from 1.125 at initialisation to 0.071 at step 10,000 (∼16× reduction). The predict-mean baseline on the trained 180m StateEncoder is 0.033 in the same encoded-state metric, reflecting MiniGrid-Empty-8x8's low next-state variance; benchmarking the routing primitive against dedicated world-model architectures on demanding environments (DMLab, Habitat) is out of scope. The unification claim is a structural property of the routing block. NanoG1 (a cancer foundation model with mid-CoT hypothetical simulation, building on §7) is deferred to a subsequent paper.
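The predict-mean baseline mentioned above has a simple closed form: its MSE equals the average per-dimension variance of the targets, which is why a low-variance environment yields a small baseline number. A minimal numpy check on synthetic data (the scale 0.18 and shapes below are illustrative stand-ins, not the paper's encoded states):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic encoded next-states with deliberately low variance,
# standing in for a low-variance environment like MiniGrid-Empty-8x8.
states = rng.normal(loc=0.0, scale=0.18, size=(10_000, 16))

mean_pred = states.mean(axis=0)                    # predict-mean baseline
baseline_mse = ((states - mean_pred) ** 2).mean()  # MSE of predicting the mean

# The baseline MSE is exactly the mean per-dimension variance of the targets.
assert np.isclose(baseline_mse, states.var(axis=0).mean())
print(round(float(baseline_mse), 4))
```

This identity is why the 0.033 baseline is best read as a variance floor of the environment rather than as a competing predictor: any model whose MSE beats it is extracting structure beyond the marginal mean.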
Files

| Name | Size |
|---|---|
| Quatrix.pdf (md5:c4ab5bef7af588b04495e55cb40fbd7a) | 1.5 MB |
Additional details
Software
- Repository URL
- https://github.com/Abd0r/quatrix
- Programming language
- Python
- Development Status
- Active