Published April 28, 2020 | Version V2.0
Preprint | Open Access

Quatrix: An Empirical Evaluation of Q-Compass and SAVO on Multimodal Sequence Modeling

  • Independent Researcher

Description

We evaluate Q-Compass, a value-projection-free attention primitive grounded in the reinforcement-learning Q-function, at three parameter scales (57M, 121M, 179M) across four modalities (WikiText-103, MS-COCO captions, LibriSpeech clean-100, MiniGrid 3D navigation). We also evaluate the SAVO four-projection variant, in which the V projection acts on the state⊙action product instead of the raw input, and multi-head Q-Compass (MH-QC), which we report as a null result.

Text-LM parity (controlled ablation): SAVO sits +12.33 ± 0.87 perplexity above the rank-matched transformer at 60M (paired difference, 4 seeds, p = 7.6 × 10⁻⁴). Full-rank standard 8-head MHA under the same training recipe (2 seeds) reaches 257.96 ± 2.12 validation perplexity, worse than the rank-matched controlled ablation at this 10,000-step budget despite having 8× more attention-block parameters; SAVO is +5.79 ppl above full-rank MHA on validation. The same ∼12-ppl gap to the rank-matched control holds at 120M and 180M (single seed each).

Cross-modal non-interference: at matched 229M-text-token compute, joint four-modality 60M training reaches the same per-text-token loss as text-only training, and the property holds through 180M.

Out-of-distribution: the 60M W_V-free SAO has a small OOD edge on arXiv and PubMed; the effect does not replicate at 120M or 180M. We report this as a null result on the W_V-free OOD-generalisation hypothesis at the scales tested.

Cross-field demonstration: the same SAVO block class runs on four computational-oncology tasks, namely signature decomposition (cosine 0.975 vs NNLS 0.987; NNLS higher), 27-class pan-cancer classification (top-1 0.517 vs majority-class 0.087), GDSC2 drug response (Pearson 0.903 vs drug-only baseline 0.864), and TCGA 5-year survival (C-index 0.701 vs clinical-only ablation 0.708; clinical-only higher, within seed noise).

World-model branch: the world objective trains concurrently with text/vision/audio without breaking the joint training; world MSE drops from 1.125 at initialisation to 0.071 at step 10,000 (a ∼16× reduction). The predict-mean baseline on the trained 180M StateEncoder is 0.033 in the same encoded-state metric, reflecting MiniGrid-Empty-8x8's low next-state variance; benchmarking the routing primitive against dedicated world-model architectures on more demanding environments (DMLab, Habitat) is out of scope. The unification claim is a structural property of the routing block. NanoG1 (a cancer foundation model with mid-CoT hypothetical simulation, building on §7) is deferred to a subsequent paper.
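For readers who want the routing concrete, here is a minimal single-head sketch of the SAVO block as described above: attention scores come from the state and action projections (the Q(s, a) analogy), and W_V acts on the elementwise state⊙action product rather than on the raw input. The per-position pairing of state and action vectors, the output projection W_O, and the unmasked single-head form are illustrative assumptions, not the paper's definition.

    # Sketch of a SAVO-style attention block (single head, no causal mask).
    # The s/a pairing, W_O, and all shapes are assumptions for illustration.
    import numpy as np

    def savo_attention(x, W_S, W_A, W_V, W_O):
        s = x @ W_S                                   # "state" projection
        a = x @ W_A                                   # "action" projection
        scores = s @ a.T / np.sqrt(s.shape[-1])       # Q(s, a)-style compatibility
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                 # row-wise softmax
        v = (s * a) @ W_V                             # V projects state⊙action, not x
        return (w @ v) @ W_O

    d = 16
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, d))                       # 8 positions, width d
    W_S, W_A, W_V, W_O = (rng.normal(size=(d, d)) * d**-0.5 for _ in range(4))
    out = savo_attention(x, W_S, W_A, W_V, W_O)       # shape (8, d)

Dropping W_V from this sketch, so that s⊙a is routed directly, corresponds to the three-projection W_V-free SAO variant referenced in the out-of-distribution paragraph.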
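The parity claim above rests on a paired difference over seed-matched runs. A minimal sketch of that test follows; the perplexity arrays are illustrative placeholders, not the paper's measurements:

    # Paired-difference test over seed-matched validation perplexities.
    # Numbers below are placeholders, not results from the paper.
    import numpy as np
    from scipy import stats

    savo_ppl    = np.array([252.1, 251.3, 253.9, 250.9])   # SAVO, seeds 0-3
    control_ppl = np.array([239.2, 239.6, 241.0, 239.5])   # rank-matched transformer

    diff = savo_ppl - control_ppl
    t, p = stats.ttest_rel(savo_ppl, control_ppl)          # paired t-test across seeds
    print(f"gap = {diff.mean():+.2f} ± {diff.std(ddof=1):.2f} ppl, p = {p:.1e}")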
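The NNLS reference in the signature-decomposition comparison is the standard non-negative least-squares fit of a sample's mutation catalogue against a fixed signature matrix, scored here by reconstruction cosine (the exact scoring convention is my assumption). A minimal sketch with placeholder data standing in for the paper's signature matrix and catalogues:

    # NNLS baseline for signature decomposition: min ||S e - m|| with e >= 0,
    # then reconstruction-cosine scoring. S and m are random placeholders.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(1)
    S = rng.random((96, 30))      # 96 mutation contexts x 30 signatures (illustrative)
    m = rng.random(96)            # one sample's mutation catalogue

    e, _ = nnls(S, m)             # non-negative exposures
    recon = S @ e
    cosine = recon @ m / (np.linalg.norm(recon) * np.linalg.norm(m))
    print(f"reconstruction cosine = {cosine:.3f}")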
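The predict-mean baseline quoted for the world-model branch is the MSE of always predicting the dataset-mean encoded next-state; it is small on MiniGrid-Empty-8x8 precisely because the encoded next-states vary little. A minimal sketch, with a random array standing in for StateEncoder outputs:

    # Predict-mean baseline in encoded-state space. `states` is a placeholder
    # for encoded next-states from the trained StateEncoder, shape (N, d).
    import numpy as np

    states = np.random.default_rng(2).normal(size=(4096, 64))
    mean_state = states.mean(axis=0)                    # constant predictor
    baseline_mse = ((states - mean_state) ** 2).mean()  # MSE of predicting the mean
    print(f"predict-mean baseline MSE = {baseline_mse:.3f}")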

Files

Quatrix.pdf (1.5 MB)

md5:c4ab5bef7af588b04495e55cb40fbd7a

Additional details

Software

Repository URL: https://github.com/Abd0r/quatrix
Programming language: Python
Development Status: Active