Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
Description
Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation patterns across layers for a given prompt, and use them to study whether MoE routing exhibits task-conditioned structure. Using OLMoE-1B-7B-0125-Instruct as an empirical testbed, we show that prompts from the same task category induce highly similar routing signatures, while prompts
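The description defines a routing signature only loosely, as a vector that summarizes expert activation patterns across layers for a given prompt. The sketch below is one possible reading, not the authors' method. It assumes the top-k expert indices chosen by the router are available for every token at every MoE layer. It also assumes the signature is each layer's normalized expert-selection frequency, concatenated over layers. Similarity between prompts is measured with cosine similarity. The function names, the frequency aggregation, and the toy layer/expert counts are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical input layout: topk_experts[layer][token] = expert indices the
# router selected for that token at that MoE layer.
Routing = Sequence[Sequence[Sequence[int]]]


def routing_signature(topk_experts: Routing, num_experts: int) -> List[float]:
    """Assumed signature: per-layer expert-selection frequencies, concatenated.

    The result has length num_layers * num_experts.
    """
    signature: List[float] = []
    for layer in topk_experts:
        counts = [0.0] * num_experts
        total = 0
        for token_experts in layer:
            for expert in token_experts:
                counts[expert] += 1.0
                total += 1
        # Normalize within each layer so prompt length does not dominate.
        if total > 0:
            counts = [c / total for c in counts]
        signature.extend(counts)
    return signature


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two signatures of equal length."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy example (hypothetical): 2 MoE layers, 4 experts, top-2 routing.
prompt_a = [[[0, 1], [0, 1]], [[2, 3], [2, 3]]]
prompt_b = [[[0, 1], [1, 0], [0, 1]], [[3, 2], [2, 3], [2, 3]]]  # same pattern, longer prompt
prompt_c = [[[2, 3]], [[0, 1]]]  # disjoint experts at each layer

sig_a = routing_signature(prompt_a, num_experts=4)
sig_b = routing_signature(prompt_b, num_experts=4)
sig_c = routing_signature(prompt_c, num_experts=4)

print(cosine_similarity(sig_a, sig_b))  # → 1.0
print(cosine_similarity(sig_a, sig_c))  # → 0.0
```

Per-layer normalization lets two prompts of different lengths that route through the same experts receive identical signatures. Under this assumed definition, that is the property one would need before comparing signatures across a task category.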
Research goal: Can SMoES dynamic routing generalize to few-shot compositional reasoning on NLVR2 and SNLI-VE with higher accuracy than fixed-ratio modality-agnostic MoE baselines at equal total expert parameters?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.5/10.
Files
paper.pdf (83.6 kB), md5:a5ab86fdcd352e92b98967dd3ed869f2