SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs
Description
Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent modality fusion patterns in MoE-VLMs and provide little guidance for expert specialization. We propose Soft Modality-guided Expert Specialization (SMoES), which consists of dynamic soft modality scores that capture layer-dependent fusion patterns, an expert binning mechanism aligne
Research goal: How does SMoES routing compare to dense baselines and hard-routed MoE-VLMs on inference throughput (tokens/sec) versus ANLS accuracy when scaling from 7B to 13B+ parameters on DocVQA?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.
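The research goal above sets ANLS accuracy against inference throughput. As a minimal sketch of how those two quantities are commonly computed, the Python below uses the usual DocVQA-style definition of ANLS (Average Normalized Levenshtein Similarity) with a threshold of 0.5. The function names (`levenshtein`, `anls`, `tokens_per_second`), the lowercase-and-strip normalization, and the threshold default are illustrative assumptions, not details taken from the paper or from any official evaluation script.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions: Sequence[str],
         references: Sequence[Sequence[str]],
         threshold: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions.

    Each prediction is scored against every accepted answer for its
    question; the best score is kept. A score counts as 0 when the
    normalized distance reaches the threshold (assumed 0.5 here).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds
```

Reporting both numbers for each model size and routing variant gives one point per configuration on a throughput-versus-accuracy plot, which is the comparison the research goal asks for.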
Files
paper.pdf (82.1 kB)
md5:a2facd23d5264d6782811eb739754c65