SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs
Description
Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent modality fusion patterns in MoE-VLMs and provide little guidance for expert specialization. We propose Soft Modality-guided Expert Specialization (SMoES), which consists of dynamic soft modality scores that capture layer-dependent fusion patterns, an expert binning mechanism aligne
Research goal: How does SMoES compare to modality-agnostic MoE-VLMs in terms of inference throughput (tokens/sec) on Winoground when scaling to 64 experts with top-2 routing?
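For illustration only, the kind of measurement the research goal refers to can be sketched as follows: top-2 routing over 64 experts, timed in tokens per second. This is a minimal, modality-agnostic sketch in pure Python, not the SMoES method and not a model benchmark. It times only the routing step on random logits, and the names `top2_route`, `tokens_per_second`, and `benchmark_routing` are hypothetical.

```python
import math
import random
import time


def top2_route(logits):
    """Pick the two experts with the largest router logits.

    Returns (indices, weights), where weights are a softmax restricted
    to the two selected experts, so they sum to 1.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    first, second = order[0], order[1]
    # Subtract the larger logit before exponentiating for numerical stability.
    peak = logits[first]
    e1 = math.exp(logits[first] - peak)
    e2 = math.exp(logits[second] - peak)
    total = e1 + e2
    return (first, second), (e1 / total, e2 / total)


def tokens_per_second(num_tokens, elapsed_seconds):
    """Throughput as tokens processed per wall-clock second."""
    # Guard against a zero reading from a very coarse timer.
    return num_tokens / max(elapsed_seconds, 1e-9)


def benchmark_routing(num_tokens=2000, num_experts=64, seed=0):
    """Route random logits (an assumption, not real data) and time the routing step."""
    rng = random.Random(seed)
    batch = [
        [rng.gauss(0.0, 1.0) for _ in range(num_experts)]
        for _ in range(num_tokens)
    ]
    start = time.perf_counter()
    routes = [top2_route(logits) for logits in batch]
    elapsed = time.perf_counter() - start
    return routes, tokens_per_second(num_tokens, elapsed)


if __name__ == "__main__":
    routes, tps = benchmark_routing()
    print(f"routed {len(routes)} tokens at {tps:.0f} tokens/sec (routing only)")
```

A real comparison would time full forward passes of both models on the same hardware and batch size; the number printed here reflects only Python-level routing overhead.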
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.3/10.
Notes
Files
paper.pdf
(81.3 kB)
| Name | Size |
|---|---|
| paper.pdf (md5:432aa0ccb2c6e5844fd31d6f5eb5a65f) | 81.3 kB |