SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs
Description
Mixture-of-Experts (MoE) has become a prevalent backbone for large vision-language models (VLMs), yet how modality-specific signals should guide expert routing remains under-explored. Existing routing strategies are either hand-crafted or modality-agnostic, relying on idealized priors that ignore the layer-dependent modality fusion patterns in MoE-VLMs and provide little guidance for expert specialization. We propose Soft Modality-guided Expert Specialization (SMoES), which consists of dynamic soft modality scores that capture layer-dependent fusion patterns, an expert binning mechanism aligne
Research goal: Does soft modality-guided routing in MoE-VLMs improve robustness to distribution shift on the VQA v2.0 and A-OKVQA datasets compared to dense models of similar parameter count?
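To make the idea of soft modality-guided routing concrete, here is a minimal sketch for a single token. It is not the method from the paper: the function name `soft_modality_route`, the two-bin expert layout (`"vision"` / `"text"`), the additive bias, and the `alpha` strength parameter are all assumptions chosen for illustration. The description above does not specify how the scores or the bins are computed.

```python
import math


def soft_modality_route(logits, modality_score, expert_bins, alpha=1.0, top_k=2):
    """Pick top_k experts for one token after applying a soft modality bias.

    logits: raw router logits, one per expert.
    modality_score: value in [0, 1]; 1.0 means fully visual, 0.0 fully textual
        (an assumed convention, not taken from the paper).
    expert_bins: "vision" or "text" for each expert (assumed two-bin layout).
    alpha: strength of the modality bias (hypothetical hyperparameter).
    """
    biased = []
    for logit, bin_name in zip(logits, expert_bins):
        # Experts in the bin that matches the token's modality get a larger bias.
        affinity = modality_score if bin_name == "vision" else 1.0 - modality_score
        biased.append(logit + alpha * affinity)

    # Keep the top_k experts by biased logit (ties keep the lower index first).
    chosen = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)[:top_k]

    # Softmax over the selected experts only, shifted by the max for stability.
    peak = max(biased[i] for i in chosen)
    weights = [math.exp(biased[i] - peak) for i in chosen]
    total = sum(weights)
    return [(i, w / total) for i, w in zip(chosen, weights)]


# Example: a mostly visual token with uninformative router logits.
routes = soft_modality_route(
    logits=[0.0, 0.0, 0.0, 0.0],
    modality_score=0.9,
    expert_bins=["text", "text", "vision", "vision"],
)
```

Because the bias is additive and scaled by a continuous score, a token with mixed modality content can still reach experts in either bin. Setting `alpha=0` in this sketch recovers plain modality-agnostic top-k routing.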
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:4bd19226e0fcbf0ef3e6e7269e18d669) | 89.8 kB |