FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models
Description
Recent large language models such as Gemini-1.5, DeepSeek-V3, and Llama-4 increasingly adopt Mixture-of-Experts (MoE) architectures, which offer strong efficiency-performance trade-offs by activating only a fraction of the model per token. Yet academic researchers still lack a fully open, end-to-end MoE platform for investigating scaling, routing, and expert behavior. We release FLAME-MoE, a completely open-source research suite composed of seven decoder-only models, ranging from 38M to 1.7B active parameters, whose architecture (64 experts with top-8 gating and 2 shared experts) closely refle…
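The description names the routing layout (64 routed experts, top-8 gating, 2 shared experts) without showing how a single token flows through such a layer. Below is a minimal pure-Python sketch under stated assumptions: the function names `route_token`, `moe_forward`, and `active_expert_fraction` are hypothetical, experts are toy callables on plain lists, and the gating rule (softmax over all router logits, keep the top 8, renormalize) is one common variant chosen for illustration, not necessarily the one FLAME-MoE uses.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64  # routed experts per MoE layer (from the description)
TOP_K = 8                # routed experts selected per token (top-8 gating)
NUM_SHARED_EXPERTS = 2   # experts applied to every token


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Return (expert_index, gate_weight) pairs for the top-k routed experts.

    Assumed variant: softmax over all logits, keep the k largest
    probabilities, renormalize them to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


def moe_forward(x, router_logits, routed_experts, shared_experts, k=TOP_K):
    # Shared experts run on every token; their outputs are summed unweighted.
    out = [0.0] * len(x)
    for expert in shared_experts:
        out = [o + v for o, v in zip(out, expert(x))]
    # Only the top-k routed experts run, each scaled by its gate weight.
    for idx, weight in route_token(router_logits, k):
        out = [o + weight * v for o, v in zip(out, routed_experts[idx](x))]
    return out


def active_expert_fraction(num_routed=NUM_ROUTED_EXPERTS, k=TOP_K,
                           num_shared=NUM_SHARED_EXPERTS):
    # Experts that run for one token divided by all experts in the layer.
    return (k + num_shared) / (num_routed + num_shared)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy experts: each scales its input by a fixed factor.
    routed = [lambda x, s=s: [s * v for v in x]
              for s in (random.uniform(0.5, 1.5) for _ in range(NUM_ROUTED_EXPERTS))]
    shared = [lambda x: list(x) for _ in range(NUM_SHARED_EXPERTS)]
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
    print(route_token(logits))
    print(moe_forward([1.0, 2.0, 3.0], logits, routed, shared))
    print(f"{active_expert_fraction():.3f}")  # → 0.152
```

The 10-of-66 figure counts experts only. Active-parameter counts such as the 38M to 1.7B quoted above also include dense components like attention, so the active-parameter fraction of a full model is a different number.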
Research goal: What is the impact of token scheduling in ExpertFlow on attribute binding accuracy (e.g., on AMBER) relative to dense baselines under varying expert activation budgets in MoE vision-language models?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.8/10.
Notes
Files
| Name | Size | MD5 checksum |
|---|---|---|
| paper.pdf | 87.5 kB | dae187bbc2d8a67b350784f14099a4dc |