MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Effi
Description
The deployment of large language models (LLMs) in real-world clinical applications is constrained by a fundamental trade-off between the accuracy of computationally expensive Transformers and the efficiency of linear-time models. To address this, we propose MambaFormer, an LLM-based hybrid Mixture-of-Experts (MoE) framework for efficient medical question-answering (QA) and clinical assistance. MambaFormer employs a lightweight gating mechanism that performs token-level dynamic routing to a customized Transformer expert (ET5) for short, complex queries or to a State Space Model expert (EMamba) for long, high-throughput seque
Research goal: How does the throughput-accuracy trade-off of dynamic expert specialization in MoE-VLMs compare to fixed top-2 routing on VQA v2 and GQA benchmarks when scaling active parameters from 1B to 10B?
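The token-level routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear gate, the toy experts `expert_transformer` and `expert_ssm`, and the function names `route_top_k` and `moe_forward` are all hypothetical stand-ins chosen for illustration. Setting `k=1` sends each token to a single expert, while `k=2` mimics the fixed top-2 routing mentioned in the research goal.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(token_vec, gate_weights):
    """Assumed gate: a single linear layer with one weight row per expert."""
    return [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]


def route_top_k(token_vec, gate_weights, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(gate_logits(token_vec, gate_weights))
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(tokens, gate_weights, experts, k=1):
    """Route every token independently and mix the selected expert outputs."""
    outputs = []
    for vec in tokens:
        mixed = [0.0] * len(vec)
        for idx, weight in route_top_k(vec, gate_weights, k):
            expert_out = experts[idx](vec)
            mixed = [m + weight * o for m, o in zip(mixed, expert_out)]
        outputs.append(mixed)
    return outputs


# Hypothetical toy experts standing in for the Transformer and SSM experts.
def expert_transformer(vec):
    return [2.0 * x for x in vec]


def expert_ssm(vec):
    return [x + 1.0 for x in vec]


EXPERTS = [expert_transformer, expert_ssm]
GATE = [[1.0, 0.0], [0.0, 1.0]]  # illustrative gate weights, not learned
TOKENS = [[3.0, 0.0], [0.0, 3.0]]

if __name__ == "__main__":
    print(moe_forward(TOKENS, GATE, EXPERTS, k=1))
    print(moe_forward(TOKENS, GATE, EXPERTS, k=2))
```

With `k=1` each token activates only one expert, so compute per token stays roughly constant as experts are added. With `k=2` both toy experts run for every token, which is the cost a fixed top-2 baseline pays in a two-expert setting.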
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.
Notes
Files
paper.pdf (88.1 kB)
md5:1ebc94980f4dcb4311d338e98588dab4