Scaling Laws for Native Multimodal Models

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20410359

Published May 27, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches integrate separately pre-trained components, for example by connecting vision encoders to LLMs and continuing multimodal training. While such approaches exhibit remarkable sample efficiency, it remains an open question whether these late-fusion architectures are inherently superior. In this work, we revisit the architectural design of native multimodal models (NMMs), those trained from the ground up on all modalities, and conduct an extensive sca

Research goal: Does SMoES's soft modality-guided routing improve MoE-VLM accuracy on the MMMU benchmark compared to dense models of equivalent total parameter count, and how does this gap change when scaling from 7B to 34B total parameters?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.7/10.

Files

paper.pdf (82.8 kB), md5:feda58e73ea325f52b5bf01b0b9cf52e

