SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Radwan, Ahmed; Emmanouilidis, Christos; Raza, Shaina

doi:10.48550/arXiv.2601.21666

Published January 29, 2026 | Version v1

Preprint Open

1. Vector Institute
2. Athena Research and Innovation Center In Information Communication & Knowledge Technologies

Multimodal Large Language Models (MLLMs) are a major focus of recent AI research. However, most prior work focuses on static image understanding, while the ability of MLLMs to process sequential audio-video data remains underexplored. This gap highlights the need for a high-quality benchmark that systematically evaluates MLLM performance in real-world settings. We introduce SONIC-O1, a comprehensive, fully human-verified benchmark spanning 13 real-world conversational domains with 4,958 annotations and demographic metadata. SONIC-O1 evaluates MLLMs on key tasks, including open-ended summarization, multiple-choice question (MCQ) answering, and temporal localization with supporting rationales (reasoning). Experiments on closed- and open-source models reveal limitations in both temporal grounding and robustness across demographic groups. While the gap in MCQ accuracy between the two model families is relatively small, we observe a substantial 22.6% performance difference in temporal localization between the best-performing closed-source and open-source models. Performance further degrades across demographic groups, indicating persistent disparities in model behavior. Overall, SONIC-O1 provides an open evaluation suite for temporally grounded and socially robust multimodal understanding.

Files

Name	Size
sonic_paper.pdf (md5:fc4de02b5f4bb501f22acc5fdf93898c)	7.1 MB

Additional details

arXiv: arXiv:2601.21666

European Commission
AIXPERT - An agentic, multi-layer, GenAI-powered backbone to make an AI system explainable, accountable, and transparent (grant no. 101214389)

Available: 2026-01-29

Repository URL: https://github.com/VectorInstitute/sonic-o1

	All versions	This version
Views	23	23
Downloads	11	11
Data volume	84.8 MB	84.8 MB
