Published January 18, 2026 | Version v1
Dataset · Open Access

Multimodal Sentiment Analysis Dataset

  • Islamic University of Riau

Description

This dataset provides utterance-level annotations for multimodal sentiment analysis derived from publicly available YouTube videos. Each data instance corresponds to a single utterance and includes aligned multimodal information consisting of transcribed text, audio segments, and visual representations extracted as video keyframes. Sentiment labels are manually annotated at the utterance level to capture fine-grained affective expressions within conversational contexts.

The dataset is designed to support research in multimodal learning, affective computing, and large language model (LLM)-based sentiment analysis. It can be used for benchmarking sentiment classification models, evaluating multimodal fusion strategies, and exploring zero-shot or fine-tuning approaches with vision–language and audio–text models. All data are provided for research and educational purposes only.
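To make the utterance-level structure concrete, here is a minimal sketch of how one data instance might be represented in Python. The field names and paths are hypothetical illustrations, since the archive's internal layout is not documented on this page; the actual files may use a different schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Utterance:
    """One utterance-level instance with aligned multimodal information."""
    text: str                   # transcribed utterance text
    audio_path: str             # path to the aligned audio segment
    keyframe_paths: List[str]   # extracted video keyframes for this utterance
    sentiment: str              # manually annotated utterance-level label

# Hypothetical example instance (paths and label values are illustrative).
sample = Utterance(
    text="That was a great episode!",
    audio_path="eps1/utt_0001.wav",
    keyframe_paths=["eps1/utt_0001_f0.jpg", "eps1/utt_0001_f1.jpg"],
    sentiment="positive",
)
print(sample.sentiment)  # → positive
```

A loader for the real archive would map whatever directory and annotation layout the ZIP files use onto a structure like this before feeding it to a fusion or LLM-based model.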

Files (3.3 GB)

Eps 1-20260118T051445Z-1-001.zip

Size      MD5
766.2 MB  ee7454a4810eac7e76faa129253672f1
492.1 MB  ee903ba8a797f7870bfe52367f5faba7
390.9 MB  cedda15a005756ccd13300695b0a66bc
483.8 MB  b82bfe9879f9d0890f7492a5be599fd2
375.4 MB  6d21787f27cb11e230ef7fca023b24f2
447.3 MB  24fca5b867f52821d9fddcf86d09cd4b
324.3 MB  8d3f4b48d5267f65c0c8a3c28201ed8c

Additional details

Software

Programming language
Python