# Multimodal Sentiment Analysis Dataset

## Description
This dataset provides utterance-level annotations for multimodal sentiment analysis derived from publicly available YouTube videos. Each data instance corresponds to a single utterance and includes aligned multimodal information consisting of transcribed text, audio segments, and visual representations extracted as video keyframes. Sentiment labels are manually annotated at the utterance level to capture fine-grained affective expressions within conversational contexts.
The dataset is designed to support research in multimodal learning, affective computing, and large language model (LLM)-based sentiment analysis. It can be used for benchmarking sentiment classification models, evaluating multimodal fusion strategies, and exploring zero-shot or fine-tuning approaches with vision–language and audio–text models. All data are provided for research and educational purposes only.
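To make the per-utterance structure concrete, here is a minimal sketch of how one aligned instance might be represented in Python. The field names (`utterance_id`, `transcript`, `audio_path`, `keyframe_paths`, `sentiment`) and the example values are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical schema for one utterance-level instance; the real
# dataset's field names and file layout may differ.
@dataclass
class Utterance:
    utterance_id: str            # unique id for the utterance
    transcript: str              # transcribed text modality
    audio_path: str              # path to the aligned audio segment
    keyframe_paths: List[str]    # extracted video keyframes (visual modality)
    sentiment: str               # manually annotated utterance-level label

# Illustrative instance (all values are placeholders).
sample = Utterance(
    utterance_id="ep01_utt0001",
    transcript="I really enjoyed that movie.",
    audio_path="audio/ep01_utt0001.wav",
    keyframe_paths=["frames/ep01_utt0001_f01.jpg"],
    sentiment="positive",
)
```

A multimodal fusion pipeline would then encode `transcript`, `audio_path`, and `keyframe_paths` with text, audio, and vision encoders respectively, and predict `sentiment` from the fused representation.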
## Files

**Eps 1-20260118T051445Z-1-001.zip** (3.3 GB total)
| MD5 checksum | Size |
|---|---|
| `ee7454a4810eac7e76faa129253672f1` | 766.2 MB |
| `ee903ba8a797f7870bfe52367f5faba7` | 492.1 MB |
| `cedda15a005756ccd13300695b0a66bc` | 390.9 MB |
| `b82bfe9879f9d0890f7492a5be599fd2` | 483.8 MB |
| `6d21787f27cb11e230ef7fca023b24f2` | 375.4 MB |
| `24fca5b867f52821d9fddcf86d09cd4b` | 447.3 MB |
| `8d3f4b48d5267f65c0c8a3c28201ed8c` | 324.3 MB |
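After downloading, the listed MD5 checksums can be used to verify file integrity. A minimal sketch using only the standard library (the file name in the commented example is a placeholder, since the archive part names are not reproduced here):

```python
import hashlib

def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading in 1 MB chunks
    so multi-hundred-MB archives never need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage (placeholder file name; substitute the actual downloaded part):
# assert md5_of("part1.zip") == "ee7454a4810eac7e76faa129253672f1"
```

Comparing the computed digest against the value in the table above confirms the download was not truncated or corrupted.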
## Additional details

**Software**
- Programming language: Python