GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

Kumar, Deepak; Abhishek Pratap Singh; Kumar, Puneet; Li, Xiaobai; Raman, Balasubramanian

doi:10.5281/zenodo.15448846

Published June 1, 2025 | Version V1

Dataset Open

1. Indian Institute of Technology Roorkee
2. Indian Institute of Technology Ropar
3. Zhejiang University

Overview

We introduce the Group Affect from ViDeos (GAViD) dataset, which comprises 5091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels, and enriched with VideoGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance in the field.

NOTE: For now, only the Train video clips are provided. The corresponding paper is under review at the Transactions on Computational Social Systems (TCSS) journal. After its publication, access to the Validation and Test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.

Dataset Description

GAViD is a large-scale, in-the-wild multimodal dataset of 5091 samples, each annotated with the elements listed below. The following sections describe its key details and compilation procedure.

Raw video clips of an average duration of five seconds,
Audio aligned with the video clips,
Contextual metadata (scene descriptions, event labels) generated by a multimodal LLM and human-verified,
Group affect labels: ternary valence (positive, neutral, negative) and five discrete emotions (happy, sad, fear, anger, neutral),
Emotion intensity ratings (high, medium, low),
Interaction type labels (cooperative, hostile, neutral),
Action cues (e.g. smiling, clapping, shouting, dancing, singing).

Dataset details

Number of clips (samples) in GAViD -> 5130
Number of samples with problems -> 39
Number of samples after filtering -> 5091
Duration per clip -> 5 sec
Clip count per video -> 1–35
Dataset split -> Train: 3503; Val: 542; Test: 1046
Affect labels (classwise distribution) -> Positive: 2600; Negative: 1189; Neutral: 1302
Emotion label distribution -> Neutral: 1522; Happy: 2428; Anger: 884; Sad: 201; Fear: 56
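
The counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that copies the numbers from this section into plain Python dictionaries (the variable and function names are illustrative and not part of the dataset's tooling) and confirms that each breakdown sums to the filtered total.

```python
# Counts copied from the "Dataset details" list; names are illustrative only.
TOTAL_CLIPS = 5130
PROBLEM_CLIPS = 39
FILTERED_CLIPS = TOTAL_CLIPS - PROBLEM_CLIPS

split = {"train": 3503, "val": 542, "test": 1046}
affect = {"positive": 2600, "negative": 1189, "neutral": 1302}
emotion = {"neutral": 1522, "happy": 2428, "anger": 884, "sad": 201, "fear": 56}


def shares(counts):
    """Return each class's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


for name, counts in [("split", split), ("affect", affect), ("emotion", emotion)]:
    # Every breakdown should account for all clips that survived filtering.
    assert sum(counts.values()) == FILTERED_CLIPS, name
    print(name, shares(counts))
```

The percentages make the class imbalance in the discrete emotion labels easy to see, which may be worth keeping in mind when choosing evaluation metrics.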

Keywords used to search for the raw videos on YouTube

Positive	Positive	Negative	Negative	Neutral	Neutral
Team Celebration	Happy	Protest	Angry Sport	Group Meeting	Panel Discussion
Group Meeting	Video Conference	Heated Argument	Violent Protest	Parliament speech	People on street
Get Together	Meeting	Emotional breakdown in Public	Aggressive Argument	People walking on street	Team brainstorming Session
Celebration	Press Conference	Spiritual Gathering	Aggressive Group	Team Building Activities	Group Discussion
Religious gathering	Talk Show	Street Race	Condolence	Group work session	Team Planning session
Farewell	Group Performance	Group Fight	Wrestling	Students in Discussion	Wedding Group Dance
People Dancing on Street	Street Comedy	MMA Fight	Violence	Roundtable Discussion	Oath
Wedding Performance	Dhol masti	Boxing	Silent Protest	Mental health address	General Talk
Couple group dance	Comedy show	People in the fight	Group Fight	Wedding Celebration	Festival Celebration

Emotion Recognition Results using CAGNet

Model	Val Acc.	Val F1	Test Acc.	Test F1
CAGNet	63.55%	0.464	61.33%	0.458

Components of the Dataset

The dataset comprises two main components:

GAViD_train.csv file: Contains the bin number used by Labelbox in the annotation process, video_id, group_emotion (Positive, Negative, Neutral), specific_emotion (happy, sad, fear, anger, neutral), emotion_intensity, interaction_type, action_cues, and the video description generated using the Video-ChatGPT model.
GAViD_Train_VideoClips.zip folder: Contains the video clips of the Train set. For now, only the Train video clips are provided; Validation and Test set video clips will be provided on request.

Data Format and Fields of the CSV File

The dataset is structured as a GAViD.csv file, with the corresponding videos in related folders. The CSV file includes the following fields:

Video_ID: Unique Identifier of a video
Group_Affect: Positive, Negative, Neutral
Discrete_Emotion: Happy, Sad, Fear, Anger, Neutral
Emotion_Intensity: High, Medium, Low
Interaction_Type: Cooperative, Hostile, Neutral
Action_Cues: e.g. Smiling, Clapping, Shouting, Dancing, Singing etc.
Context: Each video clip's summary generated from the Video-ChatGPT model.
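
As a rough illustration of how these fields might be consumed, the sketch below reads rows with Python's standard `csv` module and tallies one label column. The column names are taken from the field list above and are an assumption about the actual header of the released CSV (the components list uses different spellings, such as `group_emotion`), so check the header before relying on them. The helper name `label_counts` and the sample rows are invented for the demonstration.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, field):
    """Tally the values of one column in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    # Fail early with a readable message if the assumed column is absent.
    if field not in (reader.fieldnames or []):
        raise KeyError(f"column {field!r} not found; header is {reader.fieldnames}")
    return Counter(row[field].strip() for row in reader)


# Hypothetical two-row sample: header names are assumed, values are invented.
sample = io.StringIO(
    "Video_ID,Group_Affect,Emotion_Intensity\n"
    "clip_0001,Positive,High\n"
    "clip_0002,Negative,Low\n"
)
print(label_counts(sample, "Group_Affect"))
```

For the released file, the same helper could be called on `open("GAViD_train.csv", newline="", encoding="utf-8")`, assuming the file is in the working directory and UTF-8 encoded.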

Ethical considerations, data privacy and misuse prevention

Data Collection and Consent: The data collection and annotation strictly followed established ethical protocols in line with YouTube's Terms, which state “Public videos with a Creative Commons license may be reused”. We downloaded only public-domain videos licensed under Creative Commons (CC BY 4.0), which “allows others to share, copy and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon it for any purpose, even commercially”.
Privacy: All content was reviewed to ensure no private or sensitive information is present. Faces are included only from public domain videos as needed for group affect research; only group-level content is released, with no attempt or risk of individual identification. Other personally identifiable information, such as names, addresses, and contact details, was removed.

Code and Citation

Code Repository: https://github.com/deepakkumar-iitr/GAViD/tree/main
Citing the Dataset: Users of the dataset should cite the corresponding paper described at the above GitHub Repository.

License & Access

This dataset is released for academic research only and is free to researchers from educational or research institutes for non-commercial purposes.
Note that you are downloading this corpus at your own risk. No guarantee is provided, e.g. regarding the goodness of the corpus nor towards any subsequent effects. You may use it free of charge, and modify it as you wish, but clearly specify modifications if you pass modified material on.

Contact

Please send any questions about this dataset to:

Deepak Kumar (d_kumar@cs.iitr.ac.in),
Abhishek Singh (abhishek_s@cs.iitr.ac.in),
Puneet Kumar (puneet.kumar@iitrpr.ac.in),
Xiaobai Li (xiaobai.li@zju.edu.cn),
Balasubramanian Raman (bala@cs.iitr.ac.in).

NOTE: This dataset is released for academic research only and is free to researchers from educational or research institutions for non-commercial purposes.

Files

Files (1.7 GB)

Name	MD5	Size
GAViD_train.csv	f51e0ca3ba216d8b4c91edd7ff535a9a	1.0 MB
GAViD_Train_VideoClips.zip	2d10e72e0894b5f073ee177f5e3ad173	1.7 GB
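
The MD5 digests listed above can be used to check that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify`, and the assumption that the files sit in the current directory, are illustrative.

```python
import hashlib
from pathlib import Path

# Digests copied from the file listing on this page.
EXPECTED_MD5 = {
    "GAViD_train.csv": "f51e0ca3ba216d8b4c91edd7ff535a9a",
    "GAViD_Train_VideoClips.zip": "2d10e72e0894b5f073ee177f5e3ad173",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so the large archive is never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {filename: True/False/None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    print(verify())
```

A `False` entry suggests a corrupted or partial download, in which case fetching the file again is the simplest remedy.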

Additional details

Other: Affective Computing, Group Affect, Emotion Understanding, Multimodal Analysis, Human Computer Interaction, Behavior Analysis.

Other: 2025-06-01

GAViD dataset custom licence

Repository URL: https://github.com/deepakkumar-iitr/GAViD/tree/main
Programming language: Python

