Published June 1, 2025 | Version V1
Dataset Open

GAViD: A Large-Scale Multimodal Dataset for Context-Aware Group Affect Recognition from Videos

  • 1. Indian Institute of Technology Roorkee
  • 2. Indian Institute of Technology Ropar
  • 3. Zhejiang University

Description

Overview

We introduce the Group Affect from ViDeos (GAViD) dataset, which comprises 5091 video clips with multimodal data (video, audio, and context), annotated with ternary valence and discrete emotion labels, and enriched with Video-ChatGPT-generated contextual metadata and human-annotated action cues. We also present CAGNet, a baseline model for multimodal context-aware group affect recognition. CAGNet achieves 63.20% test accuracy on GAViD, comparable to state-of-the-art performance in the field.

NOTE: For now, we are providing only the train video clips. The corresponding paper is under review at the Transactions on Computational Social Systems (TCSS) journal. After its publication, access to the validation and test sets will be granted upon request and approval, in accordance with the Responsible Use Policy.

Dataset Description

GAViD is a large-scale, in-the-wild multimodal dataset of 5091 samples, each annotated with the elements listed below. The following sections describe its key details and compilation procedure.

  1. Raw video clips of an average duration of five seconds,
  2. Audio aligned with the video clips,
  3. Contextual metadata (scene descriptions, event labels) generated by a multimodal LLM and human-verified,
  4. Group affect labels: ternary valence (positive, neutral, negative) and five discrete emotions (happy, sad, fear, anger, neutral),
  5. Emotion intensity ratings (high, medium, low),
  6. Interaction type labels (cooperative, hostile, neutral),
  7. Action cues (e.g. smiling, clapping, shouting, dancing, singing).

Dataset details

  • Number of clips (samples) in GAViD -> 5130
  • Number of samples with some problem -> 39
  • Number of samples after filtering -> 5091
  • Duration per clip -> 5 sec
  • Clip count per video -> 1–35
  • Dataset split -> Train: 3503; Val: 542; Test: 1046
  • Affect labels (classwise distribution) -> Positive: 2600; Negative: 1189; Neutral: 1302
  • Emotion label distribution -> Neutral: 1522; Happy: 2428; Anger: 884; Sad: 201; Fear: 56
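
The counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the released code: the numbers are copied from the list above, and the helper name `shares` is ours. It confirms that each breakdown accounts for all filtered clips and shows how imbalanced the labels are.

```python
# Counts copied from the "Dataset details" list above.
TOTAL_RAW = 5130
PROBLEM_SAMPLES = 39
SPLIT = {"Train": 3503, "Val": 542, "Test": 1046}
VALENCE = {"Positive": 2600, "Negative": 1189, "Neutral": 1302}
EMOTION = {"Neutral": 1522, "Happy": 2428, "Anger": 884, "Sad": 201, "Fear": 56}


def shares(counts):
    """Return each class's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


filtered = TOTAL_RAW - PROBLEM_SAMPLES

# Every breakdown should account for all clips that survive filtering.
for name, counts in [("split", SPLIT), ("valence", VALENCE), ("emotion", EMOTION)]:
    assert sum(counts.values()) == filtered, name

valence_shares = shares(VALENCE)
emotion_shares = shares(EMOTION)
print(filtered)
print(valence_shares)
print(emotion_shares)
```

Happy accounts for roughly 48% of the clips and Fear for roughly 1%, so per-class metrics are likely to be more informative than accuracy alone when evaluating on the discrete emotion labels.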

Keywords used to search for the raw videos on YouTube

Positive Positive Negative Negative Neutral Neutral
Team Celebration Happy Protest Angry Sport Group Meeting Panel Discussion
Group Meeting Video Conference Heated Argument Violent Protest Parliament speech People on street
Get Together Meeting Emotional breakdown in Public Aggressive Argument People walking on street Team brainstorming Session
Celebration Press Conference Spiritual Gathering Aggressive Group Team Building Activities Group Discussion
Religious gathering  Talk Show  Street Race Condolence Group work session Team Planning session
Farewell Group Performance  Group Fight Wrestling Students in Discussion Wedding Group Dance
People Dancing on Street Street Comedy MMA Fight Violence Roundtable Discussion Oath
Wedding Performance Dhol masti Boxing Silent Protest Mental health address General Talk
Couple group dance Comedy show People in the fight Group Fight Wedding Celebration Festival Celebration

Emotion Recognition Results using CAGNet

Model    Val Acc.   Val F1   Test Acc.   Test F1
CAGNet   63.55%     0.464    61.33%      0.458
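
The table does not state how F1 is averaged across the five emotion classes. As an illustration only, the sketch below computes accuracy and a macro-averaged F1 (our assumption, not a confirmed detail of the CAGNet evaluation) on a toy prediction list. It shows how a model that ignores a rare class can keep a high accuracy while its macro F1 drops sharply.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not GAViD predictions.
toy_true = ["happy"] * 8 + ["fear"] * 2
toy_pred = ["happy"] * 10  # the rare class is never predicted

toy_acc = accuracy(toy_true, toy_pred)
toy_f1 = macro_f1(toy_true, toy_pred)
print(toy_acc, toy_f1)
```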

Components of the Dataset

The dataset comprises two main components:

  • GAViD_train.csv file: Contains the bin number used by Labelbox in the annotation process, video_id, group_emotion (Positive, Negative, Neutral), specific_emotion (happy, sad, fear, anger, neutral), emotion_intensity, interaction_type, action_cues, and the video description generated using the Video-ChatGPT model.
  • GAViD_Train_VideoClips.zip folder: Contains the video clips of the train set. For now, we are providing only the train video clips; validation and test set video clips will be provided upon request.

Data Format and Fields of the CSV File

The dataset is structured as a GAViD.csv file, with the corresponding videos in related folders. This CSV file includes the following fields:

  • Video_ID: Unique Identifier of a video
  • Group_Affect: Positive, Negative, Neutral
  • Descrete_Emotion:  Happy, Sad, Fear, Anger, Neutral
  • Emotion_Intensity: High, Medium, Low
  • Interaction_Type: Cooperative, Hostile, Neutral
  • Action_Cues: e.g. Smiling, Clapping, Shouting, Dancing, Singing
  • Context: Summary of each video clip, generated by the Video-ChatGPT model.
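
The fields above can be read with Python's standard `csv` module. The sketch below is a minimal example under stated assumptions: it uses the field names as listed in this section, but the header row of the released file may differ (the component list earlier uses lowercase names such as video_id and group_emotion), so the column names are passed as parameters. The helper names `label_counts` and `select_clips` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file, column="Group_Affect"):
    """Count the values in one column of an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column].strip() for row in reader)


def select_clips(csv_file, column, value, id_column="Video_ID"):
    """Return the IDs of rows whose `column` equals `value` (case-insensitive)."""
    reader = csv.DictReader(csv_file)
    wanted = value.lower()
    return [row[id_column] for row in reader if row[column].strip().lower() == wanted]


# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE = """Video_ID,Group_Affect,Emotion_Intensity
clip_a,Positive,High
clip_b,Negative,Low
clip_c,Positive,Medium
"""

counts = label_counts(io.StringIO(SAMPLE))
positive_ids = select_clips(io.StringIO(SAMPLE), "Group_Affect", "positive")
print(counts)
print(positive_ids)
```

To run this on the released file, open it with `open("GAViD_train.csv", newline="", encoding="utf-8")` and pass the file object in place of the `io.StringIO` sample, adjusting `column` and `id_column` to the header names actually present.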

Ethical considerations, data privacy and misuse prevention

  • Data Collection and Consent: The data collection and annotation strictly followed established ethical protocols in line with YouTube's Terms, which state “Public videos with a Creative Commons license may be reused”. We downloaded only public-domain videos licensed under Creative Commons (CC BY 4.0), which “allows others to share, copy and redistribute the material in any medium or format, and to adapt, remix, transform, and build upon it for any purpose, even commercially”.
  • Privacy: All content was reviewed to ensure that no private or sensitive information is present. Faces are included only from public domain videos as needed for group affect research; only group-level content is released, with no attempt or risk of individual identification. Other personally identifiable information, such as names, addresses, and contact details, was removed.

Code and Citation

  • Code Repository: https://github.com/deepakkumar-iitr/GAViD/tree/main
  • Citing the Dataset: Users of the dataset should cite the corresponding paper described at the above GitHub Repository.

License & Access

  • This dataset is released for academic research only and is free to researchers from educational or research institutes for non-commercial purposes.
  • Note that you download this corpus at your own risk. No guarantee is provided, e.g. regarding the quality of the corpus or any consequences of its use. You may use it free of charge and modify it as you wish, but you must clearly specify any modifications if you pass modified material on.

Contact

Please send any questions about this dataset to:

  • Deepak Kumar (d_kumar@cs.iitr.ac.in),
  • Abhishek Singh (abhishek_s@cs.iitr.ac.in),
  • Puneet Kumar (puneet.kumar@iitrpr.ac.in),
  • Xiaobai Li (xiaobai.li@zju.edu.cn),
  • Balasubramanian Raman (bala@cs.iitr.ac.in).


Files

Files (1.7 GB)

Checksum Size
md5:f51e0ca3ba216d8b4c91edd7ff535a9a 1.0 MB
md5:2d10e72e0894b5f073ee177f5e3ad173 1.7 GB
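
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are ours, and the demo runs on a throwaway temporary file rather than on the dataset files.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Self-check on a temporary file, so the sketch runs without the dataset.
payload = b"example payload"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_path = tmp.name

demo_ok = matches_checksum(demo_path, "md5:" + hashlib.md5(payload).hexdigest())
demo_bad = matches_checksum(demo_path, "md5:" + "0" * 32)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a downloaded file, call `matches_checksum(path, "md5:...")` with the checksum from the entry whose size matches the file. The listing does not pair checksums with file names, so matching the 1.0 MB entry to the CSV and the 1.7 GB entry to the video archive is an assumption based on file size.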

Additional details

Identifiers

Other
Affective Computing, Group Affect, Emotion Understanding, Multimodal Analysis, Human Computer Interaction, Behavior Analysis.

Dates

Other
2025-06-01
GAViD dataset custom licence
