Published August 16, 2023 | Version 1
Dataset Open

Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions

  • 1. University of Birmingham

Description

We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task, 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes annotations for 57,177 audiovisual videos, each independently evaluated by 3 of the 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos in total). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We further carried out an experiment to explore the utility of the AVMIT annotations and feature embeddings. A series of 6 Recurrent Neural Networks (RNNs) were trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top-1 accuracy increased by 2.71-5.94% when training exclusively on audiovisual events, a gain that outweighed even a three-fold increase in training data. We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.
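As an illustration of how per-rater judgements like those described above might be turned into a filtered set of audiovisual events, the following is a minimal sketch. The record layout, the video identifiers, and the two-of-three agreement threshold are all assumptions made for this example; they are not taken from the dataset files or from the authors' filtering procedure.

```python
from collections import defaultdict

# Hypothetical rating records: (video_id, rater_id, event_present, most_prominent).
# The field layout and values are illustrative, not the dataset's actual schema.
ratings = [
    ("vid_a", 1, True, True),
    ("vid_a", 2, True, True),
    ("vid_a", 3, True, False),
    ("vid_b", 1, True, False),
    ("vid_b", 2, False, False),
    ("vid_b", 3, True, False),
]


def filter_by_agreement(records, min_votes=2):
    """Return the sorted video ids for which at least `min_votes` raters judged
    the labelled event to be both present and the most prominent feature.

    The default threshold of 2 (a majority of 3 raters) is an assumption.
    """
    votes = defaultdict(int)
    seen = set()
    for video_id, _rater_id, present, prominent in records:
        seen.add(video_id)
        if present and prominent:
            votes[video_id] += 1
    return sorted(v for v in seen if votes[v] >= min_votes)


kept = filter_by_agreement(ratings)
print(kept)
```

Raising `min_votes` to 3 would require unanimous agreement, trading a smaller training set for cleaner audiovisual correspondence.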

Files

test_set.csv

Files (2.6 GB)

Checksum Size
md5:11c523793f702c98c41fb5b30a6d7fee 498.7 MB
md5:45c51034fb418eb0bbfdae2d17afb29b 2.1 GB
md5:b78453da858755f889cadcf9b0300e22 121.1 kB
md5:9817e7adaaa8aa7d2384823b9d618e7e 6.9 MB
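The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` and the file path in the usage comment are hypothetical, and this listing does not show which checksum belongs to which file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Example usage (the path is a placeholder for a file you have downloaded):
# ok = matches_checksum("path/to/downloaded_file",
#                       "md5:b78453da858755f889cadcf9b0300e22")
```

A mismatch usually indicates an interrupted or corrupted transfer, in which case downloading the file again is the simplest remedy.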