Published October 31, 2023 | Version 1.0.0
Dataset Restricted

Multimodal Vision-Audio-Language Dataset

  • 1. Goethe University Frankfurt
  • 2. The Hessian Center for Artificial Intelligence

Description

The Multimodal Vision-Audio-Language Dataset is a large-scale dataset for multimodal learning. It contains 2M video clips with corresponding audio and a textual description of the visual and auditory content. The dataset is an ensemble of existing datasets and fills the gap of missing modalities.

Details can be found in the attached report.

Annotation

The annotation files are provided as Parquet files. They can be read in Python using the pandas and pyarrow libraries.

The split into train, validation and test sets follows the split of the original datasets.

Installation

pip install pandas pyarrow

Example

import pandas as pd
df = pd.read_parquet('annotation_train.parquet', engine='pyarrow')
print(df.iloc[0])

dataset              AudioSet
filename             train/---2_BBVHAA.mp3
captions_visual      [a man in a black hat and glasses.]
captions_auditory    [a man speaks and dishes clank.]
tags                 [Speech]

Description

The annotation file consists of the following fields:

filename: Name of the corresponding file (video or audio file)
dataset: Source dataset associated with the data point
captions_visual: A list of captions related to the visual content of the video. It can be NaN if there is no visual content
captions_auditory: A list of captions related to the auditory content of the video
tags: A list of tags classifying the sound of a file. It can be NaN if no tags are provided
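
Because captions_visual and tags can be missing, downstream code usually has to normalize them before iterating. The following is a minimal sketch in plain Python, assuming a row has been converted to a dict (for example with df.iloc[0].to_dict()). The helper names as_list and all_captions are hypothetical and not part of the dataset.

```python
import math


def as_list(value):
    """Return a list-valued field as a plain list; NaN or None becomes []."""
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    return list(value)


def all_captions(record):
    """Collect the visual and auditory captions of one record into one list."""
    visual = as_list(record.get("captions_visual"))
    auditory = as_list(record.get("captions_auditory"))
    return visual + auditory


# Illustrative record with the documented fields, but without visual content.
record = {
    "dataset": "AudioSet",
    "filename": "train/---2_BBVHAA.mp3",
    "captions_visual": float("nan"),
    "captions_auditory": ["a man speaks and dishes clank."],
    "tags": ["Speech"],
}
print(all_captions(record))
```

On a loaded table, the same helper could be applied column-wise, for example df['tags'].map(as_list).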

Data files

The raw data files for most datasets are not released due to licensing issues and must be downloaded from the original sources. Because some files may be missing at the source, we provide them on request. Please contact us at schaumloeffel@em.uni-frankfurt.de.
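
Since the media files have to be obtained separately, it can help to check which annotated files are present locally before requesting the rest. The following is a rough sketch, assuming the filename values are paths relative to a local data directory. That directory layout and the helper name find_missing_files are assumptions, not something the dataset documents.

```python
import tempfile
from pathlib import Path


def find_missing_files(filenames, data_root):
    """Return the annotated filenames that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in filenames if not (root / name).is_file()]


# Small demonstration with a temporary directory standing in for the data root.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train").mkdir()
    (Path(tmp) / "train" / "present.mp3").touch()
    missing = find_missing_files(["train/present.mp3", "train/absent.mp3"], tmp)
    print(missing)  # ['train/absent.mp3']
```

With the annotation table loaded, df['filename'] can be passed as the filenames argument.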

Files

Restricted

The record is publicly accessible, but files are restricted.