MeMAD Surrey dataset
Description
This dataset was developed as part of the European MeMAD project, which aimed to develop novel methods and models for managing and accessing digital audiovisual information in multiple languages and for various use contexts and audiences. The MeMAD project's specific approach was to combine advances in computer vision and machine learning with insights into human processing of multimodal content. Accordingly, Work Package 5, Human processing in multimodal content description, aimed to: a) advance current understanding of the main principles, techniques and strategies of human-made video scene description by synthesising insights from previous research into human multimodal content description; b) use this understanding to identify differences and commonalities between human descriptions of video content and machine-generated video captions, and to evaluate both types of description; and c) develop a human-based model of video scene description that is applicable to various usage situations. WP5's findings can be used in developing content description services and technologies.
Funding acknowledgment
European Commission: MeMAD - Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy (780069)
The purpose of this dataset, the parties that created it and the way it is structured
This dataset provides the original data used in the above-mentioned part of the project. It consists of two corpora, Content Description (CD) and Machine Description (MD), created for 500 selected video clips from forty-five feature films. The CD corpus was created in 2018/19 by the research team at the University of Surrey, one of the project partners. It represents a ‘ground truth’ summary of the action taking place on screen; constructed at a descriptive level only (without interpretation), it captures the scene as it would be superficially perceived by the average audience member. The MD corpus was created in 2019 by our partner, the Aalto University computer vision team. The data was downloaded in vertical format from Sketch Engine, producing a text file that includes all metadata (e.g. film IDs, time codes, part-of-speech tags). The text file for each corpus was then converted into XML format.
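As an illustration of this conversion step, the following Python sketch shows one possible way to wrap a vertical-format file in well-formed XML. It assumes the common Sketch Engine vertical layout (structural tags such as <doc>, <p> and <s> on their own lines, and tokens as tab-separated word/POS/lemma triples); the element names are illustrative and the sketch is not the exact pipeline used for this dataset.

import sys
from xml.sax.saxutils import escape, quoteattr

def vertical_to_xml(in_path, out_path):
    # Wrap a vertical-format corpus file in a single XML root element.
    # Structural tags (<doc ...>, <p>, <s>, <align>) are passed through
    # unchanged (assumed to be XML-safe already); token lines become
    # <w> elements carrying POS and lemma attributes.
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        dst.write('<?xml version="1.0" encoding="UTF-8"?>\n<corpus>\n')
        for line in src:
            line = line.rstrip("\n")
            if line.startswith("<"):      # structural tag: keep as-is
                dst.write(line + "\n")
            elif line:                    # token line: word <TAB> POS <TAB> lemma
                word, pos, lemma = (line.split("\t") + ["", ""])[:3]
                dst.write("<w pos=%s lemma=%s>%s</w>\n"
                          % (quoteattr(pos), quoteattr(lemma), escape(word)))
        dst.write("</corpus>\n")

if __name__ == "__main__":
    vertical_to_xml(sys.argv[1], sys.argv[2])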
The dataset includes the following:
CD (tagged in XML format)
MD (tagged in XML format)
Film IDs and 6-digit codes for the clips used in the study (in CSV format)
The two corpora are aligned via film IDs and 6-digit codes relating to the extracts taken from the films. This information is included in the dataset as a separate CSV file.
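For illustration, the following Python sketch shows how the alignment file might be used to group clip codes by film. The column names film_id and clip_code, and the filename memad500_clips.csv, are assumptions for the sake of the example and should be checked against the header row of the CSV supplied with the dataset.

import csv
from collections import defaultdict

def clips_by_film(csv_path):
    # Build an index from film ID to the 6-digit codes of its clips.
    index = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            index[row["film_id"]].append(row["clip_code"])  # assumed column names
    return index

# Example use: look up the clip codes for one film, then locate the
# corresponding extracts in the CD and MD files by those codes.
# clips = clips_by_film("memad500_clips.csv")   # illustrative filename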
What the codes in the annotations indicate:
The document-, paragraph- and sentence-level codes (e.g. <p>, <s> and <align>) are elements of the Sketch Engine notation (https://www.sketchengine.eu/guide/annotating-corpus-text/). The POS tags come from the English TreeTagger tagset used by Sketch Engine (https://www.sketchengine.eu/english-treetagger-pipeline-2/). The remaining codes are XML/TEI tags encoding the main characteristics of the texts (clip IDs, time codes, sound effects, etc.; https://tei-c.org/guidelines/p5/).
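As a hedged illustration of working with this markup, the Python sketch below extracts sentences and their POS-tagged tokens from one of the XML files. It assumes tokens are encoded as <w> elements with pos and lemma attributes inside <s> sentence elements; adjust the tag and attribute names to match the actual markup in the files.

import xml.etree.ElementTree as ET

def sentences_with_pos(xml_path):
    # Yield each sentence as a list of (word, POS, lemma) triples.
    tree = ET.parse(xml_path)
    for s in tree.getroot().iter("s"):
        yield [(w.text or "", w.get("pos", ""), w.get("lemma", ""))
               for w in s.iter("w")]

# Example use: tally POS tag frequencies across a corpus file.
# from collections import Counter
# counts = Counter(pos for sent in sentences_with_pos("CD.xml")  # illustrative name
#                  for _, pos, _ in sent)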
Distribution Licence
The textual component of the MeMAD500 corpus is made available under the Creative Commons Attribution licence (CC BY 4.0). The full text of the licence can be found at https://creativecommons.org/licenses/by/4.0/.
Contact point
For any queries relating to this dataset, please contact the research team at the University of Surrey using the contact details below:
Prof Sabine Braun: s.braun@surrey.ac.uk
Dr Kim Starr: k.starr@surrey.ac.uk