Dataset for the paper "DialogueAV: a Dialogue-attended Audiovisual Dataset"
Description
Introduction
This is the official release of the code for DialogueAV: a Dialogue-attended Audiovisual Dataset. Dialogue-AV is a benchmarking dataset with ~258k video clips. Each clip has two dialogue-based descriptions: a Question-Answering Dialogue (QDA) with ten question-answer pairs and a simulated conversation between two "humans" discussing the video.
The dialogues are derived from human-created captions in state-of-the-art benchmark datasets and from machine-generated captions. We use verified annotations from these datasets, focusing solely on descriptions of the audiovisual content.
Description
In the Dialogue-AV sample we present next, the input consists of a video containing an audio track along with its original text captions (1). The output is a series of dialogue turns that describe the video's content. We process the input video using audio and video captioners (2), which generate text descriptions corresponding to each modality. All captions, including the original, are transformed into dialogue (4) and question-answer (5) conversations that articulate the audiovisual content.
Figure: https://github.com/lvilaca16/dialogue-av/blob/main/docs/figures/example_dialogue.png
Annotations in (4) and (5) undergo automatic validation (3) before they are accepted into Dialogue-AV. To pass the automatic validation step (3), a sample must satisfy the following:
- It includes between 5 and 20 dialogue turns.
- Each dialogue turn has at least one complete sentence. A complete sentence requires at least 1 subject, predicate, object or noun, and 1 verb; it should end with appropriate punctuation and begin with a named character. Additionally, each complete sentence must contain a minimum of 3 words after removing punctuation (which excludes overly simple sentences such as "It rains.").
- It does not use the terms "caption(s)" or "dialogue(s)", which eliminates references to the original prompt.
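The validation criteria above can be approximated with a short Python sketch. This is not the authors' implementation: the function names, the regular expressions, and the sentence-splitting heuristic are all assumptions made for illustration. The grammatical checks (subject, verb, and the "named character" requirement) are omitted because they would need a part-of-speech tagger or parser; only the turn count, the punctuation and word-count rules, and the banned-term rule are covered.

```python
import re
import string

# Thresholds taken from the criteria listed above.
MIN_TURNS, MAX_TURNS = 5, 20
MIN_WORDS = 3

# Hypothetical patterns: banned terms and a naive sentence splitter.
BANNED_TERMS = re.compile(r"\b(?:captions?|dialogues?)\b", re.IGNORECASE)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_sentences(turn):
    """Split a turn into sentences on whitespace that follows ., ! or ?."""
    return [s for s in SENTENCE_SPLIT.split(turn.strip()) if s]


def is_complete_sentence(sentence):
    """Approximate check: closing punctuation and at least MIN_WORDS words.

    The subject/verb requirement is NOT checked here (assumption: that
    would be handled by a separate linguistic parser).
    """
    if not sentence.endswith((".", "!", "?")):
        return False
    words = sentence.translate(PUNCT_TABLE).split()
    return len(words) >= MIN_WORDS


def is_valid_dialogue(turns):
    """Return True if a list of dialogue turns passes the sketched checks."""
    if not MIN_TURNS <= len(turns) <= MAX_TURNS:
        return False
    for turn in turns:
        # Reject any reference to the original prompt.
        if BANNED_TERMS.search(turn):
            return False
        # Each turn needs at least one complete sentence.
        if not any(is_complete_sentence(s) for s in split_sentences(turn)):
            return False
    return True
```

A dialogue whose turns contain only a two-word sentence such as "It rains." would be rejected by this sketch, as would any dialogue with fewer than 5 or more than 20 turns.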
For more details about the data generation process, we refer the reader to the (to be published) manuscript.
Correspondence and Maintenance
For details about the implementation, generation and usage, please check the official GitHub page. If you observe any issues, please contact us. All project-related issues and feature requests should be submitted through our GitHub Issues page.
Files
(61.8 GB)
| Checksum | Size |
|---|---|
| md5:5c2b05f1f682d06935d8f05e633058e5 | 61.5 GB |
| md5:0702ed4f43803eb57fd14d4be2043318 | 16.9 MB |
| md5:1701920952c900196d337e2d1d71360b | 27.9 MB |
| md5:2794b4635c3a1a76b9e49c9f494f898b | 107.1 MB |
| md5:5638701fd044f9512109386f74b43e1e | 81.4 MB |
| md5:dff7caf23c3ee0c3ac47278ae42b2957 | 11.9 MB |
| md5:500fcd40223914ef838273bbaa2cf730 | 8.7 MB |
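The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical and are not part of the Dialogue-AV repository. The file is read in chunks so that the 61.5 GB archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest against a listed value.

    `expected` may be given with or without the "md5:" prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloaded_file", "md5:5c2b05f1f682d06935d8f05e633058e5")` would check the largest file in the listing, where `downloaded_file` is a placeholder for wherever the file was saved locally.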
Additional details
Funding
- Fundação para a Ciência e Tecnologia
- PhD Scholarship 2022.11905.BD
Software
- Repository URL
- https://github.com/lvilaca16/dialogue-av
- Programming language
- Python
- Development Status
- Active