Clotho-AQA dataset
Audio Research Group, Tampere University
Description
Clotho-AQA is an audio question-answering dataset consisting of 1991 audio samples taken from the Clotho dataset [1]. Each audio sample has 6 associated questions collected through crowdsourcing. Each question is answered by three different annotators, for a total of 35,838 question-answer pairs. For each audio sample, 4 questions are designed to be answered with 'yes' or 'no', while the remaining 2 questions are designed to be answered with a single word. More details about the data collection and data splitting processes can be found in the following paper:
S. Lipping, P. Sudarsanam, K. Drossos, T. Virtanen, ‘Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering.’ The paper is available online on arXiv (2204.09634).
If you use the Clotho-AQA dataset, please cite the paper mentioned above. A sample baseline model for the Clotho-AQA dataset is available at partha2409/AquaNet (github.com).
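The counts quoted in the description follow directly from the per-sample figures. The short check below only restates that arithmetic; the variable names are illustrative and not part of the dataset.

```python
# Figures stated in the dataset description.
NUM_AUDIO = 1991              # audio samples
QUESTIONS_PER_AUDIO = 6       # questions per audio sample
ANNOTATORS_PER_QUESTION = 3   # answers collected per question
YES_NO_PER_AUDIO = 4          # questions designed as yes/no per audio sample

num_questions = NUM_AUDIO * QUESTIONS_PER_AUDIO
num_qa_pairs = num_questions * ANNOTATORS_PER_QUESTION
num_yes_no_questions = NUM_AUDIO * YES_NO_PER_AUDIO
num_single_word_questions = num_questions - num_yes_no_questions

print(num_questions)              # 11946
print(num_qa_pairs)               # 35838
print(num_yes_no_questions)       # 7964
print(num_single_word_questions)  # 3982
```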
To use the dataset:
• Download and extract ‘audio_files.zip’. This archive contains all 1991 audio samples in the dataset.
• Download ‘clotho_aqa_train.csv’, ‘clotho_aqa_val.csv’, and ‘clotho_aqa_test.csv’. These files contain the train, validation, and test splits, respectively. Each file lists the audio file names, questions, answers, and confidence scores provided by the annotators.
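As a minimal sketch of the steps above, the function below reads one split file and groups the annotator answers by audio file and question. The column names used as defaults (`file_name`, `QuestionText`, `answer`, `confidence`) are assumptions and should be checked against the header row of the downloaded CSV; the function name `load_split` is hypothetical.

```python
import csv
from collections import defaultdict


def load_split(csv_path, file_col="file_name", question_col="QuestionText",
               answer_col="answer", confidence_col="confidence"):
    """Group (answer, confidence) pairs by (audio file name, question).

    The default column names are assumptions; pass the real header names
    if the downloaded CSV uses different ones.
    """
    grouped = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = (row[file_col], row[question_col])
            grouped[key].append((row[answer_col], row[confidence_col]))
    return dict(grouped)
```

For example, `load_split("clotho_aqa_train.csv")` would return a dictionary in which each key is an (audio file, question) pair and each value is the list of answers given for it. According to the description, there should be three answers per question.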
License:
The audio files in the archive ‘audio_files.zip’ are under the corresponding licenses (mostly Creative Commons with attribution) of the Freesound [2] platform. The license of each audio file is stated explicitly in the CSV file ‘clotho_aqa_metadata.csv’. That is, each audio file in the archive is listed in the CSV file together with its metadata. The metadata for each file are:
• File name
• Keywords
• URL for the original audio file
• Start and end samples for the excerpt that is used in the Clotho dataset
• Uploader/user on the Freesound platform (manufacturer)
• Link to the license of the file
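Since most audio files require attribution, it can help to look up the uploader and license of each file from the metadata CSV. The sketch below shows one way to do this. The column names `file_name`, `manufacturer`, and `license` are assumptions about the CSV header, and the function names are hypothetical. The `encoding` parameter is exposed because the file's encoding is not stated here.

```python
import csv


def load_audio_licenses(metadata_path, file_col="file_name",
                        uploader_col="manufacturer", license_col="license",
                        encoding="utf-8"):
    """Map each audio file name to its uploader and license link.

    Column names are assumed; adjust them to the actual CSV header.
    """
    licenses = {}
    with open(metadata_path, newline="", encoding=encoding) as f:
        for row in csv.DictReader(f):
            licenses[row[file_col]] = {
                "uploader": row[uploader_col],
                "license": row[license_col],
            }
    return licenses


def attribution_line(file_name, licenses):
    """Format a one-line attribution string for a single audio file."""
    entry = licenses[file_name]
    return f"{file_name} by {entry['uploader']} (license: {entry['license']})"
```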
The questions and answers in the files:
• clotho_aqa_train.csv
• clotho_aqa_val.csv
• clotho_aqa_test.csv
are under the MIT license, described in the LICENSE file.
References:
[1] K. Drossos, S. Lipping and T. Virtanen, "Clotho: An Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
[2] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Files (3.1 GB)
MD5 | Size |
---|---|
48c2c46ccd71dc06121228e2ed68d6b9 | 3.1 GB |
e555d8a5a4b275ca7c437a33c0739d65 | 427.4 kB |
6fd0e75dffba561805fa37c668ce99a4 | 586.7 kB |
508513b91001ab17e641eaaa162ca045 | 1.5 MB |
41b8f6d906a57d7a5a4ee02bbcdefcc9 | 428.2 kB |
41460130242fbd61fa2615a0e6622d3a | 1.5 kB |
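The MD5 checksums listed for the files can be used to confirm that a download completed without corruption. The sketch below computes a file's MD5 digest in chunks with the standard `hashlib` module, so the large audio archive is never loaded into memory at once. The function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the 'md5:' prefix used in the file listing.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, assuming the 3.1 GB entry corresponds to ‘audio_files.zip’, `verify_md5("audio_files.zip", "md5:48c2c46ccd71dc06121228e2ed68d6b9")` should return `True` for an intact download.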