Published May 29, 2020 | Version 1.0
Dataset Open

Audio captioning DCASE 2020 evaluation (testing) split

  • 1. Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University

Description

This is the evaluation split for Task 6, Automated Audio Captioning, in the DCASE 2020 Challenge.

This evaluation split is the Clotho testing split, which is thoroughly described in the corresponding paper: 

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990 

This evaluation split is meant to be used for Task 6 of the DCASE 2020 scientific challenge. It is not meant to be used for developing audio captioning methods. For developing audio captioning methods, you should use the development and evaluation splits of Clotho.

If you want the development and evaluation splits of the Clotho dataset, you can also find them on Zenodo, at: https://zenodo.org/record/3490684

--------------------------------------------------------------------------------------------------------

== License ==

The audio files in the archive:

  • clotho_audio_test.7z 

and the associated meta-data in the CSV file:

  • clotho_metadata_test.csv

are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, stated explicitly in the CSV file for each audio file. That is, each audio file in the 7z archive is listed in the CSV file together with its meta-data. The meta-data for each file are:

  • File name
  • Start and end samples of the excerpt that is used in the Clotho dataset
  • Uploader/user in the Freesound platform (manufacturer)
  • Link to the licence of the file

--------------------------------------------------------------------------------------------------------

== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245

Files (1.3 GB)

  • clotho_audio_test.7z, 1.3 GB, md5:9b3fe72560a621641ff4351ba1154349
  • clotho_metadata_test.csv, 89.2 kB, md5:52f8ad01c229a310a0ff8043df480e21
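
The md5 checksums listed above can be used to confirm that a download completed without corruption. Below is a minimal sketch using Python's standard hashlib module. The mapping of checksums to file names is an assumption based on the listed sizes (the 1.3 GB entry taken to be the audio archive, the 89.2 kB entry the CSV file), and the helper names `md5_of_file` and `verify` are hypothetical.

```python
import hashlib

# Assumed mapping, inferred from the file sizes in the listing above.
EXPECTED_MD5 = {
    "clotho_audio_test.7z": "9b3fe72560a621641ff4351ba1154349",
    "clotho_metadata_test.csv": "52f8ad01c229a310a0ff8043df480e21",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks
    so that the 1.3 GB archive never has to fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file at path has the expected md5 checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

With both files in the current directory, `verify(name, EXPECTED_MD5[name])` can be called for each name in `EXPECTED_MD5`; a result of `False` suggests the file should be downloaded again.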

Additional details

Funding

EVERYSOUND – Computational Analysis of Everyday Soundscapes 637422
European Commission