Clotho dataset
Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University
Description
Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). Audio samples are 15 to 30 s long and captions are 8 to 20 words long.
Clotho is thoroughly described in our paper:
K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
The paper is available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990
If you use Clotho, please cite our paper.
To work with the dataset, you can use our code at: https://github.com/audio-captioning/clotho-dataset
These are the files for the development and evaluation splits of the Clotho dataset.
--------------------------------------------------------------------------------------------------------
== Usage ==
To use the dataset you have to:
- Download the audio files: clotho_audio_development.7z and clotho_audio_evaluation.7z
- Download the files with the captions: clotho_captions_development.csv and clotho_captions_evaluation.csv
- Download the files with the associated metadata: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
- Extract the audio files
- Then you can use each audio file with its corresponding captions
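The last step above can be sketched in Python as follows. This is only a minimal illustration, not the loader from the repository linked above. It assumes that each captions CSV has one row per audio file, that the file name is stored in a column called `file_name`, and that every other column holds one caption; these column names are assumptions and should be checked against the downloaded files. The function names are hypothetical.

```python
import csv
from pathlib import Path


def load_captions(csv_path, file_column="file_name"):
    """Map each audio file name to the list of its captions.

    Assumption: one row per audio file, the file name in `file_column`,
    and every other non-empty column holding one caption.
    """
    captions = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = row[file_column]
            captions[name] = [
                text
                for column, text in row.items()
                if column != file_column and text
            ]
    return captions


def pair_audio_with_captions(audio_dir, csv_path):
    """Return (audio_path, captions) pairs for files found in audio_dir."""
    captions = load_captions(csv_path)
    audio_dir = Path(audio_dir)
    return [
        (audio_dir / name, texts)
        for name, texts in sorted(captions.items())
        if (audio_dir / name).is_file()
    ]
```

Calling `pair_audio_with_captions` with the directory of extracted audio files and the matching captions CSV would then give one entry per audio file that exists on disk; rows whose audio file is missing are skipped silently.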
--------------------------------------------------------------------------------------------------------
== License ==
The audio files in the archives:
- clotho_audio_development.7z and
- clotho_audio_evaluation.7z
and the associated metadata in the CSV files:
- clotho_metadata_development.csv
- clotho_metadata_evaluation.csv
are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, stated explicitly in the CSV files for each audio file. That is, each audio file in the 7z archives is listed in the metadata CSV files. The metadata for each file are:
- File name
- Keywords
- URL for the original audio file
- Start and ending samples for the excerpt that is used in the Clotho dataset
- Uploader/user on the Freesound platform (manufacturer)
- Link to the licence of the file
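Because most of these licences require attribution, it can help to generate a credit line per audio file from the metadata. The sketch below is one possible way to do this. The column names (`file_name`, `manufacturer`, `sound_link`, `license`) and the UTF-8 encoding are assumptions, not confirmed by this description; adjust them to the header row and encoding of the downloaded CSV files. The function name is hypothetical.

```python
import csv


def attribution_lines(
    metadata_csv,
    name_col="file_name",      # assumed column: file name
    user_col="manufacturer",   # assumed column: Freesound uploader
    url_col="sound_link",      # assumed column: URL of the original file
    licence_col="license",     # assumed column: link to the licence
    encoding="utf-8",          # assumed encoding of the CSV file
):
    """Build one human-readable credit line per row of a metadata CSV."""
    lines = []
    with open(metadata_csv, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            lines.append(
                f"{row[name_col]} by {row[user_col]} "
                f"({row[url_col]}), licence: {row[licence_col]}"
            )
    return lines
```

A `KeyError` from this function indicates that one of the assumed column names does not match the actual header.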
The captions in the files:
- clotho_captions_development.csv
- clotho_captions_evaluation.csv
are under the Tampere University licence, described in the LICENCE file (mainly a non-commercial licence with attribution).
--------------------------------------------------------------------------------------------------------
== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Files (4.7 GB)
MD5 checksum | Size |
---|---|
e3ce88561b317cc3825e8c861cae1ec6 | 3.4 GB |
4569624ccadf96223f19cb59fe4f849f | 1.2 GB |
dd568352389f413d832add5cf604529f | 1.0 MB |
1b16b9e57cf7bdb7f13a13802aeb57e2 | 362.0 kB |
582c18ee47cebdbe33dce1feeab53a56 | 624.8 kB |
13946f054d4e1bf48079813aac61bf77 | 225.3 kB |
afceb2583a64aea27ed4f14099126af0 | 1.9 kB |
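The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file is read in chunks so that the multi-gigabyte archives do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file against an expected MD5 value.

    Accepts the checksum with or without a leading "md5:" prefix,
    in upper or lower case.
    """
    expected = expected_md5.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected
```

For instance, `matches_checksum("path/to/downloaded_file", "md5:e3ce88561b317cc3825e8c861cae1ec6")` would check a local copy of the 3.4 GB file; the path here is a placeholder.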
Additional details
Related works
- Is supplement to
- Conference paper: https://arxiv.org/abs/1910.09387 (URL)
- Is supplemented by
- Software: https://github.com/audio-captioning/clotho-dataset (URL)