Konstantinos Drossos;
Samuel Lipping;
Tuomas Virtanen
Clotho is an audio captioning dataset that has now reached version 2. Clotho consists of 6974 audio samples, and each audio sample has five captions (a total of 34 870 captions). Audio samples are 15 to 30 s long and captions are 8 to 20 words long.
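The figures above (five captions per sample, 8 to 20 words per caption) can be checked against a captions table. The following is a minimal sketch, not the authors' tooling: the `file_name` column and the caption column names passed in by the caller are assumptions about the CSV layout, which this page does not specify.

```python
import csv
import io

# Bounds stated in the dataset description.
CAPTIONS_PER_SAMPLE = 5
MIN_WORDS, MAX_WORDS = 8, 20


def check_captions(csv_text, caption_columns):
    """Return (file_name, problem) pairs for rows that break the stated bounds.

    The "file_name" column and `caption_columns` are assumed names.
    """
    problems = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        captions = [row.get(col) or "" for col in caption_columns]
        filled = [c for c in captions if c.strip()]
        if len(filled) != CAPTIONS_PER_SAMPLE:
            problems.append((row["file_name"], f"{len(filled)} captions"))
        for caption in filled:
            n_words = len(caption.split())  # whitespace tokenization (assumption)
            if not MIN_WORDS <= n_words <= MAX_WORDS:
                problems.append((row["file_name"], f"{n_words}-word caption"))
    return problems
```

An empty list means every row has five captions within the word bounds. Word counts here use plain whitespace splitting, which may differ from how the captions were originally counted.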
Clotho is thoroughly described in our paper:
K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.
The paper is available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990
If you use Clotho, please cite our paper.
To use the dataset, you can use our code at: https://github.com/audio-captioning/clotho-dataset
These are the files for the development, validation, and evaluation splits of the Clotho dataset.
--------------------------------------------------------------------------------------------------------
== Changes in version 2.1 ==
In version 2.1 of Clotho, we fixed around 150 files that were corrupted during compression and transfer, and we replaced characters that are illegal on most filesystems, e.g. ":", in the names of around 10 files.
Please use this version for your experiments.
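For readers who work with their own copies of Freesound files, the kind of file-name clean-up mentioned above can be sketched as follows. This is an illustration only: the character set and the "_" replacement are assumptions, and the page does not state which replacement the authors actually used.

```python
import re

# Characters rejected by at least one common filesystem (Windows forbids all
# of these; ":" is the example named above). This set is an assumption.
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_filename(name, replacement="_"):
    """Replace characters that are illegal on common filesystems.

    The default replacement "_" is an arbitrary, hypothetical choice.
    """
    return ILLEGAL_CHARS.sub(replacement, name)
```

If the same function is applied to the file names in the CSV files, audio files and their meta-data rows stay aligned.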
== Changes in version 2 ==
In version 2 of Clotho, audio files have been added to the development split and a new validation split has been introduced. The evaluation split is unchanged.
Specifically:
All new captions are treated as in version 1 of Clotho, i.e. they have word consistency, no named entities, no speech transcription, and no hapax legomena between splits (i.e. no words appearing in only one of the splits).
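The "no words appearing in only one of the splits" property can be checked with set operations. The sketch below is one possible check, not the procedure used to build Clotho: the lowercasing and the regular-expression tokenization are assumptions, and `splits` is a hypothetical mapping from split name to a list of caption strings.

```python
import re


def words(captions):
    """Lowercased word set of an iterable of caption strings."""
    found = set()
    for caption in captions:
        # Simple tokenization (assumption): letters, digits, apostrophes.
        found.update(re.findall(r"[a-z0-9']+", caption.lower()))
    return found


def words_unique_to_one_split(splits):
    """Map each split name to the words that occur in no other split."""
    vocab = {name: words(caps) for name, caps in splits.items()}
    unique = {}
    for name, vocabulary in vocab.items():
        others = set().union(*(v for n, v in vocab.items() if n != name))
        unique[name] = vocabulary - others
    return unique
```

If the property holds, every returned set is empty. A non-empty set lists the words found in that split and in no other.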
--------------------------------------------------------------------------------------------------------
== Usage ==
To use the dataset you have to:
--------------------------------------------------------------------------------------------------------
== License ==
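The audio files in the archives clotho_audio_development.7z, clotho_audio_validation.7z, and clotho_audio_evaluation.7z, and the associated meta-data in the CSV files clotho_metadata_development.csv, clotho_metadata_validation.csv, and clotho_metadata_evaluation.csv, are under the corresponding licences (mostly CreativeCommons with attribution) of the Freesound [1] platform, mentioned explicitly in the CSV files for each of the audio files. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are: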
The captions in the files clotho_captions_development.csv, clotho_captions_validation.csv, and clotho_captions_evaluation.csv are under the Tampere University licence, described in the LICENSE file (mainly a non-commercial with attribution licence).
--------------------------------------------------------------------------------------------------------
== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245
Name | MD5 | Size
---|---|---
clotho_audio_development.7z | c8b05bc7acdb13895bb3c6a29608667e | 4.5 GB
clotho_audio_evaluation.7z | 4569624ccadf96223f19cb59fe4f849f | 1.2 GB
clotho_audio_validation.7z | 7dba730be08bada48bd15dc4e668df59 | 1.3 GB
clotho_captions_development.csv | d4090b39ce9f2491908eebf4d5b09bae | 1.3 MB
clotho_captions_evaluation.csv | 1b16b9e57cf7bdb7f13a13802aeb57e2 | 362.0 kB
clotho_captions_validation.csv | 5879e023032b22a2c930aaa0528bead4 | 367.6 kB
clotho_metadata_development.csv | 170d20935ecfdf161ce1bb154118cda5 | 830.8 kB
clotho_metadata_evaluation.csv | 13946f054d4e1bf48079813aac61bf77 | 225.3 kB
clotho_metadata_validation.csv | 2e010427c56b1ce6008b0f03f41048ce | 224.8 kB
LICENSE | 38d422ac8a2c9c35c288232576e2e810 | 1.9 kB
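Since some files were corrupted in transfer in an earlier version, it may be worth comparing each download against the MD5 checksum listed for it. A minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify` are illustrative, not part of the Clotho code.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("clotho_audio_development.7z", "c8b05bc7acdb13895bb3c6a29608667e")` should return True for an intact download, assuming the archive is in the current directory. Reading in chunks keeps memory use flat for the multi-gigabyte archives.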