Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen

Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). The audio samples are 15 to 30 s long, and the captions are eight to 20 words long.
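As a quick sanity check on these figures (4981 × 5 = 24 905 captions), the following minimal sketch counts captions and flags any whose length falls outside the stated eight to 20 words. It assumes each caption row is a dict with keys `caption_1` to `caption_5` and `file_name` (hypothetical column names; check the actual CSV header), and it counts words by splitting on whitespace, which may not match the authors' own word count.

```python
def caption_stats(rows, n_captions=5, min_words=8, max_words=20):
    """Count captions and flag those outside [min_words, max_words].

    `rows` is an iterable of dicts keyed by caption_1..caption_N
    (assumed column names). Words are counted by whitespace splitting,
    an assumption that may differ from the dataset authors' method.
    Returns (total_captions, [(file_name, caption_index, n_words), ...]).
    """
    total = 0
    out_of_range = []
    for row in rows:
        for i in range(1, n_captions + 1):
            n_words = len(row[f"caption_{i}"].split())
            total += 1
            if not min_words <= n_words <= max_words:
                out_of_range.append((row.get("file_name"), i, n_words))
    return total, out_of_range
```

Run over the full development and evaluation caption files, the total would be expected to reach 24 905.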

Clotho is thoroughly described in our paper:

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

Available online at https://arxiv.org/abs/1910.09387 and https://ieeexplore.ieee.org/document/9052990

If you use Clotho, please cite our paper.

Code for working with the dataset is available at: https://github.com/audio-captioning/clotho-dataset

These are the files for the development and evaluation splits of the Clotho dataset.

--------------------------------------------------------------------------------------------------------

== Usage ==

To use the dataset, you need to:

  1. Download the audio files: clotho_audio_development.7z and clotho_audio_evaluation.7z
  2. Download the caption files: clotho_captions_development.csv and clotho_captions_evaluation.csv
  3. Download the associated metadata files: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
  4. Extract the audio files from the 7z archives
  5. Match each audio file with its corresponding captions (and metadata)
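Step 5 can be sketched as follows. This is a minimal illustration, not the official loader (see the clotho-dataset repository for that). It assumes the caption CSV has a `file_name` column plus `caption_1` to `caption_5` columns, that the CSV is UTF-8 encoded, and that the extracted audio files are WAV files whose names match `file_name`; all of these are assumptions to check against the actual files.

```python
import csv
from pathlib import Path


def load_captions(lines, n_captions=5):
    """Map each audio file name to its list of captions.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Assumed header: file_name, caption_1, ..., caption_5.
    """
    captions = {}
    for row in csv.DictReader(lines):
        captions[row["file_name"]] = [
            row[f"caption_{i}"] for i in range(1, n_captions + 1)
        ]
    return captions


def pair_audio_with_captions(audio_paths, captions):
    """Yield (path, captions) for each audio file whose name has captions."""
    for path in sorted(audio_paths):
        caps = captions.get(path.name)
        if caps is not None:
            yield path, caps
```

For example, pass an open handle to clotho_captions_development.csv (opened with `newline=""` and `encoding="utf-8"`) to `load_captions`, and `Path(<extracted audio directory>).glob("*.wav")` to `pair_audio_with_captions`; the directory name depends on where you extracted the archive.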

--------------------------------------------------------------------------------------------------------

== License ==

The audio files in the archives:

  • clotho_audio_development.7z and
  • clotho_audio_evaluation.7z

and the associated meta-data in the CSV files:

  • clotho_metadata_development.csv
  • clotho_metadata_evaluation.csv

are under the licences (mostly Creative Commons with attribution) that the corresponding files have on the Freesound platform [1]. The licence of each audio file is stated explicitly in the metadata CSV files, in which every audio file in the 7z archives is listed. The meta-data for each file are:

  • File name
  • Keywords
  • URL for the original audio file
  • Start and end samples of the excerpt used in the Clotho dataset
  • Uploader (user) on the Freesound platform (manufacturer)
  • Link to the licence of the file
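Because the licence is given per file, attribution usually has to be handled per file as well. A minimal sketch that groups audio file names by licence link, assuming the metadata CSV uses columns named `file_name` and `license` (hypothetical names; check the real header and pass the correct names via the parameters):

```python
import csv
from collections import defaultdict


def files_by_licence(lines, licence_col="license", name_col="file_name"):
    """Group audio file names by the licence link in the metadata CSV.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    The default column names are assumptions, not confirmed names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(lines):
        groups[row[licence_col]].append(row[name_col])
    return dict(groups)
```

Grouping this way makes it easy to see which licences apply to a subset of Clotho before redistributing or publishing derived material.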

The captions in the files:

  • clotho_captions_development.csv
  • clotho_captions_evaluation.csv

are under the Tampere University licence described in the LICENSE file (essentially a non-commercial licence with attribution).

--------------------------------------------------------------------------------------------------------

== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245

--------------------------------------------------------------------------------------------------------

== Files (4.7 GB) ==

Name                               Size      MD5
clotho_audio_development.7z        3.4 GB    e3ce88561b317cc3825e8c861cae1ec6
clotho_audio_evaluation.7z         1.2 GB    4569624ccadf96223f19cb59fe4f849f
clotho_captions_development.csv    1.0 MB    dd568352389f413d832add5cf604529f
clotho_captions_evaluation.csv     362.0 kB  1b16b9e57cf7bdb7f13a13802aeb57e2
clotho_metadata_development.csv    624.8 kB  582c18ee47cebdbe33dce1feeab53a56
clotho_metadata_evaluation.csv     225.3 kB  13946f054d4e1bf48079813aac61bf77
LICENSE                            1.9 kB    afceb2583a64aea27ed4f14099126af0
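The MD5 checksums listed for each file can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib`, assuming the files were saved under their original names in a single directory (`download_dir` is a hypothetical parameter):

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed for the files of this record.
EXPECTED_MD5 = {
    "clotho_audio_development.7z": "e3ce88561b317cc3825e8c861cae1ec6",
    "clotho_audio_evaluation.7z": "4569624ccadf96223f19cb59fe4f849f",
    "clotho_captions_development.csv": "dd568352389f413d832add5cf604529f",
    "clotho_captions_evaluation.csv": "1b16b9e57cf7bdb7f13a13802aeb57e2",
    "clotho_metadata_development.csv": "582c18ee47cebdbe33dce1feeab53a56",
    "clotho_metadata_evaluation.csv": "13946f054d4e1bf48079813aac61bf77",
    "LICENSE": "afceb2583a64aea27ed4f14099126af0",
}


def md5sum(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(download_dir="."):
    """Return {file name: checksum matches} for each expected file found."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(download_dir) / name
        if path.exists():
            results[name] = md5sum(path) == expected
    return results
```

Reading in 1 MiB chunks keeps memory use constant even for the 3.4 GB development archive; files that are missing from `download_dir` are simply left out of the result.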