There is a newer version of the record available.

Published October 15, 2019 | Version 1.0
Dataset Open

Clotho dataset

  • 1. Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University

Description

Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). Audio samples are 15 to 30 s long, and captions are eight to 20 words long.
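
The figures above can be sanity-checked once the captions are loaded. The following is a minimal Python sketch, assuming the captions are already available as one list of strings per audio sample. The function name `check_captions` and the whitespace-based word count are illustrative assumptions and are not part of the Clotho tooling.

```python
from typing import Iterable, List

CAPTIONS_PER_SAMPLE = 5        # stated in the description
MIN_WORDS, MAX_WORDS = 8, 20   # stated caption length range


def check_captions(samples: Iterable[List[str]]) -> int:
    """Return the total number of captions, raising if a stated property fails.

    Words are counted by splitting on whitespace, which is an assumption;
    the original annotation process may have counted words differently.
    """
    total = 0
    for index, captions in enumerate(samples):
        if len(captions) != CAPTIONS_PER_SAMPLE:
            raise ValueError(
                f"sample {index}: expected {CAPTIONS_PER_SAMPLE} captions, "
                f"got {len(captions)}"
            )
        for caption in captions:
            n_words = len(caption.split())
            if not MIN_WORDS <= n_words <= MAX_WORDS:
                raise ValueError(f"sample {index}: caption has {n_words} words")
        total += len(captions)
    return total


# The stated totals are consistent: 4981 samples with 5 captions each.
assert 4981 * CAPTIONS_PER_SAMPLE == 24905
```

A caption that falls outside the range under this simple word count does not necessarily indicate a corrupted file, since tokenisation conventions can differ.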

Clotho is thoroughly described in our paper:

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

The paper is available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990

If you use Clotho, please cite our paper.

To use the dataset, you can use our code at: https://github.com/audio-captioning/clotho-dataset

These are the files for the development and evaluation splits of the Clotho dataset.

--------------------------------------------------------------------------------------------------------

== Usage ==

To use the dataset you have to:

  1. Download the audio files: clotho_audio_development.7z and clotho_audio_evalution.7z
  2. Download the files with the captions: clotho_captions_development.csv and clotho_captions_evaluation.csv
  3. Download the files with the associated metadata: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
  4. Extract the audio files
  5. Then you can use each audio file with its corresponding captions
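
Step 5 can be sketched in Python as follows. This is a minimal illustration and not the authors' loader (their code is linked above). It assumes that the first column of the captions CSV holds the audio file name and that every remaining column holds one caption, so the header of the real file should be checked first. The function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator, List, TextIO, Tuple


def read_captions(csv_file: TextIO) -> Dict[str, List[str]]:
    """Map each audio file name to its list of captions.

    Assumption: the first CSV column is the audio file name and all
    remaining columns are captions. Verify this against the real header.
    """
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    return {row[0]: row[1:] for row in reader if row}


def pair_audio_with_captions(
    audio_dir: Path, captions: Dict[str, List[str]]
) -> Iterator[Tuple[Path, List[str]]]:
    """Yield (audio_path, captions) for every listed file found on disk."""
    for file_name, file_captions in captions.items():
        audio_path = audio_dir / file_name
        if audio_path.is_file():
            yield audio_path, file_captions
```

In use, the captions file would be opened with `open("clotho_captions_development.csv", newline="", encoding="utf-8")`, as the `csv` module recommends, and `audio_dir` would point to wherever the audio archive was extracted. The UTF-8 encoding is an assumption.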

--------------------------------------------------------------------------------------------------------

== License ==

The audio files in the archives:

  • clotho_audio_development.7z and
  • clotho_audio_evalution.7z

and the associated meta-data in the CSV files:

  • clotho_metadata_development.csv
  • clotho_metadata_evaluation.csv

are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, stated explicitly in the CSV files for each audio file. That is, each audio file in the 7z archives is listed in the CSV files together with its meta-data. The meta-data for each file are:

  • File name
  • Keywords
  • URL for the original audio file
  • Start and end samples of the excerpt that is used in the Clotho dataset
  • Uploader/user on the Freesound platform (manufacturer)
  • Link to the licence of the file
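
The start and end samples listed in the meta-data can be converted to a duration once the sample rate of the original file is known. The sketch below is a hedged illustration: this record does not state the sample rate or whether the end sample is inclusive, so both are assumptions (the end sample is treated as exclusive here). The function names are hypothetical.

```python
def excerpt_duration_seconds(start_sample: int, end_sample: int, sample_rate: int) -> float:
    """Duration in seconds of the excerpt from start_sample up to end_sample.

    Assumption: end_sample is exclusive. sample_rate (in Hz) must be read
    from the original audio file; it is not given in this record.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if end_sample < start_sample:
        raise ValueError("end_sample must not precede start_sample")
    return (end_sample - start_sample) / sample_rate


def is_within_stated_range(duration_s: float, low: float = 15.0, high: float = 30.0) -> bool:
    """True if the duration falls in the 15 to 30 s range given in the description."""
    return low <= duration_s <= high
```

For instance, with a hypothetical sample rate of 44100 Hz, an excerpt from sample 0 to sample 882000 lasts 20.0 s, which lies within the stated range.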

The captions in the files:

  • clotho_captions_development.csv
  • clotho_captions_evaluation.csv

are under the Tampere University licence, described in the LICENCE file (mainly a non-commercial licence with attribution).

--------------------------------------------------------------------------------------------------------

== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245

Files

Total size: 4.7 GB

MD5 checksum                        Size
e3ce88561b317cc3825e8c861cae1ec6    3.4 GB
4569624ccadf96223f19cb59fe4f849f    1.2 GB
dd568352389f413d832add5cf604529f    1.0 MB
1b16b9e57cf7bdb7f13a13802aeb57e2    362.0 kB
582c18ee47cebdbe33dce1feeab53a56    624.8 kB
13946f054d4e1bf48079813aac61bf77    225.3 kB
afceb2583a64aea27ed4f14099126af0    1.9 kB
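
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following Python sketch uses only the standard `hashlib` module. The function names are hypothetical, and which checksum belongs to which file must be taken from the record page, since the listing here does not pair them.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for the multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with a listed value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.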

Additional details

Related works

Is supplement to
Conference paper: https://arxiv.org/abs/1910.09387 (URL)
Is supplemented by
Software: https://github.com/audio-captioning/clotho-dataset (URL)

Funding

EVERYSOUND – Computational Analysis of Everyday Soundscapes 637422
European Commission
