There is a newer version of this record available.

Dataset Open Access

Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Konstantinos Drossos</dc:creator>
  <dc:creator>Samuel Lipping</dc:creator>
  <dc:creator>Tuomas Virtanen</dc:creator>
  <dc:description>Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). Audio samples are 15 to 30 s long, and captions are eight to 20 words long.

Clotho is thoroughly described in our paper:

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

available online at: and at: 

If you use Clotho, please cite our paper.


To use the dataset, you can use our code at: 


These are the files for the development and evaluation splits of the Clotho dataset.


== Usage ==

To use the dataset you have to:

	Download the audio files: clotho_audio_development.7z and clotho_audio_evaluation.7z
	Download the files with the captions: clotho_captions_development.csv and clotho_captions_evaluation.csv
	Download the files with the associated metadata: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
	Extract the audio files
	Then you can use each audio file with its corresponding captions
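
As a minimal sketch of the last step, the Python function below pairs each extracted audio file with its captions. It assumes the archives have already been extracted with a 7z tool. The column names 'file_name' and 'caption_*', the UTF-8 default encoding, and the function name load_captions are assumptions made for illustration and are not stated in this record; adjust them to match the actual CSV header.

```python
import csv
from pathlib import Path


def load_captions(csv_path, audio_dir, encoding="utf-8"):
    """Map each audio file path to the list of its captions.

    Assumed layout (hypothetical, check the real header): one row per
    audio file, a 'file_name' column, and one or more 'caption_*' columns.
    """
    pairs = {}
    with open(csv_path, newline="", encoding=encoding) as handle:
        for row in csv.DictReader(handle):
            # Collect every non-empty caption column of this row.
            captions = [
                text
                for column, text in row.items()
                if column and column.startswith("caption_") and text
            ]
            pairs[Path(audio_dir) / row["file_name"]] = captions
    return pairs
```

For example, load_captions("clotho_captions_development.csv", "development") would return a dictionary keyed by audio path, where "development" is a hypothetical name for the folder holding the extracted audio.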


== License ==

The audio files in the archives:

	clotho_audio_development.7z and
	clotho_audio_evaluation.7z

and the associated meta-data in the CSV files:

	clotho_metadata_development.csv and
	clotho_metadata_evaluation.csv

are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, which the CSV files state explicitly for each audio file. That is, each audio file in the 7z archives is listed in the CSV files together with its meta-data. The meta-data for each file are:

	File name
	URL for the original audio file
	Start and ending samples for the excerpt that is used in the Clotho dataset
	Uploader/user on the Freesound platform (manufacturer)
	Link to the licence of the file
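
Because the licence differs from file to file, it can help to summarise which licences occur in a metadata file before redistributing any audio. The following is a minimal sketch, assuming the licence link is stored in a column named 'license'; that column name, the UTF-8 default encoding, and the function name count_licences are assumptions for illustration and are not confirmed by this record.

```python
import csv
from collections import Counter


def count_licences(csv_path, licence_column="license", encoding="utf-8"):
    """Count how many audio files fall under each licence link.

    The column name is an assumption; pass the actual header name
    if the metadata CSV uses a different one.
    """
    with open(csv_path, newline="", encoding=encoding) as handle:
        return Counter(row[licence_column] for row in csv.DictReader(handle))
```

The returned Counter maps each licence link to the number of files that carry it, so calling most_common() on it lists the licences from most to least frequent.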

The captions in the files:

	clotho_captions_development.csv and
	clotho_captions_evaluation.csv

are under the Tampere University licence, described in the LICENCE file (mainly a non-commercial licence with attribution).


== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI:</dc:description>
  <dc:subject>Audio captioning</dc:subject>
  <dc:subject>Audio processing</dc:subject>
  <dc:subject>Signal processing</dc:subject>
  <dc:subject>Machine listening</dc:subject>
  <dc:subject>Computational auditory scene analysis</dc:subject>
  <dc:subject>Deep learning</dc:subject>
  <dc:title>Clotho dataset</dc:title>