There is a newer version of this record available.

Dataset Open Access

Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Konstantinos Drossos</dc:creator>
  <dc:creator>Samuel Lipping</dc:creator>
  <dc:creator>Tuomas Virtanen</dc:creator>
  <dc:date>2019-10-15</dc:date>
  <dc:description>Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). Audio samples are 15 to 30 s long and captions are 8 to 20 words long.

Clotho is thoroughly described in our paper:

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990 

If you use Clotho, please cite our paper.

To use the dataset, you can use our code at: https://github.com/audio-captioning/clotho-dataset

These are the files for the development and evaluation splits of the Clotho dataset.

--------------------------------------------------------------------------------------------------------

== Usage ==

To use the dataset you have to:


	Download the audio files: clotho_audio_development.7z and clotho_audio_evaluation.7z
	Download the files with the captions: clotho_captions_development.csv and clotho_captions_evaluation.csv
	Download the files with the associated metadata: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
	Extract the audio files
	Then you can use each audio file with its corresponding captions
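
The last step above can be sketched in Python as follows. This is a minimal sketch, not the loader from the repository linked above. It assumes that the archives have already been extracted with an external tool (for example the 7z command-line program), that the extracted audio files end in .wav, and that the captions CSV has one column with the audio file name plus one column per caption. The column names "file_name" and "caption_1", "caption_2", and so on are assumptions and can be changed through the function parameters.

```python
import csv
from pathlib import Path


def load_captions(csv_path, file_column="file_name", caption_prefix="caption_"):
    """Map each audio file name to the list of its captions.

    The column names are assumptions about the CSV layout, not confirmed
    by the dataset description.
    """
    captions = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Keep every non-empty column whose name starts with the prefix.
            captions[row[file_column]] = [
                value
                for key, value in row.items()
                if key and key.startswith(caption_prefix) and value
            ]
    return captions


def pair_audio_with_captions(audio_dir, captions):
    """Yield (audio path, captions) for each extracted file that has captions."""
    for audio_path in sorted(Path(audio_dir).glob("*.wav")):
        if audio_path.name in captions:
            yield audio_path, captions[audio_path.name]
```

A call such as pair_audio_with_captions("development", load_captions("clotho_captions_development.csv")) would then iterate over the development split, where "development" is a hypothetical name for the directory holding the extracted audio.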


--------------------------------------------------------------------------------------------------------

== License ==

The audio files in the archives:


	clotho_audio_development.7z and
	clotho_audio_evaluation.7z


and the associated meta-data in the CSV files:


	clotho_metadata_development.csv
	clotho_metadata_evaluation.csv


are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, which are stated explicitly in the CSV files for each audio file. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are:


	File name
	Keywords
	URL for the original audio file
	Start and end samples of the excerpt that is used in the Clotho dataset
	Uploader/user on the Freesound platform (manufacturer)
	Link to the licence of the file


The captions in the files:


	clotho_captions_development.csv
	clotho_captions_evaluation.csv


are under the Tampere University licence, described in the LICENCE file (mainly a non-commercial licence with attribution).

--------------------------------------------------------------------------------------------------------

== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245</dc:description>
  <dc:identifier>https://zenodo.org/record/3490684</dc:identifier>
  <dc:identifier>10.5281/zenodo.3490684</dc:identifier>
  <dc:identifier>oai:zenodo.org:3490684</dc:identifier>
  <dc:language>eng</dc:language>
  <dc:relation>info:eu-repo/grantAgreement/EC/H2020/637422/</dc:relation>
  <dc:relation>url:https://github.com/audio-captioning/clotho-dataset</dc:relation>
  <dc:relation>url:https://arxiv.org/abs/1910.09387</dc:relation>
  <dc:relation>doi:10.5281/zenodo.3490683</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/audio-captioning</dc:relation>
  <dc:relation>url:https://zenodo.org/communities/tut-arg</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:subject>Clotho</dc:subject>
  <dc:subject>Audio captioning</dc:subject>
  <dc:subject>Dataset</dc:subject>
  <dc:subject>Audio processing</dc:subject>
  <dc:subject>Signal processing</dc:subject>
  <dc:subject>Machine listening</dc:subject>
  <dc:subject>Computational auditory scene analysis</dc:subject>
  <dc:subject>Captioning</dc:subject>
  <dc:subject>Deep learning</dc:subject>
  <dc:title>Clotho dataset</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>