There is a newer version of this record available.

Dataset Open Access

Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3490684", 
  "language": "eng", 
  "title": "Clotho dataset", 
  "issued": {
    "date-parts": [
      [
        2019, 
        10, 
        15
      ]
    ]
  }, 
  "abstract": "<p>Clotho is a novel audio captioning dataset, consisting of&nbsp;4981 audio samples, and each audio sample has five captions (a total of&nbsp;24 905 captions).&nbsp;Audio samples are of 15 to 30 s duration and captions are eight to 20 words long.&nbsp;</p>\n\n<p>Clotho is thoroughly described in our paper:</p>\n\n<p><em>K. Drossos, S. Lipping and T. Virtanen, &quot;Clotho: an Audio Captioning Dataset,&quot; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.</em></p>\n\n<p>available online at:&nbsp;<a href=\"https://arxiv.org/abs/1910.09387\">https://arxiv.org/abs/1910.09387</a>&nbsp;and at:&nbsp;<a href=\"https://ieeexplore.ieee.org/document/9052990\">https://ieeexplore.ieee.org/document/9052990</a>&nbsp;</p>\n\n<p><strong>If you use Clotho, please cite our paper.</strong></p>\n\n<p>&nbsp;</p>\n\n<p><strong>To use the dataset, you can use our code at:</strong>&nbsp;<a href=\"https://github.com/audio-captioning/clotho-dataset\">https://github.com/audio-captioning/clotho-dataset</a>&nbsp;</p>\n\n<p>&nbsp;</p>\n\n<p>These are the files for the development and evaluation splits of Clotho dataset.&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== Usage ==</strong></p>\n\n<p>To use the dataset you have to:</p>\n\n<ol>\n\t<li>Download the audio files:&nbsp;clotho_audio_development.7z and&nbsp;clotho_audio_evalution.7z</li>\n\t<li>Download the files with the captions:&nbsp;clotho_captions_development.csv and&nbsp;clotho_captions_evaluation.csv</li>\n\t<li>Download the files with the associated metadata:&nbsp;clotho_metadata_development.csv and&nbsp;clotho_metadata_evaluation.csv</li>\n\t<li>Extract the audio files</li>\n\t<li>Then you can use each audio file with its corresponding 
captions</li>\n</ol>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== License&nbsp;==</strong></p>\n\n<p>The audio files in the archives:</p>\n\n<ul>\n\t<li>clotho_audio_development.7z and</li>\n\t<li>clotho_audio_evalution.7z</li>\n</ul>\n\n<p>and the associated meta-data in the CSV files:</p>\n\n<ul>\n\t<li>clotho_metadata_development.csv</li>\n\t<li>clotho_metadata_evaluation.csv</li>\n</ul>\n\n<p>are under the corresponding licences (<strong><em>mostly CreativeCommons with attribution</em></strong>) of Freesound [1] platform, mentioned explicitly in the CSV files for each of the audio files. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are:&nbsp;</p>\n\n<ul>\n\t<li>File name</li>\n\t<li>Keywords</li>\n\t<li>URL for the original audio file</li>\n\t<li>Start and ending samples for the excerpt that is used in the Clotho dataset</li>\n\t<li>Uploader/user in the Freesound platform&nbsp;(manufacturer)</li>\n\t<li>Link to the licence of the file</li>\n</ul>\n\n<p>The captions in the files:</p>\n\n<ul>\n\t<li>clotho_captions_development.csv</li>\n\t<li>clotho_captions_evaluation.csv</li>\n</ul>\n\n<p>are under the Tampere University licence, described in the LICENCE file (<strong><em>mainly a non-commercial with attribution licence</em></strong>).&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== References ==</strong><br>\n[1]&nbsp;Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM &#39;13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245</p>", 
  "author": [
    {
      "family": "Drossos", 
      "given": "Konstantinos"
    }, 
    {
      "family": "Lipping", 
      "given": "Samuel"
    }, 
    {
      "family": "Virtanen", 
      "given": "Tuomas"
    }
  ], 
  "version": "1.0", 
  "type": "dataset", 
  "id": "3490684"
}
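
The usage steps in the abstract (download, extract, then pair each audio file with its captions) could be sketched roughly as follows. This is a minimal illustration, not the authors' loader (their code is at the GitHub link in the abstract). It assumes the captions CSV has a header row with a `file_name` column and five caption columns named `caption_1` to `caption_5`; those column names, the function names, and the directory layout are assumptions and should be checked against the downloaded files.

```python
import csv
from pathlib import Path

CAPTIONS_PER_CLIP = 5  # the record states five captions per audio sample


def read_captions(csv_file):
    """Map each audio file name to its list of captions.

    Assumes columns `file_name` and `caption_1` .. `caption_5`
    (an assumed layout; verify against the real CSV header).
    """
    captions = {}
    for row in csv.DictReader(csv_file):
        texts = [row[f"caption_{i}"] for i in range(1, CAPTIONS_PER_CLIP + 1)]
        captions[row["file_name"]] = texts
    return captions


def pair_audio_with_captions(audio_dir, captions):
    """Yield (audio_path, captions) for each listed file found in audio_dir."""
    audio_dir = Path(audio_dir)
    for file_name, texts in captions.items():
        path = audio_dir / file_name
        if path.is_file():  # skip entries whose audio was not extracted
            yield path, texts
```

With the archives extracted, a caller might open `clotho_captions_development.csv` with `newline=""` and `encoding="utf-8"`, pass the file object to `read_captions`, and then iterate over `pair_audio_with_captions` with the extraction directory (a hypothetical path chosen by the user). The totals quoted in the abstract are consistent with this layout: 4981 samples with five captions each give 24 905 captions.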
                  All versions   This version
Views             10,180         8,917
Downloads         14,494         11,978
Data volume       11.4 TB        8.9 TB
Unique views      8,982          8,002
Unique downloads  7,650          6,773
