There is a newer version of this record available.

Dataset Open Access

Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "eng", 
    "@type": "Language", 
    "name": "English"
  }, 
  "description": "<p>Clotho is a novel audio captioning dataset, consisting of&nbsp;4981 audio samples, and each audio sample has five captions (a total of&nbsp;24 905 captions).&nbsp;Audio samples are of 15 to 30 s duration and captions are eight to 20 words long.&nbsp;</p>\n\n<p>Clotho is thoroughly described in our paper:</p>\n\n<p><em>K. Drossos, S. Lipping and T. Virtanen, &quot;Clotho: an Audio Captioning Dataset,&quot; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.</em></p>\n\n<p>available online at:&nbsp;<a href=\"https://arxiv.org/abs/1910.09387\">https://arxiv.org/abs/1910.09387</a>&nbsp;and at:&nbsp;<a href=\"https://ieeexplore.ieee.org/document/9052990\">https://ieeexplore.ieee.org/document/9052990</a>&nbsp;</p>\n\n<p><strong>If you use Clotho, please cite our paper.</strong></p>\n\n<p>&nbsp;</p>\n\n<p><strong>To use the dataset, you can use our code at:</strong>&nbsp;<a href=\"https://github.com/audio-captioning/clotho-dataset\">https://github.com/audio-captioning/clotho-dataset</a>&nbsp;</p>\n\n<p>&nbsp;</p>\n\n<p>These are the files for the development and evaluation splits of Clotho dataset.&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== Usage ==</strong></p>\n\n<p>To use the dataset you have to:</p>\n\n<ol>\n\t<li>Download the audio files:&nbsp;clotho_audio_development.7z and&nbsp;clotho_audio_evalution.7z</li>\n\t<li>Download the files with the captions:&nbsp;clotho_captions_development.csv and&nbsp;clotho_captions_evaluation.csv</li>\n\t<li>Download the files with the associated metadata:&nbsp;clotho_metadata_development.csv and&nbsp;clotho_metadata_evaluation.csv</li>\n\t<li>Extract the audio files</li>\n\t<li>Then you can use each audio file with its corresponding 
captions</li>\n</ol>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== License&nbsp;==</strong></p>\n\n<p>The audio files in the archives:</p>\n\n<ul>\n\t<li>clotho_audio_development.7z and</li>\n\t<li>clotho_audio_evalution.7z</li>\n</ul>\n\n<p>and the associated meta-data in the CSV files:</p>\n\n<ul>\n\t<li>clotho_metadata_development.csv</li>\n\t<li>clotho_metadata_evaluation.csv</li>\n</ul>\n\n<p>are under the corresponding licences (<strong><em>mostly CreativeCommons with attribution</em></strong>) of Freesound [1] platform, mentioned explicitly in the CSV files for each of the audio files. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are:&nbsp;</p>\n\n<ul>\n\t<li>File name</li>\n\t<li>Keywords</li>\n\t<li>URL for the original audio file</li>\n\t<li>Start and ending samples for the excerpt that is used in the Clotho dataset</li>\n\t<li>Uploader/user in the Freesound platform&nbsp;(manufacturer)</li>\n\t<li>Link to the licence of the file</li>\n</ul>\n\n<p>The captions in the files:</p>\n\n<ul>\n\t<li>clotho_captions_development.csv</li>\n\t<li>clotho_captions_evaluation.csv</li>\n</ul>\n\n<p>are under the Tampere University licence, described in the LICENCE file (<strong><em>mainly a non-commercial with attribution licence</em></strong>).&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== References ==</strong><br>\n[1]&nbsp;Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM &#39;13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245</p>", 
  "license": "", 
  "creator": [
    {
      "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
      "@id": "https://orcid.org/0000-0002-3605-7127", 
      "@type": "Person", 
      "name": "Konstantinos Drossos"
    }, 
    {
      "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
      "@type": "Person", 
      "name": "Samuel Lipping"
    }, 
    {
      "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
      "@id": "https://orcid.org/0000-0002-4604-9729", 
      "@type": "Person", 
      "name": "Tuomas Virtanen"
    }
  ], 
  "url": "https://zenodo.org/record/3490684", 
  "datePublished": "2019-10-15", 
  "version": "1.0", 
  "keywords": [
    "Clotho", 
    "Audio captioning", 
    "Dataset", 
    "Audio processing", 
    "Signal processing", 
    "Machine listening", 
    "Computational auditory scene analysis", 
    "Captioning", 
    "Deep learning"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_audio_development.7z", 
      "encodingFormat": "7z", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_audio_evaluation.7z", 
      "encodingFormat": "7z", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_captions_development.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_captions_evaluation.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_metadata_development.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_metadata_evaluation.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/LICENSE", 
      "encodingFormat": "", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.3490684", 
  "@id": "https://doi.org/10.5281/zenodo.3490684", 
  "@type": "Dataset", 
  "name": "Clotho dataset"
}
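
The usage steps in the description (download the files, then pair each audio file with its captions) and the "distribution" list of the record above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' loader; their code is at the GitHub repository linked in the description. The function names `distribution_files` and `captions_by_file` are hypothetical, and the captions CSV layout (a `file_name` column followed by caption columns) is an assumption that should be checked against the downloaded files. The sample CSV row in the demo is made up.

```python
import csv
import io
import json
import posixpath
from urllib.parse import urlparse


def distribution_files(record):
    """Map each file name in a schema.org Dataset record to its contentUrl."""
    files = {}
    for item in record.get("distribution", []):
        url = item.get("contentUrl", "")
        # The file name is the last segment of the URL path.
        name = posixpath.basename(urlparse(url).path)
        if name:
            files[name] = url
    return files


def captions_by_file(csv_file, name_column="file_name"):
    """Group captions per audio file from an open captions CSV.

    ASSUMPTION: one row per audio file, the file name in `name_column`,
    and every other non-empty column holding one caption.
    """
    captions = {}
    for row in csv.DictReader(csv_file):
        captions[row[name_column]] = [
            text
            for key, text in row.items()
            if key not in (name_column, None) and text
        ]
    return captions


if __name__ == "__main__":
    # In-memory stand-ins so the sketch runs without downloading anything.
    record = json.loads(
        '{"distribution": [{"contentUrl": "https://zenodo.org/api/files/'
        '7ae3c16e-5963-4b32-9ef7-83c0daf28229/LICENSE"}]}'
    )
    print(distribution_files(record))

    # Made-up row, only to show the assumed column layout.
    sample = io.StringIO(
        "file_name,caption_1,caption_2\n"
        "example.wav,first caption,second caption\n"
    )
    print(captions_by_file(sample))
```

With the real files, the record would be loaded from the JSON-LD export with `json.load`, and `captions_by_file` would be called on `clotho_captions_development.csv` opened in text mode with `newline=""`, as the `csv` module recommends.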
                   All versions   This version
Views              10,248         8,960
Downloads          14,596         12,010
Data volume        11.5 TB        9.0 TB
Unique views       9,041          8,040
Unique downloads   7,700          6,794
