There is a newer version of this record available.

Dataset Open Access

Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_audio_development.7z"
      }, 
      "checksum": "md5:e3ce88561b317cc3825e8c861cae1ec6", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_audio_development.7z", 
      "type": "7z", 
      "size": 3433217203
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_audio_evaluation.7z"
      }, 
      "checksum": "md5:4569624ccadf96223f19cb59fe4f849f", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_audio_evaluation.7z", 
      "type": "7z", 
      "size": 1249726503
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_captions_development.csv"
      }, 
      "checksum": "md5:dd568352389f413d832add5cf604529f", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_captions_development.csv", 
      "type": "csv", 
      "size": 1002092
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_captions_evaluation.csv"
      }, 
      "checksum": "md5:1b16b9e57cf7bdb7f13a13802aeb57e2", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_captions_evaluation.csv", 
      "type": "csv", 
      "size": 361995
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_metadata_development.csv"
      }, 
      "checksum": "md5:582c18ee47cebdbe33dce1feeab53a56", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_metadata_development.csv", 
      "type": "csv", 
      "size": 624848
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/clotho_metadata_evaluation.csv"
      }, 
      "checksum": "md5:13946f054d4e1bf48079813aac61bf77", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "clotho_metadata_evaluation.csv", 
      "type": "csv", 
      "size": 225311
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229/LICENSE"
      }, 
      "checksum": "md5:afceb2583a64aea27ed4f14099126af0", 
      "bucket": "7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
      "key": "LICENSE", 
      "type": "", 
      "size": 1864
    }
  ], 
  "owners": [
    41005
  ], 
  "doi": "10.5281/zenodo.3490684", 
  "stats": {
    "version_unique_downloads": 7700.0, 
    "unique_views": 8040.0, 
    "views": 8960.0, 
    "version_views": 10248.0, 
    "unique_downloads": 6794.0, 
    "version_unique_views": 9041.0, 
    "volume": 8965640661580.0, 
    "version_downloads": 14596.0, 
    "downloads": 12010.0, 
    "version_volume": 11471834441607.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3490684", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.3490683", 
    "bucket": "https://zenodo.org/api/files/7ae3c16e-5963-4b32-9ef7-83c0daf28229", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.3490683.svg", 
    "html": "https://zenodo.org/record/3490684", 
    "latest_html": "https://zenodo.org/record/4783391", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3490684.svg", 
    "latest": "https://zenodo.org/api/records/4783391"
  }, 
  "conceptdoi": "10.5281/zenodo.3490683", 
  "created": "2019-10-17T11:03:19.467392+00:00", 
  "updated": "2021-05-26T15:42:13.540297+00:00", 
  "conceptrecid": "3490683", 
  "revision": 36, 
  "id": 3490684, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.3490684", 
    "version": "1.0", 
    "language": "eng", 
    "title": "Clotho dataset", 
    "license": {
      "id": "other-at"
    }, 
    "related_identifiers": [
      {
        "scheme": "url", 
        "identifier": "https://github.com/audio-captioning/clotho-dataset", 
        "relation": "isSupplementedBy", 
        "resource_type": "software"
      }, 
      {
        "scheme": "url", 
        "identifier": "https://arxiv.org/abs/1910.09387", 
        "relation": "isSupplementTo", 
        "resource_type": "publication-conferencepaper"
      }, 
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.3490683", 
        "relation": "isVersionOf"
      }
    ], 
    "relations": {
      "version": [
        {
          "count": 3, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "3490683"
          }, 
          "is_last": false, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4783391"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "audio-captioning"
      }, 
      {
        "id": "tut-arg"
      }
    ], 
    "grants": [
      {
        "code": "637422", 
        "links": {
          "self": "https://zenodo.org/api/grants/10.13039/501100000780::637422"
        }, 
        "title": "Computational Analysis of Everyday Soundscapes", 
        "acronym": "EVERYSOUND", 
        "program": "H2020", 
        "funder": {
          "doi": "10.13039/501100000780", 
          "acronyms": [], 
          "name": "European Commission", 
          "links": {
            "self": "https://zenodo.org/api/funders/10.13039/501100000780"
          }
        }
      }
    ], 
    "references": [
      "Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245"
    ], 
    "keywords": [
      "Clotho", 
      "Audio captioning", 
      "Dataset", 
      "Audio processing", 
      "Signal processing", 
      "Machine listening", 
      "Computational auditory scene analysis", 
      "Captioning", 
      "Deep learning"
    ], 
    "publication_date": "2019-10-15", 
    "creators": [
      {
        "orcid": "0000-0002-3605-7127", 
        "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
        "name": "Konstantinos Drossos"
      }, 
      {
        "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
        "name": "Samuel Lipping"
      }, 
      {
        "orcid": "0000-0002-4604-9729", 
        "affiliation": "Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University", 
        "name": "Tuomas Virtanen"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "description": "<p>Clotho is a novel audio captioning dataset, consisting of&nbsp;4981 audio samples, and each audio sample has five captions (a total of&nbsp;24 905 captions).&nbsp;Audio samples are of 15 to 30 s duration and captions are eight to 20 words long.&nbsp;</p>\n\n<p>Clotho is thoroughly described in our paper:</p>\n\n<p><em>K. Drossos, S. Lipping and T. Virtanen, &quot;Clotho: an Audio Captioning Dataset,&quot; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.</em></p>\n\n<p>available online at:&nbsp;<a href=\"https://arxiv.org/abs/1910.09387\">https://arxiv.org/abs/1910.09387</a>&nbsp;and at:&nbsp;<a href=\"https://ieeexplore.ieee.org/document/9052990\">https://ieeexplore.ieee.org/document/9052990</a>&nbsp;</p>\n\n<p><strong>If you use Clotho, please cite our paper.</strong></p>\n\n<p>&nbsp;</p>\n\n<p><strong>To use the dataset, you can use our code at:</strong>&nbsp;<a href=\"https://github.com/audio-captioning/clotho-dataset\">https://github.com/audio-captioning/clotho-dataset</a>&nbsp;</p>\n\n<p>&nbsp;</p>\n\n<p>These are the files for the development and evaluation splits of the Clotho dataset.&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== Usage ==</strong></p>\n\n<p>To use the dataset you have to:</p>\n\n<ol>\n\t<li>Download the audio files:&nbsp;clotho_audio_development.7z and&nbsp;clotho_audio_evaluation.7z</li>\n\t<li>Download the files with the captions:&nbsp;clotho_captions_development.csv and&nbsp;clotho_captions_evaluation.csv</li>\n\t<li>Download the files with the associated metadata:&nbsp;clotho_metadata_development.csv and&nbsp;clotho_metadata_evaluation.csv</li>\n\t<li>Extract the audio files</li>\n\t<li>Then you can use each audio file with its corresponding captions</li>\n</ol>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== License&nbsp;==</strong></p>\n\n<p>The audio files in the archives:</p>\n\n<ul>\n\t<li>clotho_audio_development.7z and</li>\n\t<li>clotho_audio_evaluation.7z</li>\n</ul>\n\n<p>and the associated meta-data in the CSV files:</p>\n\n<ul>\n\t<li>clotho_metadata_development.csv</li>\n\t<li>clotho_metadata_evaluation.csv</li>\n</ul>\n\n<p>are under the corresponding licences (<strong><em>mostly CreativeCommons with attribution</em></strong>) of Freesound [1] platform, mentioned explicitly in the CSV files for each of the audio files. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are:&nbsp;</p>\n\n<ul>\n\t<li>File name</li>\n\t<li>Keywords</li>\n\t<li>URL for the original audio file</li>\n\t<li>Start and ending samples for the excerpt that is used in the Clotho dataset</li>\n\t<li>Uploader/user in the Freesound platform&nbsp;(manufacturer)</li>\n\t<li>Link to the licence of the file</li>\n</ul>\n\n<p>The captions in the files:</p>\n\n<ul>\n\t<li>clotho_captions_development.csv</li>\n\t<li>clotho_captions_evaluation.csv</li>\n</ul>\n\n<p>are under the Tampere University licence, described in the LICENSE file (<strong><em>mainly a non-commercial with attribution licence</em></strong>).&nbsp;</p>\n\n<p>--------------------------------------------------------------------------------------------------------</p>\n\n<p><strong>== References ==</strong><br>\n[1]&nbsp;Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM &#39;13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245</p>"
  }
}
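
The record above lists an MD5 checksum and a byte size for every file, so the downloads named in the usage steps can be checked before the archives are extracted. The following is a minimal sketch of such a check, assuming the files were saved under their original `key` names in the current directory. The names `EXPECTED`, `md5_of`, and `verify` are illustrative and are not part of the Clotho code or the Zenodo API; the checksum and size values are copied from the `files` entries of the JSON export.

```python
import hashlib
from pathlib import Path

# Checksums and sizes copied from the "files" entries of the record above.
EXPECTED = {
    "clotho_audio_development.7z": ("md5:e3ce88561b317cc3825e8c861cae1ec6", 3433217203),
    "clotho_audio_evaluation.7z": ("md5:4569624ccadf96223f19cb59fe4f849f", 1249726503),
    "clotho_captions_development.csv": ("md5:dd568352389f413d832add5cf604529f", 1002092),
    "clotho_captions_evaluation.csv": ("md5:1b16b9e57cf7bdb7f13a13802aeb57e2", 361995),
    "clotho_metadata_development.csv": ("md5:582c18ee47cebdbe33dce1feeab53a56", 624848),
    "clotho_metadata_evaluation.csv": ("md5:13946f054d4e1bf48079813aac61bf77", 225311),
    "LICENSE": ("md5:afceb2583a64aea27ed4f14099126af0", 1864),
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True if the file's size and MD5 digest match the expected entry."""
    path = Path(path)
    checksum, size = expected[path.name]
    algorithm, _, hexdigest = checksum.partition(":")
    if algorithm != "md5":
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    # Compare the cheap size first; hash only when the size already matches.
    return path.stat().st_size == size and md5_of(path) == hexdigest


if __name__ == "__main__":
    for name in EXPECTED:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
        else:
            print(name, "not found")
```

Reading in chunks matters here because the development archive alone is about 3.4 GB. A file that reports `MISMATCH` is most likely a truncated or corrupted download and would need to be fetched again.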
                    All versions    This version
Views               10,248          8,960
Downloads           14,596          12,010
Data volume         11.5 TB         9.0 TB
Unique views        9,041           8,040
Unique downloads    7,700           6,794
