Clotho dataset

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen

doi:10.5281/zenodo.3490684

Published October 15, 2019 | Version 1.0

Dataset Open

1. Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University

Clotho is a novel audio captioning dataset consisting of 4981 audio samples, each with five captions (24 905 captions in total). Audio samples are 15 to 30 s long, and captions are 8 to 20 words long.

Clotho is thoroughly described in our paper:

K. Drossos, S. Lipping and T. Virtanen, "Clotho: an Audio Captioning Dataset," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 736-740, doi: 10.1109/ICASSP40776.2020.9052990.

The paper is available online at: https://arxiv.org/abs/1910.09387 and at: https://ieeexplore.ieee.org/document/9052990

If you use Clotho, please cite our paper.

To use the dataset, you can use our code at: https://github.com/audio-captioning/clotho-dataset

These are the files for the development and evaluation splits of the Clotho dataset.

--------------------------------------------------------------------------------------------------------

== Usage ==

To use the dataset you have to:

Download the audio files: clotho_audio_development.7z and clotho_audio_evaluation.7z
Download the files with the captions: clotho_captions_development.csv and clotho_captions_evaluation.csv
Download the files with the associated metadata: clotho_metadata_development.csv and clotho_metadata_evaluation.csv
Extract the audio files
Then you can use each audio file with its corresponding captions
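
The last step, matching each extracted audio file to its captions, can be sketched in Python as below. This is only an illustration, not the code from the repository linked above. It assumes that each row of a captions CSV holds one file name plus one caption per remaining column; the column name `file_name` and the extraction directory `development` are assumptions and should be checked against the downloaded files.

```python
import csv
from pathlib import Path


def load_captions(csv_path, file_column="file_name"):
    """Map each audio file name to its list of captions.

    Assumption: one row per audio file, with the file name in `file_column`
    and every other non-empty column holding one caption.
    """
    captions = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = row[file_column]
            captions[name] = [
                text for column, text in row.items()
                if column not in (file_column, None) and text
            ]
    return captions


def pair_audio_with_captions(audio_dir, captions):
    """Yield (audio path, captions) for every listed file found on disk."""
    audio_dir = Path(audio_dir)
    for name, texts in captions.items():
        path = audio_dir / name
        if path.is_file():
            yield path, texts


# Usage: runs only if the captions file is present in the working directory.
csv_file = Path("clotho_captions_development.csv")
audio_root = Path("development")  # hypothetical extraction directory
if csv_file.is_file():
    captions = load_captions(csv_file)
    for audio_path, texts in pair_audio_with_captions(audio_root, captions):
        print(audio_path.name, len(texts))
```

Files listed in the CSV but missing from the directory are skipped silently, so comparing the number of yielded pairs with the number of CSV rows is a quick way to spot an incomplete extraction.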

--------------------------------------------------------------------------------------------------------

== License ==

The audio files in the archives:

clotho_audio_development.7z and
clotho_audio_evaluation.7z

and the associated meta-data in the CSV files:

clotho_metadata_development.csv
clotho_metadata_evaluation.csv

are under the corresponding licences (mostly Creative Commons with attribution) of the Freesound [1] platform, which are stated explicitly in the CSV files for each audio file. That is, each audio file in the 7z archives is listed in the CSV files with the meta-data. The meta-data for each file are:

File name
Keywords
URL for the original audio file
Start and ending samples for the excerpt that is used in the Clotho dataset
Uploader/user in the Freesound platform (manufacturer)
Link to the licence of the file
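
Because the licence differs from file to file, it can help to tally the licence links before redistributing or mixing the audio. The following is a minimal sketch under stated assumptions: the header name `license` and the UTF-8 encoding are guesses, not confirmed by this description, so both are exposed as parameters and should be checked against the header row of the downloaded CSV.

```python
import csv
from collections import Counter
from pathlib import Path


def count_licences(metadata_csv, licence_column="license", encoding="utf-8"):
    """Count how many audio files fall under each licence link.

    Assumptions: the header name `licence_column` and the file encoding are
    guesses; adjust both to match the downloaded metadata CSV.
    """
    with open(metadata_csv, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        if licence_column not in (reader.fieldnames or []):
            raise KeyError(
                f"no column named {licence_column!r}; found {reader.fieldnames}"
            )
        return Counter(row[licence_column] for row in reader)


# Usage: runs only if the metadata file is present in the working directory.
metadata_file = Path("clotho_metadata_development.csv")
if metadata_file.is_file():
    for licence, count in count_licences(metadata_file).most_common():
        print(count, licence)
```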

The captions in the files:

clotho_captions_development.csv
clotho_captions_evaluation.csv

are under the Tampere University licence, described in the LICENSE file (mainly a non-commercial with attribution licence).

--------------------------------------------------------------------------------------------------------

== References ==
[1] Frederic Font, Gerard Roma, and Xavier Serra. 2013. Freesound technical demo. In Proceedings of the 21st ACM international conference on Multimedia (MM '13). ACM, New York, NY, USA, 411-412. DOI: https://doi.org/10.1145/2502081.2502245

Files (4.7 GB)

Name	MD5	Size
clotho_audio_development.7z	e3ce88561b317cc3825e8c861cae1ec6	3.4 GB
clotho_audio_evaluation.7z	4569624ccadf96223f19cb59fe4f849f	1.2 GB
clotho_captions_development.csv	dd568352389f413d832add5cf604529f	1.0 MB
clotho_captions_evaluation.csv	1b16b9e57cf7bdb7f13a13802aeb57e2	362.0 kB
clotho_metadata_development.csv	582c18ee47cebdbe33dce1feeab53a56	624.8 kB
clotho_metadata_evaluation.csv	13946f054d4e1bf48079813aac61bf77	225.3 kB
LICENSE	afceb2583a64aea27ed4f14099126af0	1.9 kB
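
The MD5 checksums in the table above can be used to confirm that the downloads are intact before extracting the archives. A minimal sketch using only the Python standard library follows; the checksums are copied from the table, while the function names and the choice of the current directory as download location are illustrative assumptions.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file table above.
EXPECTED_MD5 = {
    "clotho_audio_development.7z": "e3ce88561b317cc3825e8c861cae1ec6",
    "clotho_audio_evaluation.7z": "4569624ccadf96223f19cb59fe4f849f",
    "clotho_captions_development.csv": "dd568352389f413d832add5cf604529f",
    "clotho_captions_evaluation.csv": "1b16b9e57cf7bdb7f13a13802aeb57e2",
    "clotho_metadata_development.csv": "582c18ee47cebdbe33dce1feeab53a56",
    "clotho_metadata_evaluation.csv": "13946f054d4e1bf48079813aac61bf77",
    "LICENSE": "afceb2583a64aea27ed4f14099126af0",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(download_dir, expected=EXPECTED_MD5):
    """Return {file name: True, False, or None}; None means the file is missing."""
    results = {}
    for name, checksum in expected.items():
        path = Path(download_dir) / name
        results[name] = (md5_of(path) == checksum) if path.is_file() else None
    return results


# Usage: check the current directory (an assumed download location).
for name, status in verify(".").items():
    label = {True: "ok", False: "MISMATCH", None: "missing"}[status]
    print(f"{label:9s}{name}")
```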

Additional details

Is supplement to: Conference paper: https://arxiv.org/abs/1910.09387
Is supplemented by: Software: https://github.com/audio-captioning/clotho-dataset

Funding: European Commission, EVERYSOUND - Computational Analysis of Everyday Soundscapes (grant 637422)
