Other Open Access

Pre-trained weights for the baseline DNN system of DCASE 2020 automated audio captioning task

Konstantinos Drossos; Samuel Lipping; Tuomas Virtanen

This repository holds the pre-trained weights for the baseline deep neural network (DNN) used in the automated audio captioning baseline system of the DCASE 2020 Challenge.

The pre-trained weights can be used with the baseline DNN to reproduce the reported results on the evaluation split (the development-testing set, in DCASE terminology) of the Clotho dataset.

The description of the automated audio captioning task and the reported results are available on the task webpage: http://dcase.community/challenge2020/task-automatic-audio-captioning

The Clotho dataset can be found at: https://zenodo.org/record/3490684

The GitHub repositories for audio captioning can be found at: https://github.com/audio-captioning


If you use the baseline system, please consider citing the Clotho paper:

K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," to be presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4-8, 2020.

Available online at: https://arxiv.org/abs/1910.09387

Files (17.0 MB): two files, 17.0 MB and 1.8 kB.