Published March 5, 2020 | Version v1
Other Open

Pre-trained weights for the baseline DNN system of DCASE 2020 automated audio captioning task

  • 1. Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University

Description

This record hosts the pre-trained weights of the baseline deep neural network (DNN) used in the baseline system of the automated audio captioning task at the DCASE 2020 Challenge.

The pre-trained weights can be used with the baseline DNN to reproduce the reported results on the evaluation split (the development-testing set in DCASE terminology) of the Clotho dataset.
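Before the weights can be used with the baseline DNN, the downloaded archive has to be unpacked. The following is a minimal sketch using only the Python standard library. The function name `extract_weights` and the target directory are hypothetical; the internal layout of the archive and the location where the baseline code expects the weights are not described in this record and should be checked against the baseline repository.

```python
import zipfile
from pathlib import Path


def extract_weights(archive_path, target_dir):
    """Extract a zip archive into target_dir and return the extracted file paths.

    Hypothetical helper: the archive layout is not documented in this record,
    so the function only unpacks and reports what it finds.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    # Skip directory entries, which end with a slash in zip archives.
    return [target / name for name in names if not name.endswith("/")]


if __name__ == "__main__":
    archive = Path("dcase_2020_model_baseline_pre_trained.zip")  # assumed local path
    if archive.exists():
        for path in extract_weights(archive, "pretrained_weights"):
            print(path)
    else:
        print(f"{archive} not found; download it first")
```

Listing the extracted paths first makes it easier to match the files to whatever the baseline configuration expects.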

The description of the automated audio captioning task and the reported results are available on the task webpage: http://dcase.community/challenge2020/task-automatic-audio-captioning

The Clotho dataset can be found at: https://zenodo.org/record/3490684

The GitHub repositories for audio captioning can be found at: https://github.com/audio-captioning

 

If you use the baseline system, please consider citing the Clotho paper:

K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," to be presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4-8, 2020. Available online at: https://arxiv.org/abs/1910.09387

Files

dcase_2020_model_baseline_pre_trained.zip

Files (17.0 MB)

Checksum and size of each file:
md5:a2192b65bb11c93454982ae32590c576 (17.0 MB)
md5:f22899d2e1c74422f2fcd09dd4d3c3fe (1.8 kB)
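The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch, assuming that the checksum of the 17.0 MB file belongs to dcase_2020_model_baseline_pre_trained.zip and that the archive sits in the current directory; the helper names `md5_of_file` and `verify_download` are illustrative only.

```python
import hashlib
from pathlib import Path

# Checksum listed on this record for the 17.0 MB file
# (assumed here to be the zip archive of pre-trained weights).
EXPECTED_MD5 = "a2192b65bb11c93454982ae32590c576"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    archive = Path("dcase_2020_model_baseline_pre_trained.zip")  # assumed local path
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.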

Additional details

Related works

Is supplement to
Software: https://github.com/audio-captioning/dcase-2020-baseline
Is supplemented by
Dataset: https://zenodo.org/record/3490684

Funding

European Commission
EVERYSOUND - Computational Analysis of Everyday Soundscapes (grant 637422)