Pre-trained weights for the baseline DNN system of DCASE 2021 automated audio captioning task
Audio Research Group, Faculty of Information Technology and Communication Sciences, Tampere University
Description
This record hosts the pre-trained weights of the baseline deep neural network (DNN) used in the baseline system for automated audio captioning at the DCASE 2021 Challenge.
The pre-trained weights can be loaded into the baseline DNN to reproduce the reported results on the evaluation split (development-testing set in DCASE terminology) of the Clotho dataset.
You can find the description of the automated audio captioning task and the reported results on the webpage of the task: http://dcase.community/challenge2021/task-automatic-audio-captioning
The Clotho dataset can be found at: https://zenodo.org/record/3490684
The GitHub repositories for audio captioning can be found at: https://github.com/audio-captioning
If you use the baseline system, please consider citing the Clotho paper:
K. Drossos, S. Lipping, and T. Virtanen, "Clotho: An Audio Captioning Dataset," to be presented in the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 4-8, 2020
available online at: https://arxiv.org/abs/1910.09387
Files
dcase_model_baseline_pre_trained.zip (16.9 MB)

MD5 | Size |
---|---|
50ae5adc6787ec459994622dd05768d2 | 16.9 MB |
abd8c71c4889f1ad0e689fc40274bbe5 | 1.8 kB |
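The MD5 values listed for the files can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library. It assumes that the checksum listed with the 16.9 MB entry belongs to dcase_model_baseline_pre_trained.zip (inferred from the matching size) and that the archive was saved in the current directory; the helper names `md5_of_file` and `matches_checksum` are illustrative and not part of the baseline code.

```python
import hashlib
from pathlib import Path

# MD5 value listed with the 16.9 MB entry on this record
# (assumed here to be the zip archive, based on the matching size).
EXPECTED_MD5 = "50ae5adc6787ec459994622dd05768d2"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Return True if the file's MD5 equals `expected_md5` (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Assumed local path: the archive saved in the current directory.
    archive = Path("dcase_model_baseline_pre_trained.zip")
    if archive.exists():
        ok = matches_checksum(archive, EXPECTED_MD5)
        print("checksum OK" if ok else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

If the check reports a mismatch, downloading the file again is usually the simplest remedy.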
Additional details
Related works
- Is supplement to
  - Software: https://github.com/audio-captioning/dcase-2021-baseline
- Is supplemented by
  - Dataset: https://zenodo.org/record/3490684