Published April 1, 2024 | Version v1
Model
Open
DCASE2024 Task6 Baseline - Automated Audio Captioning (ConvNeXt-Transformer)
Description
DCASE2024 Task6 Baseline: ConvNeXt-Transformer model for Automated Audio Captioning.
- Trained on the Clotho dataset
- Extracts features using ConvNeXt
- Reaches a SPIDEr-FL score of 29.6% on Clotho-eval (also named development-testing in DCASE)
This model requires representations extracted with a ConvNeXt pretrained for audio classification, available under the filename convnext_tiny_465mAP_BL_AC_70kit.pth (the required model is listed under Related works).
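To make the reported metric more concrete, here is a minimal per-caption sketch of how a SPIDEr-FL value is commonly described: SPIDEr is the mean of CIDEr-D and SPICE, and the result is scaled down when a fluency-error detector flags the caption. The function name, the argument names, and the default `error_threshold` and `penalty` values are assumptions for illustration, not values taken from this record. The official evaluation code may differ in detail.

```python
def spider_fl(
    cider_d: float,
    spice: float,
    fluency_error_prob: float,
    error_threshold: float = 0.9,  # assumed default, not from this record
    penalty: float = 0.9,          # assumed default, not from this record
) -> float:
    """Sketch of a per-caption SPIDEr-FL score.

    cider_d and spice are the caption's CIDEr-D and SPICE scores;
    fluency_error_prob is the probability, from some fluency-error
    detector, that the caption contains a fluency error.
    """
    # SPIDEr is the arithmetic mean of CIDEr-D and SPICE.
    spider = (cider_d + spice) / 2.0
    # If the caption is flagged as non-fluent, keep only (1 - penalty)
    # of its SPIDEr score.
    if fluency_error_prob > error_threshold:
        spider *= 1.0 - penalty
    return spider
```

A corpus-level score would then be the average of these per-caption values over all evaluated captions.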
Files
tokenizer.json
(148.3 MB)
MD5 | Size |
---|---|
9514a8e6fa547bd01fb1badde81c6d10 | 148.2 MB |
ee3fef19f7d0891d820d84035483a900 | 101.4 kB |
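The MD5 checksums listed for the files can be used to check a download for corruption. The sketch below uses only the Python standard library. The helper names `md5_of_file` and `is_listed_file` are hypothetical, and the local path in the usage comment is a placeholder.

```python
import hashlib

# Checksums copied from the file listing of this record.
LISTED_MD5 = {
    "9514a8e6fa547bd01fb1badde81c6d10",
    "ee3fef19f7d0891d820d84035483a900",
}


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoints need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed_file(path: str) -> bool:
    """Return True if the file's MD5 matches one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5


# Usage (placeholder path, adjust to where the file was saved):
# print(is_listed_file("downloads/tokenizer.json"))
```

Checking membership in the set of listed checksums avoids having to assume which checksum belongs to which file name.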
Additional details
Related works
- Requires: Model 10.5281/zenodo.8020843 (DOI)
Software
- Repository URL: https://github.com/Labbeti/dcase2024-task6-baseline
- Programming language: Python