Published April 1, 2024 | Version v1
Model Open

DCASE2024 Task6 Baseline - Automated Audio Captioning (ConvNeXt-Transformer)

  • 1. Institut de Recherche en Informatique de Toulouse

Description

DCASE2024 Task6 Baseline: ConvNeXt-Transformer model for Automated Audio Captioning.

  • Trained on the Clotho dataset
  • Extracts audio features using ConvNeXt
  • Reaches a SPIDEr-FL score of 29.6% on Clotho-eval (also named development-testing in DCASE)
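For readers unfamiliar with the reported metric, the following is a minimal sketch of how SPIDEr-FL is usually described: the mean of CIDEr-D and SPICE for each caption, reduced when a fluency-error detector flags the caption. The 0.9 detection threshold and the 0.9 penalty (the score is divided by 10) are assumptions taken from the common definition, not from this record, and the function names are illustrative. The official evaluation code remains the reference.

```python
from statistics import mean


def spider_fl(cider_d, spice, fluency_error_prob, threshold=0.9, penalty=0.9):
    """Per-caption SPIDEr-FL under the assumed definition.

    SPIDEr is the mean of CIDEr-D and SPICE. If the fluency-error
    probability exceeds `threshold`, the score is scaled by (1 - penalty).
    Both default values are assumptions, not values stated in this record.
    """
    spider = (cider_d + spice) / 2.0
    if fluency_error_prob > threshold:
        spider *= 1.0 - penalty
    return spider


def corpus_spider_fl(per_caption_scores):
    """Average per-caption scores given as (cider_d, spice, fluency_error_prob)."""
    return mean(spider_fl(c, s, p) for c, s, p in per_caption_scores)
```

A caption with CIDEr-D 0.4 and SPICE 0.2 scores 0.3 when no fluency error is detected, and about 0.03 when one is.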

This model requires audio representations extracted by a ConvNeXt pretrained for audio classification. The pretrained checkpoint has the filename convnext_tiny_465mAP_BL_AC_70kit.pth; the required model is listed under Related works (DOI 10.5281/zenodo.8020843).

Files (148.3 MB)

  • 148.2 MB, md5:9514a8e6fa547bd01fb1badde81c6d10
  • tokenizer.json: 101.4 kB, md5:ee3fef19f7d0891d820d84035483a900
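The MD5 checksums listed for the files can be used to confirm that a download is intact. A minimal sketch using only the Python standard library follows; the helper names and the local path in the usage note are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("tokenizer.json", "md5:ee3fef19f7d0891d820d84035483a900")` should return True for an uncorrupted copy, assuming the file was saved under that name in the current directory.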

Additional details

Related works

Requires
Model: 10.5281/zenodo.8020843 (DOI)
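As a convenience, the DOI above can be turned into a resolvable link, and possibly into a direct file link. The sketch below relies on two assumptions that are not stated in this record: that the checkpoint convnext_tiny_465mAP_BL_AC_70kit.pth is hosted in the record behind this DOI, and that Zenodo serves files at `records/<id>/files/<name>?download=1`. The helper names are illustrative.

```python
from urllib.parse import quote

ZENODO_DOI_PREFIX = "10.5281/zenodo."


def doi_to_url(doi):
    """Build the resolver URL for a DOI (doi.org redirects to the landing page)."""
    return f"https://doi.org/{doi}"


def zenodo_record_id(doi):
    """Extract the numeric record id from a Zenodo DOI such as 10.5281/zenodo.8020843."""
    if not doi.startswith(ZENODO_DOI_PREFIX):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return doi[len(ZENODO_DOI_PREFIX):]


def zenodo_file_url(doi, filename):
    """Build a direct download URL. The URL layout is an assumption, not a documented guarantee."""
    record_id = zenodo_record_id(doi)
    return f"https://zenodo.org/records/{record_id}/files/{quote(filename)}?download=1"
```

If the direct link does not resolve, the landing page returned by `doi_to_url` lists the record's files.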

Software

Repository URL: https://github.com/Labbeti/dcase2024-task6-baseline
Programming language: Python