Published March 27, 2024 | Version v1
Dataset Open

DCASE 2024 Task 9: Language-Queried Audio Source Separation | Pre-trained Weights for the Baseline System

  • 1. University of Surrey
  • 2. ByteDance

Description

== Description ==

We trained the AudioSep [1] model on the development set (Clotho and augmented FSD50K datasets) for 200k steps with a batch size of 16 on a single NVIDIA A100 GPU, which took around 1 day. Model details can be found in the AudioSep paper [1].
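As a rough sanity check on the training budget stated above, the following Python sketch works out how many training examples were processed and the approximate throughput. It assumes that "around 1 day" means exactly 24 hours, so the rate is only an estimate. The function names are illustrative and are not part of the baseline codebase.

```python
# Figures taken from the description above.
STEPS = 200_000
BATCH_SIZE = 16
# Assumption: "around 1 day" is treated as exactly 24 hours.
APPROX_TRAINING_SECONDS = 24 * 60 * 60


def examples_processed(steps: int, batch_size: int) -> int:
    """Total training examples processed (not the number of unique clips)."""
    return steps * batch_size


def steps_per_second(steps: int, seconds: float) -> float:
    """Average optimisation steps per second over the whole run."""
    return steps / seconds


if __name__ == "__main__":
    print(examples_processed(STEPS, BATCH_SIZE))  # 3200000
    print(round(steps_per_second(STEPS, APPROX_TRAINING_SECONDS), 2))  # 2.31
```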

Pre-trained weights for the baseline system:

  • audiosep_16k,baseline,step=200000.ckpt

Baseline codebase:

== Reference ==

[1] Liu X, Kong Q, Zhao Y, et al. Separate anything you describe. arXiv:2308.05037, 2023.

== Contact ==

Xubo Liu, xubo.liu@surrey.ac.uk

Files

Files (1.2 GB)

Size: 1.2 GB
md5:20b366aea37204f7da292d21f1fce814
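
Because the download is large, it may be worth checking it against the MD5 checksum listed above before loading it. The Python sketch below is one way to do that, using only the standard library. It assumes the checksum belongs to the checkpoint file named in the description and that the file sits in the current directory. The helper names `md5_of_file` and `verify_checkpoint` are illustrative and are not part of the baseline codebase.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in the Files section of this record.
EXPECTED_MD5 = "20b366aea37204f7da292d21f1fce814"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the checkpoint was saved under the name given in the description.
    ckpt = Path("audiosep_16k,baseline,step=200000.ckpt")
    if ckpt.exists():
        print("checksum OK" if verify_checkpoint(ckpt) else "checksum MISMATCH")
    else:
        print(f"{ckpt} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.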