Published March 27, 2024 | Version v1
Dataset
Open
DCASE 2024 Task 9: Language-Queried Audio Source Separation | Pre-trained Weights for the Baseline System
Description
== Descriptions ==
We trained the AudioSep [1] model on the development set (Clotho and augmented FSD50K datasets) for 200k steps with a batch size of 16 on a single NVIDIA A100 GPU, which took around 1 day. Model details can be found in the AudioSep paper [1].
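For a rough sense of the training budget described above, the figures can be combined as in the following sketch. It assumes that "around 1 day" means 24 hours of wall-clock time, so the throughput numbers are only approximate; the function name `training_budget` is illustrative and not part of the baseline codebase.

```python
def training_budget(steps: int, batch_size: int, wall_clock_hours: float) -> dict:
    """Derive simple training-scale figures from the reported settings."""
    seconds = wall_clock_hours * 3600
    examples_seen = steps * batch_size  # training examples drawn, counting repeats
    return {
        "examples_seen": examples_seen,
        "steps_per_second": steps / seconds,
        "examples_per_second": examples_seen / seconds,
    }


# Reported settings: 200k steps, batch size 16.
# ASSUMPTION: "around 1 day" is taken as exactly 24 hours.
budget = training_budget(steps=200_000, batch_size=16, wall_clock_hours=24.0)

print(f"examples seen:   {budget['examples_seen']:,}")
print(f"steps / second:  {budget['steps_per_second']:.2f}")
print(f"examples/second: {budget['examples_per_second']:.1f}")
```

Under that assumption, the run draws 3.2 million training examples at a little over two optimizer steps per second.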
Pre-trained weights for the baseline system:
- audiosep_16k,baseline,step=200000.ckpt
Baseline codebase:
== Reference ==
[1] Liu X, Kong Q, Zhao Y, et al. Separate anything you describe. arXiv:2308.05037, 2023.
== Contact ==
Xubo Liu, xubo.liu@surrey.ac.uk
Files
(1.2 GB)
Name | Size |
---|---|
md5:20b366aea37204f7da292d21f1fce814 | 1.2 GB |
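Because the checkpoint is large, it may be worth checking the download against the MD5 checksum listed in the Files table before using it. The following is a minimal sketch using only the Python standard library; the local file path is an assumption (the checkpoint saved under its listed name in the current directory), and the helper names `md5_of_file` and `verify_checkpoint` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files table of this record.
EXPECTED_MD5 = "20b366aea37204f7da292d21f1fce814"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # ASSUMPTION: the checkpoint was saved in the current directory under its listed name.
    ckpt = Path("audiosep_16k,baseline,step=200000.ckpt")
    if ckpt.exists():
        print("checksum OK" if verify_checkpoint(ckpt) else "checksum MISMATCH")
    else:
        print(f"{ckpt} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.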