DCASE 2024 Task 9: Language-Queried Audio Source Separation | Pre-trained Weights for the Baseline System

Liu, Xubo; Zhao, Yan

doi:10.5281/zenodo.10887460

Published March 27, 2024 | Version v1

Dataset Open

1. University of Surrey
2. ByteDance

== Descriptions ==

We trained the AudioSep [1] model on the development set (Clotho and the augmented FSD50K datasets) for 200k steps with a batch size of 16 on a single NVIDIA A100 GPU, which took about one day. Model details can be found in the AudioSep paper [1].
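For a rough sense of scale, the training figures quoted above can be turned into approximate throughput numbers. This is only a back-of-the-envelope sketch: the 24-hour wall-clock time is an assumption standing in for "around 1 day", and the function name `training_summary` is illustrative, not part of the baseline codebase.

```python
# Figures taken from the description: 200k steps, batch size 16.
TRAIN_STEPS = 200_000
BATCH_SIZE = 16
# Assumption: "around 1 day" is approximated as exactly 24 hours.
ASSUMED_WALL_CLOCK_HOURS = 24.0


def training_summary(steps: int, batch_size: int, hours: float) -> dict:
    """Return the number of batch items processed and the average throughput."""
    examples_seen = steps * batch_size  # counts repeats across epochs
    seconds = hours * 3600.0
    return {
        "examples_seen": examples_seen,
        "steps_per_second": steps / seconds,
        "examples_per_second": examples_seen / seconds,
    }


summary = training_summary(TRAIN_STEPS, BATCH_SIZE, ASSUMED_WALL_CLOCK_HOURS)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the run processes 3.2 million batch items at a little over two optimizer steps per second; the real rate depends on the actual wall-clock time.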

Pre-trained weights for the baseline system:

audiosep_16k,baseline,step=200000.ckpt

Baseline codebase:

GitHub: https://github.com/Audio-AGI/dcase2024_task9_baseline

== Reference ==

[1] Liu X, Kong Q, Zhao Y, et al. Separate anything you describe. arXiv:2308.05037, 2023.

== Contact ==

Xubo Liu, xubo.liu@surrey.ac.uk

Files (1.2 GB)

Name	Size	MD5
audiosep_16k,baseline,step=200000.ckpt	1.2 GB	20b366aea37204f7da292d21f1fce814
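Since the record publishes an MD5 checksum for the checkpoint, a downloaded copy can be checked against it before use. The following is a minimal sketch using only the Python standard library; the local path `CKPT_PATH` is an assumption (the file saved under its original name in the current directory), and the helper names are illustrative.

```python
import hashlib
from pathlib import Path

# MD5 checksum as listed in the file table of this record.
EXPECTED_MD5 = "20b366aea37204f7da292d21f1fce814"
# Assumption: the checkpoint was saved under its original name in the
# current working directory. Adjust to wherever the file actually lives.
CKPT_PATH = Path("audiosep_16k,baseline,step=200000.ckpt")


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that a 1.2 GB checkpoint is never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5: str) -> bool:
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    if CKPT_PATH.exists():
        ok = checksum_matches(CKPT_PATH, EXPECTED_MD5)
        print("checksum OK" if ok else "checksum MISMATCH: re-download the file")
    else:
        print(f"{CKPT_PATH} not found; download it from the record first")
```

A mismatch usually indicates a truncated or corrupted download. MD5 is adequate here for detecting accidental corruption, not for defending against deliberate tampering.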

DOI: 10.5281/zenodo.10887460

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 27, 2024
Modified: March 27, 2024
