DCASE 2024 Task 9: Language-Queried Audio Source Separation | Evaluation Set

Liu, Xubo; Wang, Wenwu; Plumbley, Mark D; Le Roux, Jonathan; Wichern, Gordon; Zhao, Yan; Liu, Yuzhuo; Chen, Hangting

doi:10.5281/zenodo.11425256

Published June 1, 2024 | Version v1

Dataset Open

This is the evaluation set for Task 9, Language-Queried Audio Source Separation (LASS), in the DCASE 2024 Challenge.

This evaluation set is intended for Task 9 of the DCASE 2024 scientific challenge. It is not meant for training LASS methods; it is meant for evaluating them in the final testing and ranking stage. All audio clips are sourced from Freesound and were uploaded between April and October 2023. Each audio file has been segmented into 10-second clips and converted to mono 16 kHz.
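As a rough illustration of the clip format described above, the following sketch downmixes a multi-channel signal to mono and cuts it into 10-second clips at 16 kHz (160,000 samples each). It assumes the audio is already sampled at 16 kHz and held in a NumPy array of shape (samples, channels). The function names `to_mono` and `segment_clips` are hypothetical, and dropping a trailing partial segment is an assumption: the description does not say how the organizers handled leftovers or which tools they used.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz, as stated in the dataset description
CLIP_SECONDS = 10      # clip length, as stated in the dataset description


def to_mono(audio):
    """Average the channels of a (samples, channels) array; pass 1-D audio through."""
    audio = np.asarray(audio, dtype=np.float64)
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=1)


def segment_clips(mono, sample_rate=SAMPLE_RATE, seconds=CLIP_SECONDS):
    """Split a mono signal into fixed-length clips.

    Assumption: a trailing segment shorter than `seconds` is discarded.
    Returns an array of shape (n_clips, sample_rate * seconds).
    """
    clip_len = sample_rate * seconds
    n_clips = len(mono) // clip_len
    return mono[: n_clips * clip_len].reshape(n_clips, clip_len)


if __name__ == "__main__":
    # 25 seconds of synthetic stereo audio stands in for a real recording.
    stereo = np.zeros((SAMPLE_RATE * 25, 2))
    clips = segment_clips(to_mono(stereo))
    print(clips.shape)
```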

This evaluation set consists of two parts: an evaluation set (synth) and an evaluation set (real).

== Evaluation set (synth) ==

This evaluation set was created from 1,000 audio clips. Each clip is annotated with three captions describing its content. We created 3,000 synthetic mixtures with signal-to-noise ratios (SNR) ranging from -15 to 15 dB. Each synthetic mixture includes one natural language query and its corresponding target source. We used annotated tag information to ensure that the two audio clips used in each mixture do not share overlapping sound source classes. The original audio files used to create these mixtures are not released; only the mixtures and language queries are available for evaluation.
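To make the SNR range concrete, here is a minimal sketch of mixing a target clip with an interfering clip at a chosen SNR. It assumes SNR is defined as the ratio of mean target power to mean interference power over the whole clip, and that the interference is the signal being rescaled. The description does not state the exact procedure used to build the released mixtures, so the function names `mix_at_snr` and `measured_snr_db` and the scaling choice are illustrative only.

```python
import numpy as np


def mix_at_snr(target, interference, snr_db):
    """Rescale `interference` so the target-to-interference power ratio is `snr_db`.

    Assumption: power is the mean squared amplitude over the whole clip.
    Returns (mixture, scaled_interference).
    """
    target = np.asarray(target, dtype=np.float64)
    interference = np.asarray(interference, dtype=np.float64)
    eps = 1e-12  # guards against division by zero for a silent interferer
    target_power = np.mean(target ** 2)
    interference_power = np.mean(interference ** 2)
    # Solve target_power / (gain^2 * interference_power) = 10^(snr_db / 10) for gain.
    gain = np.sqrt(target_power / ((interference_power + eps) * 10 ** (snr_db / 10)))
    scaled = gain * interference
    return target + scaled, scaled


def measured_snr_db(target, noise):
    """Return the power ratio of `target` to `noise` in decibels."""
    return 10 * np.log10(np.mean(np.square(target)) / np.mean(np.square(noise)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random noise stands in for two 10-second, 16 kHz clips.
    target = rng.standard_normal(160_000)
    interference = rng.standard_normal(160_000)
    for snr in (-15, 0, 15):
        mixture, scaled = mix_at_snr(target, interference, snr)
        print(snr, round(measured_snr_db(target, scaled), 3))
```

At -15 dB the interfering clip carries roughly 32 times the power of the target, which is why the low end of the range is the hardest case for a separation system.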

The audio files are in the archive:

lass_evaluation_synth.zip

and the associated metadata (audio filenames and text queries) are in the CSV file:

lass_synthetic_evaluation.csv

== Evaluation set (real) ==

This evaluation set consists of 100 audio clips. Each audio clip contains at least two overlapping sound sources. For each audio clip, we manually annotated its component sources using text descriptions, so that each clip can be used as a 'mixture' from which to extract one or more of the component sources based on a text query. Each audio clip in the evaluation set (real) is labeled with two such text queries.

The audio files are in the archive:

lass_evaluation_real.zip

and the associated metadata (audio filenames and text queries) are in the CSV file:

lass_real_evaluation.csv

Files

Files (883.3 MB)

Name	MD5	Size
lass_evaluation_real.zip	099c45b39d2f5cd0fc252dd300d93a52	55.0 MB
lass_evaluation_synth.zip	f2b065fe732d23734dfc9fc9ef8122b5	828.0 MB
lass_real_evaluation.csv	14554340f600442d49ebd991e938d69f	15.1 kB
lass_synthetic_evaluation.csv	a37b1d709ade6f8fd5aacd21a638f0e8	289.4 kB
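
The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The sketch below is one way to do this with the Python standard library; it assumes all four files have been saved under their original names in a single local directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of any official challenge tooling.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing on the dataset record.
EXPECTED_MD5 = {
    "lass_evaluation_real.zip": "099c45b39d2f5cd0fc252dd300d93a52",
    "lass_evaluation_synth.zip": "f2b065fe732d23734dfc9fc9ef8122b5",
    "lass_real_evaluation.csv": "14554340f600442d49ebd991e938d69f",
    "lass_synthetic_evaluation.csv": "a37b1d709ade6f8fd5aacd21a638f0e8",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Map each expected filename to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Reading in chunks keeps memory use flat, which matters for the 828.0 MB synthetic archive.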
