DCASE 2024 Task 9: Language-Queried Audio Source Separation | Evaluation Set
Description
This is the evaluation set for Task 9, Language-Queried Audio Source Separation (LASS), in DCASE 2024 Challenge.
This split is intended only for evaluating LASS methods in the final testing and ranking stage of the challenge; it is not meant to be used for training LASS methods. All audio clips are sourced from Freesound and were uploaded between April and October 2023. Each audio file has been segmented into 10-second clips and converted to mono at 16 kHz.
The evaluation set consists of two parts: an evaluation set (synth) and an evaluation set (real).
== Evaluation set (synth) ==
This evaluation set is created using 1,000 audio clips. Each clip is annotated with three captions describing the content of the clip. We created 3,000 synthetic mixtures with signal-to-noise ratios (SNR) ranging from -15 to 15 dB. Each synthetic mixture includes one natural language query and its corresponding target source. We used annotated tag information to ensure that the two audio clips used in each mix do not share overlapping sound source classes. The original audio files used to create these mixtures are not released. The mixtures and language queries are available for evaluation.
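The exact mixing procedure used to build the synthetic mixtures is not described here. As a rough illustration only, the sketch below shows one common way to combine two equal-length clips at a chosen SNR: the interfering clip is scaled so that the ratio of mean powers matches the requested value in dB. The function names `mix_at_snr` and `measured_snr_db`, and the choice to scale the interfering clip rather than the target, are assumptions made for this example.

```python
import math


def power(signal):
    """Mean power of a sequence of float samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(target, interference, snr_db):
    """Mix two equal-length clips at the requested SNR (in dB).

    Assumption: the interference is rescaled and the target is left
    unchanged. Returns the mixture and the rescaled interference.
    """
    if len(target) != len(interference):
        raise ValueError("clips must have the same number of samples")
    p_target = power(target)
    p_interf = power(interference)
    if p_target == 0 or p_interf == 0:
        raise ValueError("cannot set an SNR for a silent clip")
    # SNR_dB = 10 * log10(p_target / (gain^2 * p_interf)), solved for gain.
    gain = math.sqrt(p_target / (p_interf * 10 ** (snr_db / 10)))
    scaled = [gain * s for s in interference]
    mixture = [t + n for t, n in zip(target, scaled)]
    return mixture, scaled


def measured_snr_db(target, interference):
    """SNR in dB between a target and an (already scaled) interference."""
    return 10 * math.log10(power(target) / power(interference))
```

A mixture built this way may exceed the [-1, 1] sample range at low SNR, so a real pipeline would usually normalize the result before writing it to disk.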
The audio files are in the archive:
- lass_evaluation_synth.zip
The associated metadata (including audio filenames and text queries) is in the CSV file:
- lass_synthetic_evaluation.csv
== Evaluation set (real) ==
This evaluation set consists of 100 audio clips. Each audio clip contains at least two overlapping sound sources. For each audio clip, we manually annotated its component sources using text descriptions, so that each clip can be used as a 'mixture' from which to extract one or more of the component sources based on a text query. Each audio clip in the evaluation set (real) is labeled with two such text queries.
The audio files are in the archive:
- lass_evaluation_real.zip
The associated metadata (including audio filenames and text queries) is in the CSV file:
- lass_real_evaluation.csv
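The column names of the two metadata CSV files are not listed in this description. The sketch below is therefore schema-agnostic: it reads the header row and returns every record as a dictionary, so the actual columns can be inspected before writing evaluation code. The helper name `read_metadata` and the UTF-8 encoding in the usage comment are assumptions for this example.

```python
import csv


def read_metadata(lines):
    """Parse CSV metadata from an open text file (or any iterable of lines).

    Returns (columns, rows): the header names as a list and one dict per
    record keyed by those names. No column names are assumed.
    """
    reader = csv.DictReader(lines)
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    return columns, rows


# Usage (assuming the CSV has been downloaded next to this script):
#
#     with open("lass_real_evaluation.csv", newline="", encoding="utf-8") as f:
#         columns, rows = read_metadata(f)
#     print(columns)    # inspect the real column names first
#     print(len(rows))  # number of metadata records
```

Printing `columns` first avoids hard-coding field names that may not match the released files.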