Audio Moment Retrieval from Long Audio DCASE 2026 Evaluation dataset

Published June 1, 2026 | Version v1

Dataset Open

What is this?

This repository contains data for DCASE Challenge 2026 Task 6.

It provides:
- Audio and text features extracted with CLAP using a sliding window, following the same feature extraction protocol as the CASTELLA dataset.
- A submission template.

clap

└──dcase2026_evaluation_audio_{vid}.npz

clap_text

└──qiddcase2026_evaluation_q{qid}.npz
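
The feature files are NumPy `.npz` archives, so their contents can be inspected before building a data loader. The following is a minimal sketch, not an official loader: the names of the arrays stored inside each archive are not documented here, so the helper (hypothetically named `describe_npz`) lists whatever it finds rather than assuming a key.

```python
import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file.

    The array names are read from the archive itself; none are assumed.
    """
    with np.load(path) as archive:
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }


# Example call, replacing {vid} with a real id from the template:
# print(describe_npz("clap/dcase2026_evaluation_audio_{vid}.npz"))
```

Running this once on one audio file and one text file shows which array names and shapes the rest of a pipeline has to handle.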

Participants who require the raw audio should contact the organizers.

Predict the relevant moments for each qid in the submission template.
- The audio and text features in this repository correspond to the qid and vid in the template.
Fill in the predicted moments in the pred_relevant_windows field.
- Multiple moments may be submitted per query; all of them are used to compute the mAP.
- List moments in descending order of confidence. Only the first moment in the list is used for the final ranking.
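
The submission steps above can be sketched as follows. This is an illustration under stated assumptions, not the official tooling: the helper name `fill_template` is hypothetical, and each predicted moment is assumed to be written as `[start_sec, end_sec, confidence]`, a convention common in moment retrieval toolkits that this page does not confirm. Only the `qid` and `pred_relevant_windows` field names come from the description above.

```python
import json


def fill_template(template_lines, predictions):
    """Fill pred_relevant_windows for each template entry.

    template_lines: iterable of JSON strings, one per line of the template.
    predictions: dict mapping qid -> list of (start_sec, end_sec, confidence).
                 Keys must have the same type as the qid values in the template.
    The [start, end, confidence] layout is an assumption, not a documented format.
    """
    filled = []
    for line in template_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        windows = predictions.get(entry["qid"], [])
        # Highest-confidence moment first, since only the first one is ranked.
        ranked = sorted(windows, key=lambda w: w[2], reverse=True)
        entry["pred_relevant_windows"] = [list(w) for w in ranked]
        filled.append(json.dumps(entry))
    return filled


# Example usage with the template file from this repository:
# with open("submission_template.jsonl") as f:
#     lines = fill_template(f, my_predictions)  # my_predictions is hypothetical
# with open("submission.jsonl", "w") as f:
#     f.write("\n".join(lines) + "\n")
```

Entries without a prediction are written with an empty list, which makes missing queries easy to spot before submitting.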

Name	MD5	Size
clap.tar.gz	8337ca832788bf0046ce00043e487809	61.6 MB
clap_text.tar.gz	99ec441e7ab4912dc756a2eeb543d2f2	3.9 MB
dcase2026_evaluation.jsonl	de3c76a8ced437b9ccd45c2e761cbab7	23.3 kB
submission_template.jsonl	fe6e544283733786bd7fcfee0260648f	27.5 kB