HumMusQA: A Human-written Music Understanding QA Benchmark Dataset

Weck, Benno; Puentes, Pablo; Poltronieri, Andrea; Prabhu, Satyajeet; Bogdanov, Dmitry

doi:10.5281/zenodo.18462524

Published February 2, 2026 | Version v1

Dataset Open

1. Pompeu Fabra University
2. Universitat Autònoma de Barcelona

HumMusQA is a benchmark dataset for evaluating music understanding in Large Audio-Language Models (LALMs).
It contains 320 human-written multiple-choice questions created and validated by musically trained experts to test perception and interpretation of musical content.

This dataset accompanies the paper:

Benno Weck, Pablo Puentes, Andrea Poltronieri, Satyajeet Prabhu, and Dmitry Bogdanov. 2026. HumMusQA: A Human-written Music Understanding QA Benchmark Dataset. In Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026), pages 58–67, Rabat, Morocco. Association for Computational Linguistics.

Files

HumMusQA.csv
Main dataset containing all questions.

Columns:

Song link
start time
end time
Question
True answer
Distractor 1
Distractor 2
Distractor 3
Main Category
Secondary Categories
Difficulty
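
As a minimal sketch of how these columns might be turned into a multiple-choice item, the snippet below shuffles the true answer with the three distractors and records which letter is correct. It assumes the CSV header names match the column list above exactly and that the file is UTF-8 encoded. The helper names (build_item, load_items) and the A-D lettering are illustrative choices, not part of the dataset or the paper's evaluation protocol.

```python
import csv
import random

# Column names as listed above; assumed to match the CSV header exactly.
OPTION_COLUMNS = ["True answer", "Distractor 1", "Distractor 2", "Distractor 3"]


def build_item(row, rng):
    """Shuffle the four options of one row; return (prompt_text, correct_letter)."""
    options = [row[col] for col in OPTION_COLUMNS]
    order = list(range(len(options)))
    rng.shuffle(order)

    lines = [row["Question"]]
    correct_letter = None
    for letter, idx in zip("ABCD", order):
        lines.append(f"{letter}. {options[idx]}")
        if idx == 0:  # index 0 is the "True answer" column
            correct_letter = letter
    return "\n".join(lines), correct_letter


def load_items(csv_path, seed=0):
    """Read HumMusQA.csv (hypothetical local path) and build one item per row."""
    rng = random.Random(seed)
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [build_item(row, rng) for row in csv.DictReader(f)]
```

Shuffling with a fixed seed keeps the option order reproducible across runs while avoiding the true answer always appearing in the same position.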

metadata.csv
Track metadata and licensing information.

Columns:

track_id
song_link
name
artist_name
album_name
license_ccurl

audio_excerpts.zip
Trimmed audio excerpts corresponding to each question.

audio_full.zip
Full audio tracks.

Licensing

Each track follows its respective Creative Commons license, specified in metadata.csv.
Users must comply with the license associated with each track.
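
One way to look up the license for each question is to join the two CSV files on the song link. The sketch below assumes that the values in the "Song link" column of HumMusQA.csv match those in the song_link column of metadata.csv verbatim, which the record does not state explicitly. The function names are hypothetical.

```python
import csv


def read_rows(path):
    """Read a CSV file into a list of dicts (UTF-8 encoding is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def index_licenses(metadata_rows):
    """Map song_link -> license_ccurl from metadata.csv rows."""
    return {row["song_link"]: row["license_ccurl"] for row in metadata_rows}


def attach_license(question_rows, licenses):
    """Return copies of the question rows with a 'license_ccurl' key added.

    The value is None when a question's song link has no match in the metadata.
    """
    return [
        dict(row, license_ccurl=licenses.get(row["Song link"]))
        for row in question_rows
    ]
```

Rows that come back with a None license indicate that the join assumption does not hold for that track and should be checked by hand.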

Citation

If you use this dataset, please cite:

@inproceedings{weck-etal-2026-hummusqa,
    title = "{H}um{M}us{QA}: A Human-written Music Understanding {QA} Benchmark Dataset",
    author = "Weck, Benno  and
      Puentes, Pablo  and
      Poltronieri, Andrea  and
      Prabhu, Satyajeet  and
      Bogdanov, Dmitry",
    editor = "Epure, Elena V.  and
      Oramas, Sergio  and
      Doh, SeungHeon  and
      Ramoneda, Pedro  and
      Kruspe, Anna  and
      Sordo, Mohamed",
    booktitle = "Proceedings of the 4th Workshop on {NLP} for Music and Audio ({NLP}4{M}us{A} 2026)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.nlp4musa-1.9/",
    doi = "10.18653/v1/2026.nlp4musa-1.9",
    pages = "58--67",
    ISBN = "979-8-89176-369-2",
    abstract = "The evaluation of music understanding in Large Audio-Language Models (LALMs) requires a rigorously defined benchmark that truly tests whether models can perceive and interpret music, a standard that current data methodologies frequently fail to meet. This paper introduces a meticulously structured approach to music evaluation, proposing a new dataset of 320 hand-written questions curated and validated by experts with musical training, arguing that such focused, manual curation is superior for probing complex audio comprehension. To demonstrate the use of the dataset, we benchmark six state-of-the-art LALMs and additionally test their robustness to uni-modal shortcuts."
}

Files (1.1 GB)

Name	MD5	Size
audio_excerpts.zip	ddc558760480cf6048f300c1a91184f4	475.0 MB
audio_full.zip	f10219538ef414e26ff2cb32e5bb2494	617.9 MB
HumMusQA.csv	a5c00b4d5135d403a6c0e22c8be7808c	62.9 kB
metadata.csv	55b6d6ac5437b05a727301e5bed48d16	15.2 kB

Additional details

Is described by: Conference paper: 10.18653/v1/2026.nlp4musa-1.9 (DOI)

