Extended Metadata for MGPHot (audio links, embeddings and more!)
Description
MGPHot Embeddings and Index Reconstruction Scripts
This repository provides precomputed embeddings for the six models described in the original paper (https://arxiv.org/pdf/2509.06936), evaluated on three benchmark datasets: MGPHot, MagnaTagATune, and MTG-Jamendo.
The embeddings are stored in:
embeddings_autotagging.zip
We also provide a mirror of the scripts that reconstruct the canonical indices with full metadata, in mirror_reconstruct.zip.
All metadata is already available in the original GitHub repository:
https://github.com/MTG/MGPHot-audio
Purpose
Provide precomputed embeddings for the six models evaluated in the paper.
Allow researchers to rebuild the canonical indices locally with the full metadata.
Ensure reproducibility while respecting the original dataset licenses.
Note: The number of embedding files may vary across models. Some extractors were designed to process the entire dataset, while others only generate embeddings for tracks associated with at least one of the selected tags.
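Because the file counts differ between models, it can help to tally the archive contents before building on them. The following is a minimal sketch, not part of the provided scripts. It assumes that the archive groups files by folder and that the first folder level corresponds to a model or dataset; the actual layout of embeddings_autotagging.zip is not documented here, so the `depth` parameter and the function name are hypothetical and may need adjusting.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_files_per_folder(zip_path, depth=1):
    """Count files in a zip archive, grouped by their first `depth` folder levels.

    Assumption: folders at this depth correspond to models (or datasets).
    Files stored above that depth are counted under "(root)".
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, count files only
            parts = PurePosixPath(info.filename).parts
            if len(parts) > depth:
                key = "/".join(parts[:depth])
            else:
                key = "(root)"
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import os

    archive_path = "embeddings_autotagging.zip"  # assumed to be downloaded locally
    if os.path.exists(archive_path):
        for folder, n_files in sorted(count_files_per_folder(archive_path).items()):
            print(f"{folder}: {n_files} files")
    else:
        print(f"{archive_path} not found; download it first")
```

If the archive wraps everything in a single top-level folder, calling the function with depth=2 should give a more useful breakdown.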
How to Reconstruct
Run the reconstruction script:
python reconstruct_index.py
The script rebuilds the canonical indices, verifies the outputs against the provided .md5 checksums, and prints a summary report.
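Independently of the reconstruction script, the downloaded archives can be checked against the MD5 values listed for this record. The sketch below is an illustration only and does not reproduce the internals of reconstruct_index.py. The helper names are hypothetical, and it assumes the archive has been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected_hex):
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # MD5 value as listed for the 2.8 GB file in the record's file table.
    archive = Path("embeddings_autotagging.zip")  # assumed local download location
    expected = "6d504d5883fa4500dcedf4ba9e29b63d"
    if archive.exists():
        print("OK" if matches_md5(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat even for the multi-gigabyte embeddings archive.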
License
- Code is released under the MIT License.
- Metadata mappings follow the original dataset's license (CC BY-NC-SA 4.0).
- Do not upload reconstructed indices or audio anywhere online.
Citation
If you use these embeddings or scripts in your research, please cite the original paper:
@misc{ramoneda2025benchmark,
title = {Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets},
author = {Pedro Ramoneda and Pablo Alonso-Jim{\'e}nez and Sergio Oramas and Xavier Serra and Dmitry Bogdanov},
year = {2025},
eprint = {2509.06936},
archivePrefix = {arXiv},
primaryClass = {cs.SD},
url = {https://arxiv.org/abs/2509.06936}
}
Note on the durability of the benchmark
We are open to sharing specific missing audio files with research institutions to ensure reproducibility.
Files
| Name | Size | MD5 |
|---|---|---|
| embeddings_autotagging.zip | 2.8 GB | 6d504d5883fa4500dcedf4ba9e29b63d |
| mirror_reconstruct.zip | 2.1 MB | e0cb5b47ec7c169f766f7bc4d5c41f03 |