# Neural Music Fingerprinting Dataset

A realistic dataset for evaluating music fingerprinting systems under various audio degradations.

This data was used for the experiments in our ISMIR 2025 paper, 'Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification'.

We share the following data on Zenodo:
* music
  * training audio chunks (10,000 x 30 sec)
  * test queries (10,000 x 30 sec) (clean-time_shifted-degraded)
    * Each query ships with the time boundaries of its chunk inside the full track, which you can use to obtain the aligned, clean version from the full tracks in the database. We could not share the intermediate steps (clean, clean-time_shifted, clean-degraded) due to Zenodo's 50 GB cap.
  * full tracks of the queries (10,000 full tracks)
* degradation audio

The entire test database takes 400+ GB of space, which cannot be shared on Zenodo in a single repository. Therefore, you should download the FMA dataset and process it by following the steps in `dataset_creation/README` to obtain the entire database. Be sure to use the 10,000 database tracks that we include with this dataset. We included the full tracks of the query chunks so that the clean versions are exactly the same: during MP3-to-WAV conversion and processing, SoX may apply dithering, which is a stochastic process. We are not sure about the effect of this, but ideally the master tracks should be identical.
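
As an illustration of the conversion step, here is a minimal sketch of resampling an FMA MP3 to the dataset format with SoX, assuming `sox` is installed and that the output is mono; the paths below are hypothetical, and the exact command and options are documented in `dataset_creation/README`:

```python
import subprocess
from pathlib import Path

src = Path("fma_full/000/000002.mp3")  # hypothetical FMA track location
dst = Path("database/000002.wav")      # hypothetical output location
dst.parent.mkdir(parents=True, exist_ok=True)

# -r 8000: resample to 8 kHz; -b 16: 16-bit LPCM; -c 1: downmix to mono.
# Note: SoX dithers by default when reducing bit depth; pass -D to disable it.
subprocess.run(
    ["sox", str(src), "-r", "8000", "-b", "16", "-c", "1", str(dst)],
    check=True,
)
```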

All audio files have an 8,000 Hz sampling rate and are in `.wav` format encoded with 16-bit LPCM.
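
To sanity-check the format of a downloaded file, a short snippet using the `soundfile` package (the file name below is hypothetical):

```python
import soundfile as sf

info = sf.info("music/train/000002.wav")  # hypothetical file name
assert info.samplerate == 8000 and info.subtype == "PCM_16", info
```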

To decompress the tarball: `tar -xJf neural-music-fp-dataset.tar.xz`

Please cite the following publication when using the code, data, or models:

> R. O. Araz, G. Cortès-Sebastià, E. Molina, J. Serrà, X. Serra, Y. Mitsufuji, and D. Bogdanov, “Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification,” in Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR), 2025.

```bibtex
@inproceedings{araz_enhancing_2025,
  title     = {Enhancing Neural Audio Fingerprint Robustness to Audio Degradation for Music Identification},
  author    = {Araz, R. Oguz and Cortès-Sebastià, Guillem and Molina, Emilio and Serrà, Joan and Serra, Xavier and Mitsufuji, Yuki and Bogdanov, Dmitry},
  booktitle = {Proceedings of the 26th International Society for Music Information Retrieval Conference (ISMIR)},
  year      = {2025}
}
```

## Directory Structure

```
.
├── README.md
├── degradation
│   ├── bg_noise
│   │   ├── test
│   │   └── train
│   ├── microphone_ir
│   │   ├── test
│   │   └── train
│   └── room_ir
│       ├── test
│       └── train
└── music
    ├── test-database-fma-ids.txt
    ├── test-queries-fma-ids.txt
    ├── train-fma-ids.txt
    ├── test
    │   ├── queries
    │   │   └── clean-time_shifted-degraded
    │   └── database
    └── train
```

## Data Sources

- **Music**: https://github.com/mdeff/fma
- **Background Noise Degradation**:
  - https://dcase-repo.github.io/dcase_datalist/datasets/scenes/tut_asc_2016_eval.html
- **Room Impulse Response Degradation**:
  - https://www.openair.hosted.york.ac.uk/
  - https://www.iks.rwth-aachen.de/en/research/tools-downloads/databases/aachen-impulse-response-database/
  - https://mcdermottlab.mit.edu/Reverb/IR_Survey.html
- **Microphone Impulse Response Degradation**:
  - https://zenodo.org/records/4633508

## Recreation

Please check the GitHub repository for a detailed README describing how the data splits were made, how the audio was processed, and how the query audio was degraded.

https://github.com/raraz15/neural-music-fp/blob/main/dataset_creation/README.md
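
For orientation, the sketch below shows the general kind of degradation chain involved: convolution with a room or microphone impulse response, then background noise mixed in at a target SNR. It is illustrative only and assumes this ordering; the actual chain, parameters, and ordering are defined in the repository.

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(clean: np.ndarray, rir: np.ndarray, noise: np.ndarray,
            snr_db: float = 10.0) -> np.ndarray:
    """Illustrative degradation: IR convolution + background noise at a target SNR."""
    wet = fftconvolve(clean, rir)[: len(clean)]   # apply impulse response
    noise = np.resize(noise, len(wet))            # tile/trim noise to match length
    sig_pow = np.mean(wet**2)
    noise_pow = np.mean(noise**2) + 1e-12
    gain = np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))  # scale noise to SNR
    out = wet + gain * noise
    return out / max(np.max(np.abs(out)), 1e-9)   # peak-normalize to avoid clipping
```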

## Queries

Each `.wav` file is accompanied by a `.npy` file with the same file stem. The numpy array contains the indices of the audio chunk's boundaries inside the full track. These indices are useful for (a loading sketch follows the list):
* segment-level evaluation,
* obtaining the clean version of the degraded audio from the full track,
* reproducibility: you can reuse our clean query chunks but apply different degradations, keeping the musical content fixed.
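
A minimal sketch of recovering the aligned clean chunk, assuming the array holds a `(start, end)` pair of sample indices at 8 kHz; the file paths below are hypothetical, so check the repository README for the exact layout and convention:

```python
import numpy as np
import soundfile as sf

stem = "music/test/queries/clean-time_shifted-degraded/000123"  # hypothetical stem
start, end = np.load(stem + ".npy")                 # assumed (start, end) sample indices

full_track, sr = sf.read("full_tracks/000123.wav")  # hypothetical full-track path
clean_chunk = full_track[start:end]                 # aligned clean counterpart (8 kHz)
sf.write("000123_clean.wav", clean_chunk, sr)
```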

## License

Each source listed in the 'Data Sources' section retains its own license:
* FMA: Each track is distributed under the license chosen by the artist.
* DCASE TUT 2016: 'Free, free for academic usage (non-commercial), usually released under university specific EULA'
* OpenAIR: All files have CC BY 4.0 license.
* AIR: MIT license
* MIT IR: License not specified.
