Dim-Sim Dataset 
===================

The **dim-sim** dataset is a collection of user-annotated music similarity triplet ratings used to evaluate music similarity search and related algorithms. Our similarity ratings are linked to the [Million Song Dataset (MSD)](http://millionsongdataset.com) and were collected for the following paper:

> !["Dim-sim"](dim-sim.png "Dim-sim" )
> [Disentangled Multidimensional Metric Learning for Music Similarity](https://ieeexplore.ieee.org/document/9053442/)<br>
> Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, and Juhan Nam.<br>
> Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2020.
> 
> ```
>@inproceedings{Lee2019MusicSimilarity,
>  title={Disentangled Multidimensional Metric Learning For Music Similarity},
>  author={Lee, Jongpil and Bryan, Nicholas J. and Salamon, Justin and Jin, Zeyu and Nam, Juhan},
>  booktitle={Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
>  year={2020},
>  organization={IEEE}
>}
>```


## About 

To collect our data, we randomly sampled 4,000 triplets of 3-second clips (i.e., anchor, song 1, song 2) from the [MSD](http://millionsongdataset.com) and asked people to annotate which track sounded more similar to the anchor (i.e., song 1 or song 2). Each triplet was annotated by 5-12 people, resulting in 39,440 raw human annotations. We then calculated the annotator agreement per triplet, defined as the ratio between the number of majority votes and the total number of annotations, and filtered out triplets whose agreement was below 0.9, leaving 879 high-agreement, human-annotated triplets. We have released both the raw and clean versions of the dataset in the formats described below.
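The agreement filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to build the released files: the function names and the example vote counts are hypothetical, and it assumes each triplet is a dict carrying the `song1_vote` and `song2_vote` counts from the raw data.

```python
def agreement(song1_vote: int, song2_vote: int) -> float:
    """Ratio between the majority vote count and the total number of annotations."""
    total = song1_vote + song2_vote
    if total == 0:
        raise ValueError("a triplet needs at least one annotation")
    return max(song1_vote, song2_vote) / total


def filter_high_agreement(triplets, threshold=0.9):
    """Keep only triplets whose agreement is at least `threshold`."""
    return [
        t for t in triplets
        if agreement(t["song1_vote"], t["song2_vote"]) >= threshold
    ]


# Hypothetical vote counts, for illustration only (not taken from the dataset).
example_triplets = [
    {"triplet_id": "a", "song1_vote": 10, "song2_vote": 0},  # agreement 1.0
    {"triplet_id": "b", "song1_vote": 6, "song2_vote": 4},   # agreement 0.6
    {"triplet_id": "c", "song1_vote": 1, "song2_vote": 11},  # agreement ~0.92
]
kept_ids = [t["triplet_id"] for t in filter_high_agreement(example_triplets)]
```

With these made-up counts, triplets `a` and `c` pass the 0.9 threshold and `b` is dropped.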

## Formats

We have released both [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) and [JSON](https://www.json.org) versions of the data for both the raw (`raw-dim-sim`) and clean (`clean-dim-sim`) annotations as described above. For a given triplet rating, the following data is provided:

```
triplet_id	
anchor_id	
anchor_start_seconds	
anchor_start_samples	
song1_id	
song1_start_seconds	
song1_start_samples	
song2_id	
song2_start_seconds	
song2_start_samples	
sampling_rate	
clip_lengths_seconds	
clip_lengths_samples	
song1_vote	
song2_vote
```
For the raw versions, `song1_vote` and `song2_vote` give the total number of users who voted for each song, respectively. For the clean versions, the values of `song1_vote` and `song2_vote` are set to 0 or 1. All clips used were exactly 3 seconds long. The `triplet_id`, `song1_id`, and `song2_id` denote the corresponding MSD track IDs.
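As a rough sketch of how the CSV version might be consumed, the snippet below reads rows with Python's standard `csv` module and picks the song that received more votes. It makes several assumptions: the files are comma-delimited with a header row using the field names listed above, the helper names `read_triplets` and `preferred_song` are hypothetical, and the in-memory sample uses made-up IDs and only a subset of the columns to stay short.

```python
import csv
import io

VOTE_FIELDS = ("song1_vote", "song2_vote")


def read_triplets(handle):
    """Yield one dict per triplet row, with vote counts parsed as integers."""
    for row in csv.DictReader(handle):
        for name in VOTE_FIELDS:
            row[name] = int(row[name])
        yield row


def preferred_song(row):
    """Return the ID of the song with more votes, or None on a tie."""
    if row["song1_vote"] == row["song2_vote"]:
        return None
    if row["song1_vote"] > row["song2_vote"]:
        return row["song1_id"]
    return row["song2_id"]


# Made-up sample rows (hypothetical IDs, subset of columns) standing in for a file.
SAMPLE = (
    "triplet_id,anchor_id,song1_id,song2_id,song1_vote,song2_vote\n"
    "t0,a0,s1,s2,7,1\n"
    "t1,a1,s3,s4,0,1\n"
)
rows = list(read_triplets(io.StringIO(SAMPLE)))
winners = [preferred_song(r) for r in rows]
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, newline="")` where `path` points to your local copy of the CSV.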


## License 

The **dim-sim** dataset is licensed under <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/80x15.png" /></a><br /> [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)](https://creativecommons.org/licenses/by-nc/4.0/).

## Please Acknowledge Dim-Sim in Academic Research
If you use **dim-sim** in academic research, we would appreciate it if publications based in whole or in part on the dataset cite the following paper:

> [Disentangled Multidimensional Metric Learning for Music Similarity](https://ieeexplore.ieee.org/document/9053442/)<br>
> Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, and Juhan Nam.<br>
> Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2020.
> 
> ```
>@inproceedings{Lee2019MusicSimilarity,
>  title={Disentangled Multidimensional Metric Learning For Music Similarity},
>  author={Lee, Jongpil and Bryan, Nicholas J. and Salamon, Justin and Jin, Zeyu and Nam, Juhan},
>  booktitle={Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
>  year={2020},
>  organization={IEEE}
>}
>```
