Dataset Open Access
Karim M. Ibrahim; Elena V. Epure; Geoffroy Peeters; Gaël Richard
This is a user-aware music dataset in which each track is labeled with the context in which each user listens to it. The dataset covers 10 contextual tags derived from how users use tracks in the playlists they created in the Deezer catalog. The tags are: "car, gym, happy, night, relax, running, sad, summer, work, workout". Each track/user pair is associated with a contextual tag indicating that the user listens to the track in that context. Additionally, users are represented as embeddings based on their listening history, computed through matrix factorization of the user/track matrix.
The creation of the dataset and our baseline auto-tagging model are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Should we consider the users in contextual music auto-tagging models?" 21st International Society for Music Information Retrieval Conference (ISMIR). 2020. The source code for the paper is available here: https://github.com/KarimMibrahim/user-aware-music-autotagging
Each track is identified by its SONG_ID, which is the ID of the track in the Deezer catalog. Each track/user pair is labeled with each tag as either 1 (indicating the track's presence in the context) or 0 (indicating its absence). The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented by an anonymized USER_ID, which is associated with the user embedding available in the user_embeddings.csv file.
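As a rough illustration of how the labels and the user embeddings fit together, the following Python sketch joins each track/user pair with its user embedding via USER_ID. It is not the authors' loading code. The exact file layout is an assumption: the sketch supposes one label row per track/user pair with one 0/1 column per tag, named after the tag, and an embeddings file with USER_ID followed by one column per embedding dimension (the `dim_*` names are hypothetical). The inline CSV strings are made-up placeholder rows, not real data, and stand in for the actual files.

```python
import csv
import io

# The 10 contextual tags listed in the dataset description.
TAGS = ["car", "gym", "happy", "night", "relax",
        "running", "sad", "summer", "work", "workout"]

# Made-up placeholder rows; the real column layout is an assumption.
LABELS_CSV = """USER_ID,SONG_ID,car,gym,happy,night,relax,running,sad,summer,work,workout
u1,101,1,0,0,0,0,0,0,0,0,0
u1,102,0,0,0,0,1,0,0,0,0,0
u2,101,0,1,0,0,0,0,0,0,0,0
"""

# Stand-in for user_embeddings.csv; the dim_* column names are hypothetical.
EMBEDDINGS_CSV = """USER_ID,dim_0,dim_1,dim_2
u1,0.1,0.2,0.3
u2,0.4,0.5,0.6
"""


def load_embeddings(handle):
    """Map each USER_ID to its embedding as a list of floats."""
    embeddings = {}
    for row in csv.DictReader(handle):
        user_id = row.pop("USER_ID")
        # Remaining columns are the embedding dimensions, in file order.
        embeddings[user_id] = [float(value) for value in row.values()]
    return embeddings


def load_examples(handle, embeddings):
    """Attach the active tags and the user embedding to each track/user pair."""
    examples = []
    for row in csv.DictReader(handle):
        examples.append({
            "user_id": row["USER_ID"],
            "song_id": row["SONG_ID"],
            # Keep the tags whose column is 1 for this pair.
            "tags": [tag for tag in TAGS if row[tag] == "1"],
            "user_embedding": embeddings[row["USER_ID"]],
        })
    return examples


embeddings = load_embeddings(io.StringIO(EMBEDDINGS_CSV))
examples = load_examples(io.StringIO(LABELS_CSV), embeddings)
for example in examples:
    print(example["user_id"], example["song_id"], example["tags"])
```

To use the real files, the `io.StringIO(...)` arguments would be replaced with opened file handles, for example `open("user_embeddings.csv", newline="")`, after adjusting the column names to match the actual headers. Note that in the toy rows the same SONG_ID (101) carries a different tag for each user, which is the user-dependent labeling the dataset is built around.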