Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset
- 1. Telecom Paris
- 2. Deezer
Description
This is a contextual music dataset in which each stream is labelled with its listening situation. Each stream consists of user, track, and device data labelled with a situation. The data were collected from Deezer during August 2019 from users in France and Brazil. The dataset comprises 3 subsets covering 4, 8, and 12 different situations. The situations are extracted by keyword matching against the title of the associated playlist in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club".
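As an illustration of the keyword matching described above, here is a minimal sketch. It assumes whole-word, case-insensitive matching on the playlist title; the exact matching rules used to build the dataset are not specified here, and the function name `match_situations` and the example titles are hypothetical.

```python
import re

# The 12 situational tags listed in the description.
SITUATIONS = [
    "work", "gym", "party", "sleep",
    "morning", "run", "night", "dance",
    "car", "train", "relax", "club",
]

def match_situations(title, situations=SITUATIONS):
    """Return the tags that occur as whole words in a playlist title.

    Assumption: whole-word, case-insensitive matching. The dataset's
    actual matching procedure may differ.
    """
    words = set(re.findall(r"[a-z]+", title.lower()))
    return [s for s in situations if s in words]

# Hypothetical playlist titles, for illustration only.
print(match_situations("Morning Run Motivation"))
print(match_situations("Carnival hits"))
```

Matching on whole words rather than substrings keeps a title such as "Carnival hits" from being tagged with "car".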
Each instance contains a track/user/device triplet and a situational tag indicating that the user listened to the track in that situation, together with the corresponding data received from the device. The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". The users are represented as embeddings computed from their listening history through matrix factorization of the user/track matrix. Additionally, the users are described by their demographic data: "age, country, gender".
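The linear and circular time features can be illustrated with a short sketch. The conventions below are assumptions, not taken from the dataset: the linear value is normalized to [0, 1), X is the cosine and Y the sine of the corresponding angle, and days are indexed 0 to 6. The function names `encode_time` and `encode_day` are hypothetical.

```python
import math

def encode_time(hour, minute=0):
    """Return (linear, circular X, circular Y) for a time of day.

    Assumptions: linear is the fraction of the day elapsed,
    X = cos(angle), Y = sin(angle).
    """
    linear = (hour * 60 + minute) / (24 * 60)
    angle = 2 * math.pi * linear
    return linear, math.cos(angle), math.sin(angle)

def encode_day(day_index):
    """Return (linear, circular X, circular Y) for a weekday index 0..6."""
    linear = day_index / 7
    angle = 2 * math.pi * linear
    return linear, math.cos(angle), math.sin(angle)

# 23:59 and 00:00 are far apart on the linear scale
# but close together on the circle.
lin_a, xa, ya = encode_time(23, 59)
lin_b, xb, yb = encode_time(0, 0)
linear_gap = abs(lin_a - lin_b)
circular_gap = math.hypot(xa - xb, ya - yb)
print(linear_gap, circular_gap)
```

The circular pair removes the artificial jump at midnight (or at the week boundary) that a purely linear feature would have.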
The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git
Each track is identified by its media_id, which is the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented by an anonymized user_id, which is associated with the user embedding available in the user_embeddings.npy file. Note: the index of an embedding in the user_embeddings array corresponds to the user_id, i.e. the user with user_id = 100 has its embedding at user_embeddings[100].
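The indexing convention in the note above can be sketched as follows. The shape and dtype of the real user_embeddings.npy are not stated here, so the example writes a small stand-in array (5 hypothetical users, 4 dimensions) to a temporary file and assumes a 2-D layout of one row per user; `embedding_for` is a hypothetical helper.

```python
import os
import tempfile

import numpy as np

# Stand-in for user_embeddings.npy (assumed layout: one row per user).
rng = np.random.default_rng(0)
demo = rng.normal(size=(5, 4)).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "user_embeddings.npy")
np.save(path, demo)

# Load the array; with the real file, point `path` at it instead.
user_embeddings = np.load(path)

def embedding_for(user_id, embeddings=user_embeddings):
    """Return the embedding row whose index equals user_id."""
    if not 0 <= user_id < len(embeddings):
        raise IndexError(f"user_id {user_id} is out of range")
    return embeddings[user_id]

# Single lookup, and a batched lookup for several streams at once.
single = embedding_for(3)
batch = user_embeddings[np.array([3, 0, 3])]
print(single.shape, batch.shape)
```

Because the user_id is the row index, a whole column of user_id values can be turned into an embedding matrix with one fancy-indexing operation, as in `batch`.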
Finally, the dataset also contains the splits used in our experiments. Each split follows one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is split into 4 subsets for cross-validation, each marked with a "fold" number.
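To make the three conditions concrete, here is a minimal sketch of how such folds could be built on toy data. It is not the procedure used to produce the released splits, which should be used as provided; the function `cold_folds` and the toy arrays are hypothetical.

```python
import numpy as np

def cold_folds(ids, n_folds=4, seed=0):
    """Assign rows to folds so that all rows sharing an id share a fold."""
    ids = np.asarray(ids)
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(ids))
    fold_of_id = {uid: i % n_folds for i, uid in enumerate(shuffled)}
    return np.array([fold_of_id[u] for u in ids])

# Toy streams: each row is one (user_id, media_id) pair.
user_ids = np.array([0, 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 7])
media_ids = np.array([10, 11, 10, 12, 13, 11, 14, 15, 10, 16, 17, 12])

cold_user = cold_folds(user_ids)    # no user appears in two folds
cold_track = cold_folds(media_ids)  # no track appears in two folds
warm = np.random.default_rng(0).integers(0, 4, size=len(user_ids))  # overlaps allowed

def overlaps(ids, folds):
    """Return True if any id occurs in more than one fold."""
    return any(len(set(folds[ids == i])) > 1 for i in np.unique(ids))

print(overlaps(user_ids, cold_user), overlaps(media_ids, cold_track))
```

In the cold conditions the fold is a function of the user or track id, so evaluation always involves users or tracks unseen during training; the warm condition assigns folds per stream.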
Files
anon_user_idx_ordered.txt
Total size: 854.3 MB
Checksum | Size |
---|---|
md5:e43f47bdeda78e191c8b009ed0f35ef2 | 724.4 kB |
md5:77abc0a6b027ece9e7093fde1ea74f7e | 107.6 MB |
md5:3991bab0c89adc6be540fb265ceb4341 | 40.6 MB |
md5:9774fba853ab0c14bcedf771887660b2 | 77.4 MB |
md5:84c0178b08493510f9d664b05b13a0f3 | 107.6 MB |
md5:6a3ee22368ce37113d45205322b8ca8b | 40.6 MB |
md5:430aae3c8cb552de5c54c651d1a28116 | 77.4 MB |
md5:fdfc6ff2fb1d098a481a1873608b6c8b | 1.6 MB |
md5:78161c2cdea7acd83323db9afef4c98f | 175.2 MB |
md5:bca3f4241649b8f3cb71ec14bdb58c28 | 107.6 MB |
md5:8f014c8cc0068de99028de8aeeda7cca | 40.6 MB |
md5:3dec7512da80198a7419addc4ef7b62e | 77.4 MB |