Published October 6, 2021 | Version v1
Dataset Open

Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset

  • 1. Telecom Paris
  • 2. Deezer

Description

This is a contextual music dataset in which each stream is labelled with its listening situation. Each stream consists of user, track, and device data, labelled with a situation. The dataset was collected from Deezer in August 2019 from France and Brazil. It comprises 3 subsets, covering 4, 8, and 12 different situations. The situations are extracted by keyword matching on the title of the associated playlist in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club".
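To illustrate the idea of keyword matching on playlist titles, here is a minimal sketch. It is not the extraction procedure used to build the dataset: the function name `match_situations`, the whole-word matching rule, and the use of the English tags as keywords are all assumptions made for this example.

```python
import re

# The twelve situational tags listed in the description.
SITUATIONS = [
    "work", "gym", "party", "sleep",
    "morning", "run", "night", "dance",
    "car", "train", "relax", "club",
]


def match_situations(playlist_title, situations=SITUATIONS):
    """Return the situation tags that appear as whole words in a title.

    Hypothetical helper: the real keyword lists and matching rules
    are not given in the description.
    """
    # Lowercase and split on non-letters so "Gym-Time!" yields {"gym", "time"}.
    words = set(re.findall(r"[a-z]+", playlist_title.lower()))
    return [tag for tag in situations if tag in words]


print(match_situations("Morning Run Mix"))         # ['morning', 'run']
print(match_situations("Late night at the CLUB"))  # ['night', 'club']
print(match_situations("Training hits"))           # []
```

Matching on whole words avoids false positives such as "training" matching "train", at the cost of missing plurals and other inflected forms.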

Each instance contains a track/user/device triplet and a situational tag indicating that the user listened to the track in that situation, together with the corresponding data received from the device. The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". Users are represented as embeddings based on their listening history, computed through matrix factorization of the user/track matrix. Users are additionally described by their demographic data: "age, country, gender".
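The "circular" time and day features suggest a projection of a periodic value onto the unit circle. The exact formula used for the dataset is not stated here, so the following is only a sketch of the common cosine/sine encoding; the function name `circular_encode` and the assignment of cosine to X and sine to Y are assumptions.

```python
import math


def circular_encode(value, period):
    """Map a periodic value onto the unit circle as an (x, y) pair.

    Assumed convention: x = cos(angle), y = sin(angle).
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)


# Hour of day (period 24): 23:00 and 01:00 end up close together,
# although they are far apart on a linear 0-24 scale.
x_23, y_23 = circular_encode(23, 24)
x_01, y_01 = circular_encode(1, 24)
distance = math.hypot(x_23 - x_01, y_23 - y_01)
print(round(distance, 3))  # 0.518
```

The same function would apply to the day of the week with `period=7`. Keeping the linear features alongside the circular ones preserves the ordering information that the circular encoding discards.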

The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git

Each track is identified by its media_id, which is the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented by an anonymized user_id, which is associated with the user embedding available in the user_embeddings.npy file. Note: the index of an embedding in the user_embeddings array corresponds to the user_id, i.e. the user with user_id = 100 has its embedding at user_embeddings[100].
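The index convention can be sketched as follows. The helper names `load_user_embeddings` and `embedding_for_user` are hypothetical, and the toy data stands in for the real array, whose shape and dimensionality are not given in this description.

```python
def load_user_embeddings(path="user_embeddings.npy"):
    """Load the embedding matrix; row i holds the embedding of user_id i."""
    import numpy as np  # imported here so the lookup helper has no dependency
    return np.load(path)


def embedding_for_user(user_embeddings, user_id):
    """Return the embedding row for an anonymized user_id."""
    # Guard against negative ids, which Python indexing would silently wrap.
    if not 0 <= user_id < len(user_embeddings):
        raise IndexError(f"user_id {user_id} is outside the embedding array")
    return user_embeddings[user_id]


# Stand-in data: three users with 2-dimensional embeddings (illustrative only).
toy_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(embedding_for_user(toy_embeddings, 2))  # [0.5, 0.6]
```

With the downloaded file, the same lookup would read `embedding_for_user(load_user_embeddings(), 100)`, assuming NumPy is installed.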

Finally, the dataset also contains the splits used in our experiments. The splits were generated under one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is split into 4 subsets for cross-validation, each marked with a "fold" number.
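The cold conditions can be checked with a simple set intersection, as in the sketch below. The function name `shared_ids` and the in-memory row format are assumptions made for illustration; the layout of the split files is not described here, so loading them is left out.

```python
def shared_ids(train_rows, test_rows, key):
    """Return the set of `key` values that occur in both splits."""
    train_ids = {row[key] for row in train_rows}
    test_ids = {row[key] for row in test_rows}
    return train_ids & test_ids


# Toy streams; real rows would come from the split files of one fold.
train = [{"user_id": 1, "media_id": 10}, {"user_id": 2, "media_id": 11}]
test = [{"user_id": 1, "media_id": 12}, {"user_id": 3, "media_id": 13}]

print(shared_ids(train, test, "media_id"))  # set(): consistent with ColdTrack
print(shared_ids(train, test, "user_id"))   # {1}: would violate ColdUser
```

An empty result for `media_id` is what a ColdTrack fold should produce, and an empty result for `user_id` is what a ColdUser fold should produce. WarmCase folds may return non-empty sets for both.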

Files

anon_user_idx_ordered.txt

Files (854.3 MB)

Checksum Size
md5:e43f47bdeda78e191c8b009ed0f35ef2 724.4 kB
md5:77abc0a6b027ece9e7093fde1ea74f7e 107.6 MB
md5:3991bab0c89adc6be540fb265ceb4341 40.6 MB
md5:9774fba853ab0c14bcedf771887660b2 77.4 MB
md5:84c0178b08493510f9d664b05b13a0f3 107.6 MB
md5:6a3ee22368ce37113d45205322b8ca8b 40.6 MB
md5:430aae3c8cb552de5c54c651d1a28116 77.4 MB
md5:fdfc6ff2fb1d098a481a1873608b6c8b 1.6 MB
md5:78161c2cdea7acd83323db9afef4c98f 175.2 MB
md5:bca3f4241649b8f3cb71ec14bdb58c28 107.6 MB
md5:8f014c8cc0068de99028de8aeeda7cca 40.6 MB
md5:3dec7512da80198a7419addc4ef7b62e 77.4 MB
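A downloaded file can be checked against its listed MD5 checksum with the Python standard library. This is a minimal sketch; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and since the file names are not shown next to the checksums in this listing, the pairing of file and checksum has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat, which matters for the larger files in this record.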

Additional details

Funding

European Commission
MIP-Frontiers – New Frontiers in Music Information Processing 765068