Music Streams Labelled with Listening Situation - [User/Track/Device/Situation] Dataset

Karim M. Ibrahim; Elena V. Epure; Geoffroy Peeters; Gaël Richard

doi:10.5281/zenodo.5552288

Published October 6, 2021 | Version v1

Dataset Open

1. Telecom Paris
2. Deezer

This is a contextual music dataset in which each stream is labelled with its listening situation. Each stream consists of user, track, and device data together with a situation label. The data were collected from Deezer during August 2019 from France and Brazil. The dataset comprises 3 subsets covering 4, 8, and 12 different situations. The situations were extracted by keyword matching against the title of the associated playlist in the Deezer catalog. The full set of situational tags is: "work, gym, party, sleep | morning, run, night, dance | car, train, relax, club".
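The keyword matching described above can be illustrated with a minimal sketch. The function name `match_situations` and the whole-word matching rule are assumptions made for illustration; the description does not state the exact rules (keyword variants, languages, tie-breaking) used to build the dataset.

```python
import re

# The twelve situational tags listed in the dataset description.
SITUATIONS = [
    "work", "gym", "party", "sleep",
    "morning", "run", "night", "dance",
    "car", "train", "relax", "club",
]


def match_situations(playlist_title, tags=SITUATIONS):
    """Return the tags that occur as whole words in a playlist title.

    Hypothetical helper: whole-word, case-insensitive matching is an
    assumption, not the documented extraction procedure.
    """
    words = set(re.findall(r"[a-z]+", playlist_title.lower()))
    return [tag for tag in tags if tag in words]


print(match_situations("Morning Run Mix"))   # ['morning', 'run']
print(match_situations("Workout classics"))  # [] ("workout" is not the word "work")
```

Whole-word matching avoids false positives such as "car" inside "carnival", at the cost of missing variants such as "running"; a real pipeline would likely need a keyword list per situation.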

Each instance contains a track/user/device triplet and a situational tag indicating that the user listened to the track in that situation, together with the corresponding data received from the device. The device data contain: "linear-time, linear-day, circular-time X, circular-time Y, circular-day X, circular-day Y, device-type, network-type". Users are represented as embeddings based on their listening history, computed through matrix factorization of the user/track matrix. Users are additionally represented by their demographic data: "age, country, gender".
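The "circular" device features suggest a cyclic encoding of time, in which a periodic value is mapped to a point on the unit circle so that, for example, 23:55 and 00:05 end up close together. The sketch below shows one common way to compute such a pair. It is an assumption that the dataset uses this convention: the choice of cosine for X and sine for Y, the 24-hour period, and the reading of "day" as day of the week are not stated in the description.

```python
import math


def circular_encode(value, period):
    """Map a periodic value onto the unit circle as an (x, y) pair.

    Assumed convention: x = cos(angle), y = sin(angle).
    """
    angle = 2.0 * math.pi * (value % period) / period
    return math.cos(angle), math.sin(angle)


# Hour of day with an assumed 24-hour period.
late = circular_encode(23 + 55 / 60, 24)   # 23:55
early = circular_encode(0 + 5 / 60, 24)    # 00:05

# The two points are close on the circle although the linear values
# (23.92 and 0.08) are almost a full day apart.
gap = math.hypot(late[0] - early[0], late[1] - early[1])
print(f"distance between 23:55 and 00:05 on the circle: {gap:.3f}")

# Day of week with an assumed 7-day period (0 = first day of the week).
print(circular_encode(3, 7))
```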

The creation of the dataset and our experimental results are described in the paper: Karim M. Ibrahim, Elena V. Epure, Geoffroy Peeters, and Gaël Richard. "Audio Autotagging as Proxy for Contextual Music Recommendation" [Under Revision]. The source code of the paper is available here: https://github.com/KarimMibrahim/Situational_Session_Generator.git

Tracks are identified by media_id, the ID of the track in the Deezer catalog. The 30-second track previews used to train the model in the paper can be accessed through the Deezer API: https://developers.deezer.com/api. Each user is represented by an anonymized user_id, which is associated with a user embedding in the user_embeddings.npy file. Note: the index of an embedding in the user_embeddings array corresponds to the user_id, i.e. the user with user_id = 100 has its embedding at user_embeddings[100].
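The index convention in the note above can be sketched as follows. The helper name `embedding_for` is hypothetical, and a small stand-in array is used so the example runs without the download; the shape of the real array is not stated in the description.

```python
import numpy as np


def embedding_for(user_embeddings, user_id):
    """Return the embedding row whose index equals the anonymized user_id."""
    if not 0 <= user_id < len(user_embeddings):
        raise IndexError(f"user_id {user_id} is outside the embedding array")
    return user_embeddings[user_id]


# With the real file one would load the array first:
#   user_embeddings = np.load("user_embeddings.npy")
# Stand-in array (200 users, 8 dimensions); both sizes are made up.
user_embeddings = np.arange(200 * 8, dtype=np.float32).reshape(200, 8)

vector = embedding_for(user_embeddings, 100)
print(vector.shape)  # (8,)
```

For the full 175.2 MB file, `np.load("user_embeddings.npy", mmap_mode="r")` avoids reading the whole array into memory when only a few rows are needed.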

Finally, the dataset also contains the splits used in our experiments. Each split follows one of three conditions: ColdTrack (no overlap of tracks between the splits), ColdUser (no overlap of users between the splits), and WarmCase (overlaps allowed). Each condition is divided into 4 subsets for cross-validation, marked with a "fold" number.
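A minimal sketch of how the fold numbers might be used for cross-validation, and how the cold-start conditions could be checked, is given below. The column names `user_id`, `media_id`, `situation`, and `fold`, the integer fold values, and the sample rows are assumptions for illustration; the actual CSV headers should be checked after download.

```python
import csv
import io

# Stand-in rows; the real files are e.g. coldTrack_4Labels.csv.
SAMPLE = """user_id,media_id,situation,fold
1,10,gym,0
2,11,sleep,1
3,12,work,2
4,13,party,3
1,14,gym,1
"""


def split_by_fold(rows, test_fold, fold_key="fold"):
    """Hold out one fold for testing and train on the remaining folds."""
    train = [r for r in rows if int(r[fold_key]) != test_fold]
    test = [r for r in rows if int(r[fold_key]) == test_fold]
    return train, test


def shared_values(train, test, key):
    """Values of `key` that appear on both sides of a split."""
    return {r[key] for r in train} & {r[key] for r in test}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for fold in sorted({int(r["fold"]) for r in rows}):
    train, test = split_by_fold(rows, fold)
    # In a ColdTrack file this set should be empty; for a ColdUser file,
    # run the same check with key="user_id".
    print(fold, len(train), len(test), shared_values(train, test, "media_id"))
```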

Files (854.3 MB)

Name	MD5	Size
anon_user_idx_ordered.txt	e43f47bdeda78e191c8b009ed0f35ef2	724.4 kB
coldTrack_12Labels.csv	77abc0a6b027ece9e7093fde1ea74f7e	107.6 MB
coldTrack_4Labels.csv	3991bab0c89adc6be540fb265ceb4341	40.6 MB
coldTrack_8Labels.csv	9774fba853ab0c14bcedf771887660b2	77.4 MB
coldUser_12Labels.csv	84c0178b08493510f9d664b05b13a0f3	107.6 MB
coldUser_4Labels.csv	6a3ee22368ce37113d45205322b8ca8b	40.6 MB
coldUser_8Labels.csv	430aae3c8cb552de5c54c651d1a28116	77.4 MB
track_ids.txt	fdfc6ff2fb1d098a481a1873608b6c8b	1.6 MB
user_embeddings.npy	78161c2cdea7acd83323db9afef4c98f	175.2 MB
warm_12Labels.csv	bca3f4241649b8f3cb71ec14bdb58c28	107.6 MB
warm_4Labels.csv	8f014c8cc0068de99028de8aeeda7cca	40.6 MB
warm_8Labels.csv	3dec7512da80198a7419addc4ef7b62e	77.4 MB
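
The MD5 checksums listed with each file can be used to confirm that a download is complete. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_file` and `verify` are hypothetical, and the commented call assumes the file sits in the working directory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file's MD5 digest matches the listed checksum."""
    return md5_of_file(path) == expected_md5.lower()


# Example, once the file has been downloaded:
#   verify("track_ids.txt", "fdfc6ff2fb1d098a481a1873608b6c8b")
```

Reading in chunks keeps memory use constant, which matters for the larger files such as user_embeddings.npy.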

Additional details

Funding: European Commission, MIP-Frontiers – New Frontiers in Music Information Processing (grant 765068)
