Published September 27, 2018 | Version v1
Dataset Restricted

Spotify Million Playlist: Recsys Challenge 2018 Dataset

Authors/Creators

  • 1. AIcrowd

Description

Spotify Million Playlist Dataset Challenge

Summary

The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

Background

Playlists like Today’s Top Hits and RapCaviar have millions of loyal followers, while Discover Weekly and Daily Mix are just a couple of our personalized playlists made especially to match your unique musical tastes.

Our users love playlists too. In fact, the Digital Music Alliance, in their 2018 Annual Music Report, state that 54% of consumers say that playlists are replacing albums in their listening habits.

But our users don’t love just listening to playlists, they also love creating them. To date, over 4 billion playlists have been created and shared by Spotify users. People create playlists for all sorts of reasons: some playlists group together music categorically (e.g., by genre, artist, year, or city), by mood, theme, or occasion (e.g., romantic, sad, holiday), or for a particular purpose (e.g., focus, workout). Some playlists are even made to land a dream job, or to send a message to someone special.

The other thing we love here at Spotify is playlist research. By learning from the playlists that people create, we can learn all sorts of things about the deep relationship between people and music. Why do certain songs go together? What is the difference between “Beach Vibes” and “Forest Vibes”? And what words do people use to describe which playlists?

By learning more about the nature of playlists, we may also be able to suggest other tracks that a listener would enjoy in the context of a given playlist. This can make playlist creation easier, and ultimately help people find more of the music they love.

Dataset

To enable this type of research at scale, in 2018 we sponsored the RecSys Challenge 2018, which introduced the Million Playlist Dataset (MPD) to the research community. Sampled from the over 4 billion public playlists on Spotify, this dataset of 1 million playlists consists of over 2 million unique tracks by nearly 300,000 artists, and represents the largest public dataset of music playlists in the world. The dataset includes public playlists created by US Spotify users between January 2010 and November 2017. The challenge ran from January to July 2018, and received 1,467 submissions from 410 teams. A summary of the challenge and the top scoring submissions was published in the ACM Transactions on Intelligent Systems and Technology.

In September 2020, we re-released the dataset as an open-ended challenge on AIcrowd.com. The dataset can now be downloaded by registered participants from the Resources page.

Each playlist in the MPD contains a playlist title, the track list (including track IDs and metadata), and other metadata fields (last edit time, number of playlist edits, and more). All data is anonymized to protect user privacy. Playlists are sampled with some randomization, are manually filtered for playlist quality and to remove offensive content, and have some dithering and fictitious tracks added to them. As such, the dataset is not representative of the true distribution of playlists on the Spotify platform, and must not be interpreted as such in any research or analysis performed on the dataset.
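As a rough illustration of the record shape described above, the following minimal sketch models a playlist as a title, a track list, and a few metadata fields. All class names, field names, and dictionary keys here (`Track`, `Playlist`, `track_id`, `last_edit_time`, `num_edits`, and so on) are hypothetical and chosen for readability; they are not taken from the dataset files, whose actual schema should be checked after download.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Track:
    track_id: str  # hypothetical name for the track identifier
    title: str
    artist: str


@dataclass
class Playlist:
    title: str
    last_edit_time: int  # hypothetical: seconds since the Unix epoch
    num_edits: int
    tracks: List[Track] = field(default_factory=list)


def parse_playlist(raw: Dict[str, Any]) -> Playlist:
    """Build a Playlist from a plain dict that uses the hypothetical keys above."""
    tracks = [
        Track(track_id=t["track_id"], title=t["title"], artist=t["artist"])
        for t in raw.get("tracks", [])
    ]
    return Playlist(
        title=raw["title"],
        last_edit_time=raw["last_edit_time"],
        num_edits=raw["num_edits"],
        tracks=tracks,
    )
```

Keeping the tracks as an ordered list matters for the continuation task, since several evaluation scenarios are defined in terms of the first tracks of a playlist.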

Dataset Contains

1000 examples of each scenario:

  • Title only (no tracks)
  • Title and first track
  • Title and first 5 tracks
  • First 5 tracks only
  • Title and first 10 tracks
  • First 10 tracks only
  • Title and first 25 tracks
  • Title and 25 random tracks
  • Title and first 100 tracks
  • Title and 100 random tracks
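For local experiments, the ten scenarios listed above can be imitated by splitting a complete playlist into a visible seed and a held-out remainder. The sketch below is one possible way to do that, not a description of how the challenge data was actually built; `SCENARIOS` and `make_seed` are hypothetical names, and tracks are represented as plain strings for simplicity.

```python
import random
from typing import List, Optional, Tuple

# One entry per scenario: (include_title, number_of_seed_tracks, pick_randomly)
SCENARIOS = [
    (True, 0, False),     # title only (no tracks)
    (True, 1, False),     # title and first track
    (True, 5, False),     # title and first 5 tracks
    (False, 5, False),    # first 5 tracks only
    (True, 10, False),    # title and first 10 tracks
    (False, 10, False),   # first 10 tracks only
    (True, 25, False),    # title and first 25 tracks
    (True, 25, True),     # title and 25 random tracks
    (True, 100, False),   # title and first 100 tracks
    (True, 100, True),    # title and 100 random tracks
]


def make_seed(
    title: str,
    tracks: List[str],
    include_title: bool,
    k: int,
    pick_randomly: bool,
    rng: random.Random,
) -> Tuple[Optional[str], List[str], List[str]]:
    """Split a full playlist into (visible title, seed tracks, held-out tracks)."""
    k = min(k, len(tracks))  # never ask for more tracks than the playlist has
    if pick_randomly:
        seed_positions = set(rng.sample(range(len(tracks)), k))
    else:
        seed_positions = set(range(k))
    # Both lists keep the original playlist order.
    seed = [t for i, t in enumerate(tracks) if i in seed_positions]
    holdout = [t for i, t in enumerate(tracks) if i not in seed_positions]
    return (title if include_title else None), seed, holdout
```

A call such as `make_seed("Chill", tracks, *SCENARIOS[2], random.Random(0))` would expose the title and the first five tracks and hold out the rest. Passing an explicit `random.Random` instance keeps the random-track scenarios reproducible.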

Download Link

Full Details: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge
Download Link: https://www.aicrowd.com/challenges/spotify-million-playlist-dataset-challenge/dataset_files

Files

Restricted

The record is publicly accessible, but files are restricted.

Request access

You need to satisfy these conditions in order for this request to be accepted:

For purposes of this Challenge, Spotify will provide data as described on the Website (“Spotify Data”). Spotify hereby grants you a limited non-exclusive license to use the Spotify Data solely as required to prepare your Challenge Result (defined below). All other uses of Spotify Data are strictly prohibited. You hereby agree not to use, sell, rent, transfer, distribute, make available, or otherwise disclose Spotify Data, other than as required to prepare your Challenge Result. Further, you hereby agree not to attempt to: identify any individuals from the Spotify Data, use the Spotify Data or any data derived therefrom for any commercial purpose, or reverse engineer any aspect of Spotify’s technology or data. Any rights not explicitly granted herein are expressly reserved by Spotify.


The Spotify Data may include some publicly available data relating to Spotify users; however, the Spotify Data does not contain any data about Spotify users who were residents of the European Union as of December 1, 2017. You agree to notify Spotify immediately upon learning of any possible unauthorized transmission, access or use of the Spotify Data and agree to work with Spotify to rectify any such unauthorized transmission, access or use.

Additional details

References

  • C.W. Chen, P. Lamere, M. Schedl, and H. Zamani. RecSys Challenge 2018: Automatic Music Playlist Continuation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18), 2018.