Published March 27, 2020 | Version v1
Dataset Open

Playlist2vec: Spotify Million Playlist Dataset

  • 1. Arizona State University

Description

This dataset was created using the Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
Each of these entities is identified by a unique ID (Spotify URI).
Data is stored in the following tables:

  • album
  • artist
  • track
  • playlist
  • track_artist1
  • track_playlist1

album

| id | name | uri |

id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify


artist

| id | name | uri |

id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify


track

| id | name | duration | popularity | explicit | preview_url | uri | album_id |

id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics (true or false)
preview_url: A link to a 30-second preview (MP3 format) of the track. Can be null.
uri: Track URI as provided by Spotify
album_id: ID of the album to which the track belongs


playlist

| id | name | followers | uri | total_tracks |

id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist URI as provided by Spotify
total_tracks: Total number of tracks in the playlist.

 

track_artist1

| track_id | artist_id |

Track-Artist association table

 

track_playlist1

| track_id | playlist_id |

Track-Playlist association table
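
To illustrate how the two association tables tie the entities together, here is a minimal sketch that rebuilds the six tables in an in-memory SQLite database and resolves one playlist to its tracks, albums, and artists. The column names follow the descriptions above; the column types, the use of SQLite as a stand-in for MySQL, the helper names `build_demo_db` and `playlist_tracks`, and every row of sample data are assumptions made for illustration, not part of the dataset.

```python
import sqlite3

# Stand-in schema. Column names come from the table descriptions above;
# the column types and the choice of SQLite are assumptions.
SCHEMA = """
CREATE TABLE album    (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE artist   (id TEXT PRIMARY KEY, name TEXT, uri TEXT);
CREATE TABLE track    (id TEXT PRIMARY KEY, name TEXT, duration INTEGER,
                       popularity INTEGER, explicit INTEGER, preview_url TEXT,
                       uri TEXT, album_id TEXT REFERENCES album(id));
CREATE TABLE playlist (id TEXT PRIMARY KEY, name TEXT, followers INTEGER,
                       uri TEXT, total_tracks INTEGER);
CREATE TABLE track_artist1   (track_id TEXT, artist_id TEXT);
CREATE TABLE track_playlist1 (track_id TEXT, playlist_id TEXT);
"""

# Walk playlist -> track -> album and track -> artist via the association tables.
PLAYLIST_TRACKS = """
SELECT t.name, al.name, ar.name
FROM track_playlist1 AS tp
JOIN track         AS t  ON t.id = tp.track_id
JOIN album         AS al ON al.id = t.album_id
JOIN track_artist1 AS ta ON ta.track_id = t.id
JOIN artist        AS ar ON ar.id = ta.artist_id
WHERE tp.playlist_id = ?
ORDER BY t.name, ar.name
"""


def build_demo_db():
    """Create the stand-in schema and fill it with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO album VALUES ('al1', 'Demo Album', 'uri-al1')")
    conn.executemany(
        "INSERT INTO artist VALUES (?, ?, ?)",
        [("ar1", "Artist A", "uri-ar1"), ("ar2", "Artist B", "uri-ar2")],
    )
    conn.executemany(
        "INSERT INTO track VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("t1", "Song One", 180000, 50, 0, None, "uri-t1", "al1"),
            ("t2", "Song Two", 240000, 70, 1, None, "uri-t2", "al1"),
        ],
    )
    conn.execute("INSERT INTO playlist VALUES ('p1', 'Demo Playlist', 10, 'uri-p1', 2)")
    # A track with two artists gets two rows in track_artist1.
    conn.executemany(
        "INSERT INTO track_artist1 VALUES (?, ?)",
        [("t1", "ar1"), ("t2", "ar1"), ("t2", "ar2")],
    )
    conn.executemany(
        "INSERT INTO track_playlist1 VALUES (?, ?)",
        [("t1", "p1"), ("t2", "p1")],
    )
    return conn


def playlist_tracks(conn, playlist_id):
    """Return (track name, album name, artist name) rows for one playlist."""
    return conn.execute(PLAYLIST_TRACKS, (playlist_id,)).fetchall()
```

Because a track can have several artists, the query returns one row per track-artist pair. The same SELECT should carry over to the MySQL database, although the parameter placeholder depends on the driver in use.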

 

 - - - - - SETUP - - - - - 


The data is in the form of a SQL dump. The download size is about 10 GB, and the database populated from it comes out to about 35 GB.

spotifydbdumpschemashare.sql contains the schema for the database (for reference).
spotifydbdumpshare.sql is the actual data dump.


Setup steps:
1. Create database <dbname>
2. mysql -u <username> -p <dbname> < spotifydbdumpshare.sql
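
The two setup steps can also be scripted. The sketch below is one possible way to do so, assuming the `mysql` command-line client is installed and on the PATH; the helper names `create_db_cmd`, `import_cmd`, and `load_dump` are hypothetical, and the username and database name are placeholders to be supplied by the caller.

```python
import subprocess


def create_db_cmd(username, dbname):
    """Step 1: argv for creating the database through the mysql client."""
    # -p with no value makes the client prompt for the password.
    return ["mysql", "-u", username, "-p", "-e", f"CREATE DATABASE `{dbname}`"]


def import_cmd(username, dbname):
    """Step 2: argv matching `mysql -u <username> -p <dbname>`."""
    return ["mysql", "-u", username, "-p", dbname]


def load_dump(username, dbname, dump_path="spotifydbdumpshare.sql"):
    """Create the database, then feed the dump to the client on stdin."""
    subprocess.run(create_db_cmd(username, dbname), check=True)
    with open(dump_path, "rb") as dump:
        # Equivalent to the shell redirection `< spotifydbdumpshare.sql`.
        subprocess.run(import_cmd(username, dbname), stdin=dump, check=True)
```

A call such as `load_dump("alice", "spotify")` would prompt for the password twice, once per step. Given the size of the dump, the import is likely to take a while.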

 

- - - - - PAPER - - - - -


The description of this dataset can be found in the following paper:

Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham

Files

Files (10.7 GB)

md5:015c03a86fd2d2c92426db68e83a1862 (5.0 kB)
md5:3549b42e207a76ba5c20e650f1cd044e (10.7 GB)
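
Since the larger file is over 10 GB, it may be worth checking a download against the MD5 checksums listed above before importing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `file_md5` and `matches_listing` are hypothetical, and because the listing does not pair checksums with file names, the check only tests membership in the set of listed values.

```python
import hashlib

# Checksums copied from the file listing above. Which file name each one
# belongs to is not stated there, so only the sizes are noted here.
LISTED_MD5 = {
    "015c03a86fd2d2c92426db68e83a1862",  # 5.0 kB file
    "3549b42e207a76ba5c20e650f1cd044e",  # 10.7 GB file
}


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-GB dump never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """True if the file's MD5 equals one of the listed checksums."""
    return file_md5(path) in LISTED_MD5
```

For example, `matches_listing("spotifydbdumpshare.sql")` should return True for an intact download of the data dump.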

Additional details

Related works

Is documented by
Conference paper: 10.1007/978-3-030-43887-6_50 (DOI)

References

  • Papreja, Piyush, Hemanth Venkateswara, and Sethuraman Panchanathan. "Representation, exploration and recommendation of playlists." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 2019.