Playlist2vec: Spotify Million Playlist Dataset
Description
This dataset was created using Spotify developer API. It consists of user-created as well as Spotify-curated playlists.
The dataset consists of 1 million playlists, 3 million unique tracks, 3 million unique albums, and 1.3 million artists.
The data is stored in a SQL database, with the primary entities being songs, albums, artists, and playlists.
Each of the aforementioned entities are represented by unique IDs (Spotify URI).
Data is stored into following tables:
- album
- artist
- track
- playlist
- track_artist1
- track_playlist1
album
| id | name | uri |
id: Album ID as provided by Spotify
name: Album Name as provided by Spotify
uri: Album URI as provided by Spotify
artist
| id | name | uri |
id: Artist ID as provided by Spotify
name: Artist Name as provided by Spotify
uri: Artist URI as provided by Spotify
track
| id | name | duration | popularity | explicit | preview_url | uri | album_id |
id: Track ID as provided by Spotify
name: Track Name as provided by Spotify
duration: Track Duration (in milliseconds) as provided by Spotify
popularity: Track Popularity as provided by Spotify
explicit: Whether the track has explicit lyrics or not. (true or false)
preview_url: A link to a 30 second preview (MP3 format) of the track. Can be null
uri: Track Uri as provided by Spotify
album_id: Album Id to which the track belongs
playlist
| id | name | followers | uri | total_tracks |
id: Playlist ID as provided by Spotify
name: Playlist Name as provided by Spotify
followers: Playlist Followers as provided by Spotify
uri: Playlist Uri as provided by Spotify
total_tracks: Total number of tracks in the playlist.
track_artist1
| track_id | artist_id |
Track-Artist association table
track_playlist1
| track_id | playlist_id |
Track-Playlist association table
- - - - - SETUP - - - - -
The data is in the form of a SQL dump. The download size is about 10 GB, and the database populated from it comes out to about 35GB.
spotifydbdumpschemashare.sql contains the schema for the database (for reference):
spotifydbdumpshare.sql is the actual data dump.
Setup steps:
1. Create database <dbname>
2. mysql -u <username> -p <dbname> < spotifydbdumpshare.sql
- - - - - PAPER - - - - -
The description of this dataset can be found in the following paper:
Papreja P., Venkateswara H., Panchanathan S. (2020) Representation, Exploration and Recommendation of Playlists. In: Cellier P., Driessens K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1168. Springer, Cham
Files
Files
(10.7 GB)
Name | Size | Download all |
---|---|---|
md5:015c03a86fd2d2c92426db68e83a1862
|
5.0 kB | Download |
md5:3549b42e207a76ba5c20e650f1cd044e
|
10.7 GB | Download |
Additional details
Related works
- Is documented by
- Conference paper: 10.1007/978-3-030-43887-6_50 (DOI)
References
- Papreja, Piyush, Hemanth Venkateswara, and Sethuraman Panchanathan. "Representation, exploration and recommendation of playlists." Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, Cham, 2019.