Published March 1, 2010 | Version 1.2
Dataset Open

lastfm Music Recommendation Dataset

Creators

  • Music Technology Group

Description

This is a shared Zenodo repository for the lastfm-360K and lastfm-1K datasets. Details of both datasets follow, including license, acknowledgements, contact, and citation instructions.

 

LASTFM-360K (version 1.2, March 2010).

  • What is this? This dataset contains <user, artist, plays> tuples (for ~360,000 users) collected from the Last.fm API using the user.getTopArtists() method.
  • Files:
    • usersha1-artmbid-artname-plays.tsv (MD5: be672526eb7c69495c27ad27803148f1)
    • usersha1-profile.tsv (MD5: 51159d4edf6a92cb96f87768aa2be678)
    • mbox_sha1sum.py (MD5: feb3485eace85f3ba62e324839e6ab39)
  • Data Statistics:
    • File usersha1-artmbid-artname-plays.tsv:
      • Total Lines: 17,559,530
      • Unique Users: 359,347
      • Artists with MBID: 186,642
      • Artists without MBID: 107,373
  • Data Format: The data is formatted one entry per line as follows (tab separated, "\t"):
    • File usersha1-artmbid-artname-plays.tsv:
      user-mboxsha1 \t musicbrainz-artist-id \t artist-name \t plays
    • File usersha1-profile.tsv:
      user-mboxsha1 \t gender (m|f|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
  • Example:
    • File usersha1-artmbid-artname-plays.tsv:
      000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432 \t u2 \t 31 ...
    • File usersha1-profile.tsv:
      000063d3fe1cf2ba248b9e3c3f0334845a27a6be \t m \t 19 \t Mexico \t Apr 28, 2008 ...
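
The two LASTFM-360K line formats above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the names PlayRecord, Profile, parse_play_line and parse_profile_line are hypothetical, and it assumes that fields contain no embedded tabs, that an empty field is how a missing value (including a missing MBID) appears, and that signup dates use English month abbreviations as in the example.

```python
from datetime import date, datetime
from typing import List, NamedTuple, Optional


class PlayRecord(NamedTuple):
    user_sha1: str
    artist_mbid: Optional[str]  # None when the field is empty (assumed to mean "no MBID")
    artist_name: str
    plays: int


class Profile(NamedTuple):
    user_sha1: str
    gender: Optional[str]
    age: Optional[int]
    country: Optional[str]
    signup: Optional[date]


def _split(line: str) -> List[str]:
    # Drop the line ending, split on tabs, and trim stray spaces around each field.
    return [field.strip() for field in line.rstrip("\r\n").split("\t")]


def parse_play_line(line: str) -> PlayRecord:
    """Parse one line of usersha1-artmbid-artname-plays.tsv."""
    user, mbid, name, plays = _split(line)
    return PlayRecord(user, mbid or None, name, int(plays))


def parse_profile_line(line: str) -> Profile:
    """Parse one line of usersha1-profile.tsv; empty fields become None."""
    user, gender, age, country, signup = _split(line)
    return Profile(
        user,
        gender or None,
        int(age) if age else None,
        country or None,
        # "%b" matches English month abbreviations only under an English/C locale.
        datetime.strptime(signup, "%b %d, %Y").date() if signup else None,
    )


# The example lines from the description, written with real tab characters.
play = parse_play_line(
    "000063d3fe1cf2ba248b9e3c3f0334845a27a6be\t"
    "a3cb23fc-acd3-4ce0-8f36-1e5aa6a18432\tu2\t31\n"
)
profile = parse_profile_line(
    "000063d3fe1cf2ba248b9e3c3f0334845a27a6be\tm\t19\tMexico\tApr 28, 2008\n"
)
print(play.artist_name, play.plays, profile.country, profile.signup)
```

For the full 17.5-million-line file, applying these functions line by line while iterating over the open file avoids loading everything into memory at once.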

 

LASTFM-1K (version 1.0, March 2010).

  • What is this? This dataset contains <user, timestamp, artist, song> tuples collected from the Last.fm API using the user.getRecentTracks() method. It represents the complete listening history (until May 5, 2009) of nearly 1,000 users.
  • Files:
    • userid-timestamp-artid-artname-traid-traname.tsv (MD5: 64747b21563e3d2aa95751e0ddc46b68)
    • userid-profile.tsv (MD5: c53608b6b445db201098c1489ea497df)
  • Data Statistics:
    • File userid-timestamp-artid-artname-traid-traname.tsv:
      • Total Lines: 19,150,868
      • Unique Users: 992
      • Artists with MBID: 107,528
      • Artists without MBID: 69,420
  • Data Format: The data is formatted one entry per line as follows (tab separated, "\t"):
    • File userid-timestamp-artid-artname-traid-traname.tsv:
      userid \t timestamp \t musicbrainz-artist-id \t artist-name \t musicbrainz-track-id \t track-name
    • File userid-profile.tsv:
      userid \t gender ('m'|'f'|empty) \t age (int|empty) \t country (str|empty) \t signup (date|empty)
  • Example:
    • File userid-timestamp-artid-artname-traid-traname.tsv:
      user_000639 \t 2009-04-08T01:57:47Z \t MBID \t The Dogs D'Amour \t MBID \t Fall in Love Again?
      user_000639 \t 2009-04-08T01:53:56Z \t MBID \t The Dogs D'Amour \t MBID \t Wait Until I'm Dead ...
    • File userid-profile.tsv:
      user_000639 \t m \t Mexico \t Apr 27, 2005 ...
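
The LASTFM-1K listening log can be parsed in a similar way, with the timestamp converted to a timezone-aware value. This is a sketch under stated assumptions: ListenEvent and parse_event_line are hypothetical names, the trailing "Z" is taken to mean UTC, and the literal string "MBID" in the sample lines is the same placeholder used in the example above, not a real identifier.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import NamedTuple


class ListenEvent(NamedTuple):
    user_id: str
    timestamp: datetime
    artist_mbid: str
    artist_name: str
    track_mbid: str
    track_name: str


def parse_event_line(line: str) -> ListenEvent:
    """Parse one line of userid-timestamp-artid-artname-traid-traname.tsv."""
    fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
    user, stamp, art_id, art_name, tra_id, tra_name = fields
    # Timestamps look like 2009-04-08T01:57:47Z; the "Z" suffix is read as UTC.
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return ListenEvent(user, when, art_id, art_name, tra_id, tra_name)


# The two example lines from the description, with "MBID" kept as a placeholder.
sample_lines = [
    "user_000639\t2009-04-08T01:57:47Z\tMBID\tThe Dogs D'Amour\tMBID\tFall in Love Again?",
    "user_000639\t2009-04-08T01:53:56Z\tMBID\tThe Dogs D'Amour\tMBID\tWait Until I'm Dead",
]
events = [parse_event_line(line) for line in sample_lines]

# Count listening events per artist and measure the gap between the two events.
plays_per_artist = Counter(event.artist_name for event in events)
gap_seconds = (events[0].timestamp - events[1].timestamp).total_seconds()
print(plays_per_artist, gap_seconds)
```

Aggregating events with a Counter in this way yields per-user or per-artist play counts comparable in shape to the <user, artist, plays> tuples of LASTFM-360K.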

 

LICENSE OF BOTH DATASETS. The data contained in both datasets is distributed with permission of Last.fm. The data is made available for non-commercial use. Those interested in using the data or web services in a commercial context should contact:

partners [at] last [dot] fm

For more information, see the Last.fm terms of service.

 

ACKNOWLEDGEMENTS. Thanks to Last.fm for providing access to this data via their web services. Special thanks to Norman Casagrande.

 

REFERENCES. When using this dataset you must reference the Last.fm webpage. Optionally (not mandatory at all!), you can cite Chapter 3 of this book:

@book{Celma:Springer2010,
    author = {Celma, O.},
    title = {{Music Recommendation and Discovery in the Long Tail}},
    publisher = {Springer},
    year = {2010}
}

 

CONTACT: This data was collected by Òscar Celma @ MTG/UPF

Files (1.2 GB)

  • md5:a79a6808f54f73354789a9fb02cb1e41 (672.7 MB)
  • md5:635e6ed3fc873aa4ba33aba0ebce02b1 (569.2 MB)
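
The MD5 checksums listed on this page can be compared against a downloaded file before use. The sketch below reads the file in chunks so that large files do not need to fit in memory; md5_of_file and matches_md5 are hypothetical helper names, and the demo runs on a throwaway temporary file because the download file names are not given here.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Self-contained demo on a temporary file rather than the real downloads.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo bytes")
    demo_path = tmp.name

demo_ok = matches_md5(demo_path, hashlib.md5(b"demo bytes").hexdigest())
demo_bad = matches_md5(demo_path, "0" * 32)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For a real download, the call would look like matches_md5(local_path, "a79a6808f54f73354789a9fb02cb1e41"), where local_path is wherever the file was saved. MD5 is adequate here for detecting corrupted or truncated downloads, not as a security guarantee.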