Dataset Open Access

Fair RecSys Datasets

Kowald Dominik

Four multimedia recommender systems datasets to study popularity bias and fairness:

  1. Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
  2. MovieLens (ml.zip), based on the MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
  3. BookCrossing (book.zip), based on the BookCrossing dataset of Uni Freiburg (http://www2.informatik.uni-freiburg.de/~cziegler/BX/)
  4. MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)

Each dataset consists of user interactions (user_events.txt) and three user groups that differ in their inclination toward popular/mainstream items: LowPop (low_main_users.txt), MedPop (med_main_users.txt), and HighPop (high_main_users.txt).

The format of the three user files is "user,mainstreaminess".

The format of the user-events files is "user,item,preference".
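The two file formats above can be read with a few lines of Python. The following is a minimal sketch, not the code from the linked repository: it assumes the files are plain comma-separated text, that the preference and mainstreaminess columns are numeric, and that any non-numeric row (such as a possible header) can be skipped. The function names and the sample rows are made up for illustration.

```python
import csv
import io


def read_user_events(lines):
    """Parse "user,item,preference" rows into (user, item, preference) tuples."""
    events = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        user, item, preference = row
        try:
            events.append((user, item, float(preference)))
        except ValueError:
            continue  # skip rows whose preference is not numeric (e.g. a header)
    return events


def read_user_group(lines):
    """Parse "user,mainstreaminess" rows into a {user: mainstreaminess} dict."""
    group = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        user, mainstreaminess = row
        try:
            group[user] = float(mainstreaminess)
        except ValueError:
            continue  # skip rows whose score is not numeric (e.g. a header)
    return group


def count_events_per_group(events, groups):
    """Count how many interactions belong to the users of each group."""
    counts = {name: 0 for name in groups}
    for user, _item, _preference in events:
        for name, members in groups.items():
            if user in members:
                counts[name] += 1
    return counts


# Made-up sample rows for illustration only (not taken from the datasets).
sample_events = io.StringIO("u1,i1,5\nu1,i2,3\nu2,i1,4\n")
sample_low = io.StringIO("u1,0.05\n")
sample_high = io.StringIO("u2,0.80\n")

events = read_user_events(sample_events)
groups = {
    "LowPop": read_user_group(sample_low),
    "HighPop": read_user_group(sample_high),
}
counts = count_events_per_group(events, groups)
print(counts)
```

For the real files, pass an open file handle instead of the in-memory samples, for example `open("user_events.txt", newline="")` after unpacking one of the archives.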

Example Python code for analyzing the datasets, as well as more information on the user groups, can be found on GitHub (https://github.com/domkowald/FairRecSys) and on arXiv (https://arxiv.org/abs/2203.00376).

Files (16.3 MB)

Name       Size    Checksum
anime.zip  2.1 MB  md5:537b5cdaf8c02e34a2552cd47eb58a82
book.zip   3.0 MB  md5:96cddcdc4dbb8b62ea1e7b96933415e7
lfm.zip    9.2 MB  md5:57a773a0c30c097dfc987a3fdb0b322e
ml.zip     2.0 MB  md5:6a879d1fc781e0b37c42bbbdc5f27deb
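
The md5 checksums listed for the four archives can be used to check a download for corruption. The sketch below is one way to do this with Python's standard library; the checksum values are copied from the file listing, while the function names and the assumption that the archives sit in the current working directory are illustrative only.

```python
import hashlib
import os

# Checksums as listed for the four archives in this record.
EXPECTED_MD5 = {
    "anime.zip": "537b5cdaf8c02e34a2552cd47eb58a82",
    "book.zip": "96cddcdc4dbb8b62ea1e7b96933415e7",
    "lfm.zip": "57a773a0c30c097dfc987a3fdb0b322e",
    "ml.zip": "6a879d1fc781e0b37c42bbbdc5f27deb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_md5):
    """Return True if the file's md5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()


# Check only the archives that are present in the current directory (assumed location).
for name, expected in EXPECTED_MD5.items():
    if os.path.exists(name):
        status = "OK" if verify_archive(name, expected) else "MISMATCH"
        print(f"{name}: {status}")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.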
Statistics        All versions  This version
Views             251           251
Downloads         29            29
Data volume       103.8 MB      103.8 MB
Unique views      199           199
Unique downloads  15            15
