Published December 12, 2022 | Version 1.0
Dataset | Open access

Datasets to Evaluate Accuracy, Miscalibration and Popularity Lift in Recommendations

  • 1. Know-Center GmbH & TU Graz

Description

This repository contains three datasets for evaluating accuracy, miscalibration, and popularity lift in recommender systems. All datasets contain genre/category information as well as splits of the users into different groups:

  1. Last.fm (lfm.zip), based on the LFM-1b dataset from JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
  2. MovieLens (ml.zip), based on the MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
  3. MyAnimeList (anime.zip), based on the MyAnimeList dataset from Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)

'user_events_cats.txt' contains the users' rating/interaction data along with the genres/categories assigned to the rated items; the full list of categories is given in 'categories.txt'. In addition, assignments of users to three groups that differ in their inclination toward popular/mainstream items are provided: LowPop in 'low_main_users.txt', MedPop in 'med_main_users.txt', and HighPop in 'high_main_users.txt'.

The format of the three user-group files is "user,mainstreaminess".
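A minimal Python sketch for reading the three user-group files into one lookup table, assuming plain comma-separated lines without quoting. The function name `load_user_group` is hypothetical, the toy rows are made up, and skipping a row whose second field is not numeric is only a guard in case a header line is present.

```python
import io
from typing import Dict, Iterable, Tuple


def load_user_group(lines: Iterable[str], group: str) -> Dict[str, Tuple[str, float]]:
    """Parse "user,mainstreaminess" lines into {user: (group, mainstreaminess)}."""
    users: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            user, value = line.split(",", 1)
            users[user] = (group, float(value))
        except ValueError:
            # Assumption: a malformed or header row (non-numeric value) is skipped.
            continue
    return users


# Made-up toy rows; with the real data pass e.g. open("low_main_users.txt") instead.
low = load_user_group(io.StringIO("u1,0.12\nu2,0.08\n"), "LowPop")
high = load_user_group(io.StringIO("user,mainstreaminess\nu9,0.91\n"), "HighPop")
user_groups = {**low, **high}
print(user_groups["u9"])  # ('HighPop', 0.91)
```

Merging the three per-group dictionaries gives a single user-to-group mapping that can later be joined with the interaction data.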

The format of the user-events files is "user,item,preference,cats", where the categories of an item are separated by '|'.
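A minimal sketch for parsing the user-events file and turning one user's profile into a category distribution, a common starting point for calibration analyses. The names `Event`, `parse_events` and `category_distribution` are hypothetical; splitting each item's weight evenly over its categories and ignoring the preference value are assumptions, not necessarily what the authors' code does.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    user: str
    item: str
    preference: float
    cats: List[int]


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Parse "user,item,preference,cats" lines; cats holds '|'-separated category ids."""
    events: List[Event] = []
    for line in lines:
        parts = line.strip().split(",", 3)
        if len(parts) != 4:
            continue  # skip empty or malformed lines
        user, item, pref, cats = parts
        try:
            preference = float(pref)
        except ValueError:
            continue  # assumption: a header row, if present, is skipped
        cat_ids = [int(c) for c in cats.split("|") if c]
        events.append(Event(user, item, preference, cat_ids))
    return events


def category_distribution(events: Iterable[Event], user: str) -> Dict[int, float]:
    """Category distribution of one user's profile.

    Each interacted item contributes a total weight of 1, split evenly over its
    categories (preference is ignored); the result is normalized to sum to 1.
    """
    weights: Dict[int, float] = defaultdict(float)
    for e in events:
        if e.user == user and e.cats:
            for c in e.cats:
                weights[c] += 1.0 / len(e.cats)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


# Made-up toy rows; with the real data pass open("user_events_cats.txt") instead.
sample = "u1,i1,4.0,0|2\nu1,i2,5.0,2\nu2,i1,3.0,0|2\n"
events = parse_events(io.StringIO(sample))
print(category_distribution(events, "u1"))  # {0: 0.25, 2: 0.75}
```

Weighting by the preference value instead would be a one-line change in `category_distribution`.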

The format of the categories files is "category-name,index", where index is the category id used in the cats field of the user-events files.
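A minimal sketch for mapping the category ids in the user-events files back to names. The function name `load_categories` is hypothetical and the toy genre rows are made up; splitting on the last comma is a defensive choice in case a category name itself contains a comma.

```python
import io
from typing import Dict, Iterable


def load_categories(lines: Iterable[str]) -> Dict[int, str]:
    """Parse "category-name,index" lines into {index: category-name}."""
    id_to_name: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # rpartition splits on the last comma, so names containing commas survive.
        name, _, idx = line.rpartition(",")
        try:
            id_to_name[int(idx)] = name
        except ValueError:
            continue  # assumption: a header row, if present, is skipped
    return id_to_name


# Made-up toy rows; with the real data pass open("categories.txt") instead.
names = load_categories(io.StringIO("Action,0\nComedy,1\nSci-Fi,2\n"))
print([names[i] for i in [0, 2]])  # ['Action', 'Sci-Fi']
```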

Example Python code for analyzing the datasets, as well as empirical results on calibration, popularity lift, and accuracy, can be found on GitHub: https://github.com/domkowald/FairRecSys
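As a quick illustration before turning to the repository, a minimal sketch of two metrics commonly used in such analyses: miscalibration as the Kullback-Leibler divergence between a user's profile category distribution p and the category distribution q of their recommendations, with q smoothed toward p by a small alpha (the formulation of Steck's calibrated recommendations), and popularity lift as the relative change in group average popularity (GAP) from profiles to recommendations. These definitions, the default alpha = 0.01, the item popularity values, and all function names are assumptions; the linked repository may define the metrics differently.

```python
import math
from typing import Dict, List


def miscalibration(p: Dict[int, float], q: Dict[int, float], alpha: float = 0.01) -> float:
    """KL divergence KL(p || q~) with q~ = (1 - alpha) * q + alpha * p.

    p: category distribution of the user's profile.
    q: category distribution of the user's recommendation list.
    Smoothing q toward p keeps the divergence finite when q misses a category.
    """
    kl = 0.0
    for g, p_g in p.items():
        if p_g > 0:
            q_g = (1 - alpha) * q.get(g, 0.0) + alpha * p_g
            kl += p_g * math.log(p_g / q_g)
    return kl


def group_average_popularity(lists: Dict[str, List[str]], popularity: Dict[str, float]) -> float:
    """Mean over users of the mean popularity of the items in each user's list."""
    per_user = [sum(popularity[i] for i in items) / len(items) for items in lists.values() if items]
    return sum(per_user) / len(per_user)


def popularity_lift(profiles: Dict[str, List[str]], recs: Dict[str, List[str]],
                    popularity: Dict[str, float]) -> float:
    """Relative GAP change (in percent) from the users' profiles to their recommendations."""
    gap_p = group_average_popularity(profiles, popularity)
    gap_r = group_average_popularity(recs, popularity)
    return 100.0 * (gap_r - gap_p) / gap_p


# Made-up toy values for a single user.
p = {0: 0.25, 2: 0.75}
print(miscalibration(p, p))             # close to 0: recommendations match the profile
print(miscalibration(p, {1: 1.0}) > 0)  # True: recommendations cover other categories
popularity = {"a": 0.1, "b": 0.3, "c": 0.6}  # e.g. share of users who interacted with each item
print(popularity_lift({"u1": ["a", "b"]}, {"u1": ["c"]}, popularity))
```

In the toy call the recommended item is three times as popular as the profile average (0.6 vs. 0.2), so the lift is about 200%. Computing GAP separately for the LowPop, MedPop, and HighPop users shows how differently the groups are affected.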

Files


Files (17.6 MB)

  • md5:27217f1d39a71e698dbf4f234933c619 (3.0 MB)
  • md5:b72b9b603d89bcc1a94087b1ccb15834 (12.2 MB)
  • md5:b0d4115336aadc71a2f589eb7d7b1d66 (2.4 MB)