Datasets to Evaluate Accuracy, Miscalibration and Popularity Lift in Recommendations
Description
This repository contains three datasets for evaluating accuracy, miscalibration and popularity lift in recommender systems. All datasets contain genre/category information in addition to different user group splits:
- Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
- MovieLens (ml.zip), based on the MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
- MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)
'user_events_cats.txt' contains the users' rating/interaction data along with a list of genres/categories assigned to the rated items. The list of categories is given in 'categories.txt'. Additionally, assignments to three user groups that differ in their inclination to popular/mainstream items are provided: LowPop in 'low_main_users.txt', MedPop in 'med_main_users.txt', and HighPop in 'high_main_users.txt'.
The format of the three user files is "user,mainstreaminess".
The format of the user-events files is "user,item,preference,cats", where different categories are separated by '|'.
The format of the categories files is "category-name,index", where index refers to the category-id in the user-events files.
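The three file formats above can be read with a few lines of Python. The following is a minimal sketch, not the code from the accompanying GitHub repository: the function names are hypothetical, the sample lines are made up for illustration, and it assumes that the files are comma-separated without a header row and that category names may themselves contain commas (hence the split from the right).

```python
from collections import Counter


def parse_user_events(lines):
    """Parse 'user,item,preference,cats' lines (assumed: no header row)."""
    events = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, item, preference, cats = line.split(",", 3)
        # Categories are category-ids separated by '|'.
        cat_ids = [c for c in cats.split("|") if c]
        events.append((user, item, float(preference), cat_ids))
    return events


def parse_categories(lines):
    """Parse 'category-name,index' lines into a dict: index -> name."""
    index_to_name = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split from the right in case a category name contains a comma.
        name, index = line.rsplit(",", 1)
        index_to_name[index] = name
    return index_to_name


def parse_user_group(lines):
    """Parse 'user,mainstreaminess' lines into a dict: user -> score."""
    group = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, mainstreaminess = line.split(",", 1)
        group[user] = float(mainstreaminess)
    return group


# Made-up sample lines that follow the documented formats.
sample_events = ["1,10,4.0,0|2", "1,11,3.0,1"]
sample_categories = ["Action,0", "Comedy,1", "Drama,2"]
sample_low_pop = ["1,0.12"]

events = parse_user_events(sample_events)
categories = parse_categories(sample_categories)
low_pop = parse_user_group(sample_low_pop)

# Count how often each category occurs in the LowPop users' interactions.
genre_counts = Counter(
    categories[cat_id]
    for user, _item, _pref, cat_ids in events
    if user in low_pop
    for cat_id in cat_ids
)
```

To use the sketch on the real data, pass an open file instead of the sample lists, for example `parse_user_events(open("user_events_cats.txt"))` after unpacking one of the archives. If a file turns out to start with a header row, skip the first line before parsing.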
Example Python code for analyzing the datasets, as well as empirical results on calibration, popularity lift and accuracy, can be found on GitHub: https://github.com/domkowald/FairRecSys