Datasets to Evaluate Accuracy, Miscalibration and Popularity Lift in Recommendations

doi:10.5281/zenodo.7428435

Published December 12, 2022 | Version 1.0

Dataset Open

Kowald, Dominik¹

1. Know-Center GmbH & TU Graz

This repository contains three datasets for evaluating accuracy, miscalibration and popularity lift in recommender systems. All datasets contain genre/category information in addition to different user group splits:

Last.fm (lfm.zip), based on the LFM-1b dataset of JKU Linz (http://www.cp.jku.at/datasets/LFM-1b/)
MovieLens (ml.zip), based on the MovieLens-1M dataset (https://grouplens.org/datasets/movielens/1m/)
MyAnimeList (anime.zip), based on the MyAnimeList dataset of Kaggle (https://www.kaggle.com/CooperUnion/anime-recommendations-database)

'user_events_cats.txt' contains the users' rating/interaction data along with a list of genres/categories assigned to the rated items. The list of categories is given in 'categories.txt'. Additionally, assignments to three user groups that differ in their inclination to popular/mainstream items are provided: LowPop in 'low_main_users.txt', MedPop in 'med_main_users.txt', and HighPop in 'high_main_users.txt'.

The format of the three user files is "user,mainstreaminess".

The format of the user-events files is "user,item,preference,cats", where multiple categories are separated by '|'.

The format of the categories files is "category-name,index", where index refers to the category id used in the user-events files.
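
The three formats above can be read with a few lines of plain Python. The following is a minimal sketch, not the author's code: the function names and the sample rows are made up for illustration, and it assumes the files have no header row, that every user-events row has all four fields, and that the preference and mainstreaminess values are numeric. If the real files differ (for example, if they start with a header line), the parsing needs to be adjusted.

```python
from collections import Counter, defaultdict


def parse_categories(lines):
    """Return {category id: category name} from "category-name,index" lines."""
    id_to_name = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last comma in case a category name itself contains one.
        name, index = line.rsplit(",", 1)
        id_to_name[index] = name
    return id_to_name


def parse_user_events(lines):
    """Yield (user, item, preference, [category ids]) tuples."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: four comma-separated fields, categories joined by '|'.
        user, item, preference, cats = line.split(",", 3)
        yield user, item, float(preference), [c for c in cats.split("|") if c]


def parse_user_group(lines):
    """Return {user: mainstreaminess} from "user,mainstreaminess" lines."""
    group = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, mainstreaminess = line.split(",", 1)
        group[user] = float(mainstreaminess)
    return group


def category_counts(events, id_to_name):
    """Count, per user, how often each category occurs among the user's items."""
    counts = defaultdict(Counter)
    for user, _item, _preference, cat_ids in events:
        for cat_id in cat_ids:
            # Fall back to the raw id if it is missing from the categories file.
            counts[user][id_to_name.get(cat_id, cat_id)] += 1
    return counts


# Made-up rows in the documented formats (not taken from the real files).
sample_categories = ["rock,0", "pop,1", "jazz,2"]
sample_events = ["u1,i1,5.0,0|1", "u1,i2,3.0,1", "u2,i3,4.0,2"]
sample_low_pop = ["u1,0.12"]

id_to_name = parse_categories(sample_categories)
profiles = category_counts(parse_user_events(sample_events), id_to_name)
low_pop = parse_user_group(sample_low_pop)

for user, counter in profiles.items():
    print(user, dict(counter), "LowPop" if user in low_pop else "other group")
```

To use this on the real data, pass an open file handle instead of a list, for example `parse_user_events(open("user_events_cats.txt", encoding="utf-8"))`. User, item, and category ids are kept as strings so that no assumption about their numeric type is needed.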

Example Python code for analyzing the datasets, as well as empirical results on calibration, popularity lift and accuracy, can be found on GitHub: https://github.com/domkowald/FairRecSys

Files

Files (17.6 MB)

Name	MD5	Size
anime.zip	27217f1d39a71e698dbf4f234933c619	3.0 MB
lfm.zip	b72b9b603d89bcc1a94087b1ccb15834	12.2 MB
ml.zip	b0d4115336aadc71a2f589eb7d7b1d66	2.4 MB
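
The MD5 checksums listed above can be used to check that a download is complete and unmodified. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_downloads` are hypothetical, and it assumes the archives were saved under their original file names in a single directory.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "anime.zip": "27217f1d39a71e698dbf4f234933c619",
    "lfm.zip": "b72b9b603d89bcc1a94087b1ccb15834",
    "ml.zip": "b0d4115336aadc71a2f589eb7d7b1d66",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory):
    """Return {file name: True/False/None}; None means the file is missing."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results
```

Calling `verify_downloads("path/to/downloads")` returns one entry per archive: `True` if the checksum matches, `False` if it does not, and `None` if the file was not found in that directory.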
