Dataset Open Access

Fair RecSys Datasets

Kowald Dominik

Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="" xmlns:oai_dc="" xmlns:xsi="" xsi:schemaLocation="">
  <dc:creator>Kowald Dominik</dc:creator>
  <dc:description>Four multimedia recommender systems datasets to study popularity bias and fairness:
	(, based on the LFM-1b dataset of JKU Linz (
	MovieLens (, based on the MovieLens-1M dataset (
	BookCrossing (, based on the BookCrossing dataset of Uni Freiburg (
	MyAnimeList (, based on the MyAnimeList dataset of Kaggle (

Each dataset consists of user interactions (user_events.txt) and three user groups that differ in their inclination to popular/mainstream items: LowPop (low_main_users.txt), MedPop (med_main_users.txt), and HighPop (high_main_users.txt).

The format of the three user files is "user,mainstreaminess".

The format of the user-events files is "user,item,preference".
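
The two file formats above could be read along the following lines. This is only a minimal sketch and not the authors' analysis code: it assumes the files are plain comma-separated text, that mainstreaminess and preference are numeric, and that user and item ids can be kept as strings. The helper names (parse_user_file, parse_user_events, item_popularity, mean_profile_popularity) are hypothetical, and the popularity measure (share of users who interacted with an item) is one illustrative choice, not necessarily the one used by the dataset authors.

```python
import csv
from collections import defaultdict


def parse_user_file(lines):
    """Parse "user,mainstreaminess" rows into a dict: user id -> score."""
    scores = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        try:
            scores[row[0]] = float(row[1])
        except ValueError:
            continue  # skip a header row, if the file has one (assumption)
    return scores


def parse_user_events(lines):
    """Parse "user,item,preference" rows into (user, item, preference) tuples."""
    events = []
    for row in csv.reader(lines):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        try:
            events.append((row[0], row[1], float(row[2])))
        except ValueError:
            continue  # skip a header row, if the file has one (assumption)
    return events


def item_popularity(events):
    """Share of all users in the events who interacted with each item."""
    users_per_item = defaultdict(set)
    all_users = set()
    for user, item, _ in events:
        users_per_item[item].add(user)
        all_users.add(user)
    return {item: len(users) / len(all_users)
            for item, users in users_per_item.items()}


def mean_profile_popularity(events, group, popularity):
    """Average item popularity over all interactions of users in a group."""
    values = [popularity[item] for user, item, _ in events if user in group]
    return sum(values) / len(values) if values else 0.0


# Intended use with the files named in the description (not run here):
#   with open("user_events.txt", newline="") as f:
#       events = parse_user_events(f)
#   with open("low_main_users.txt", newline="") as f:
#       low_pop = parse_user_file(f)
#   pop = item_popularity(events)
#   print(mean_profile_popularity(events, low_pop, pop))
```

Both parsers accept any iterable of text lines, so an open file handle or a list of strings works. Comparing the value returned by mean_profile_popularity for the LowPop, MedPop, and HighPop groups gives a rough first look at how strongly each group leans toward popular items.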

Example Python code for analyzing the datasets, as well as more information on the user groups, can be found on GitHub ( and on arXiv (
  </dc:description>


  <dc:subject>multimedia recommender systems</dc:subject>
  <dc:subject>popularity bias</dc:subject>
  <dc:title>Fair RecSys Datasets</dc:title>
</oai_dc:dc>
                  All versions   This version
Views             304            304
Downloads         38             38
Data volume       145.6 MB       145.6 MB
Unique views      250            250
Unique downloads  21             21

