There is a newer version of the record available.

Published January 29, 2026 | Version v1
Dataset Open

FairCAT-generated datasets for benchmarking fairness-aware GNNs

Authors/Creators

Description

These synthetic datasets are intended for benchmarking the fairness and performance of Graph Neural Networks (GNNs). They were created for the following experiments: balancing, correlation strength, data scaling, synthetic German Credit, and synthetic Pokec_n. Within each experiment, only the variable under study changes; all others stay fixed.

 

Balancing experiment: the ratio between the sensitive groups varies across graphs: Balanced (50/50 split), Mild Imbalance (70/30), and Strong Imbalance (90/10).
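
As a rough illustration of what the three splits mean in node counts, the sketch below converts a split into group sizes and measures the split actually present in a list of sensitive-attribute values. The function names, the integer-percentage representation, and the example node count are assumptions made for illustration; they are not taken from the FairCAT generator or from the dataset files.

```python
from collections import Counter

# Splits named in the description, as (majority %, minority %).
SPLITS = {
    "Balanced": (50, 50),
    "Mild Imbalance": (70, 30),
    "Strong Imbalance": (90, 10),
}


def group_sizes(n_nodes, majority_pct):
    """Return (majority, minority) node counts for a given split.

    Hypothetical helper: integer arithmetic keeps the counts exact and
    guarantees that they sum to n_nodes.
    """
    majority = n_nodes * majority_pct // 100
    return majority, n_nodes - majority


def observed_split(sensitive_values):
    """Return the share of nodes in each sensitive group."""
    counts = Counter(sensitive_values)
    total = len(sensitive_values)
    return {group: count / total for group, count in counts.items()}


if __name__ == "__main__":
    # 1000 is an arbitrary example size, not a size used in the datasets.
    for name, (majority_pct, _) in SPLITS.items():
        print(name, group_sizes(1000, majority_pct))
```

Running `observed_split` on the sensitive-attribute column of a loaded graph gives a quick check that the graph matches the split it is labeled with.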

Correlation strength experiment: the graphs vary in the strength of the correlation between the sensitive attribute and a non-sensitive attribute: Low (0.05), Medium (0.50), and High (0.95).
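
The description does not say which correlation measure the Low, Medium, and High values refer to. As a minimal sketch, assuming a Pearson correlation between two numeric node attributes, the correlation in a loaded graph could be checked as follows. The function name and the toy attribute lists are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products and squares; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data, not taken from the datasets.
    sensitive = [0, 0, 1, 1]
    identical_attribute = [0, 0, 1, 1]
    unrelated_attribute = [0, 1, 0, 1]
    print(pearson(sensitive, identical_attribute))
    print(pearson(sensitive, unrelated_attribute))
```

If the generator uses a different measure, such as a rank correlation, the values computed this way will not necessarily match the nominal 0.05, 0.50, and 0.95.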

Scaling experiment: the graphs increase in size: Small (2^15 nodes), Medium (2^20 nodes), and Large (2^23 nodes; labeled 2^25 due to an error).
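
To make the sizes concrete, the short sketch below expands the powers of two and shows how far the mislabeled Large exponent is from the actual one. The dictionary and function names are illustrative only.

```python
# Exponents as stated in the description; 25 is the mislabeled value for Large.
SCALING_EXPONENTS = {"Small": 15, "Medium": 20, "Large": 23}
MISLABELED_LARGE_EXPONENT = 25


def node_count(exponent):
    """Number of nodes for a graph of size 2^exponent."""
    return 1 << exponent


if __name__ == "__main__":
    for label, exponent in SCALING_EXPONENTS.items():
        print(f"{label}: 2^{exponent} = {node_count(exponent):,} nodes")
    # How much larger the label suggests the Large graph is than it really is.
    factor = node_count(MISLABELED_LARGE_EXPONENT) // node_count(SCALING_EXPONENTS["Large"])
    print(f"The 2^25 label overstates the Large graph by a factor of {factor}")
```

The three graphs therefore have 32,768, 1,048,576, and 8,388,608 nodes. A graph with 2^25 nodes would have 33,554,432, four times the actual size of the Large graph.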

German FairCAT: a synthetic dataset that aims to reproduce the real-world German Credit dataset. It can be used to compare GNN predictions against those obtained by training on the original dataset.

Pokec_n FairCAT: a synthetic dataset that aims to reproduce the real-world Pokec_n dataset.

Files

balancing_tests.zip

Files (54.5 MB)

Checksum Size
md5:4b45e636a490391931a2a862af92d6a6 11.7 MB
md5:af6cc98e3f61a8eeae50a91ce1b04c8e 11.8 MB
md5:81b1865906c1648a047efb2cb749e475 30.4 MB
md5:fa1886b6ff5db16f0514c43ef0f74f6b 229.9 kB
md5:ef7028e4ed301f978593898a27d83c12 443.3 kB
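
A downloaded archive can be checked against the md5 values listed above. The sketch below uses Python's standard `hashlib` module. The function names are hypothetical, and because the listing does not show which checksum belongs to which file, the pairing of path and checksum has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("balancing_tests.zip", "md5:...")`, with the checksum copied from that file's row in the record, returns `True` only if the download is intact.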

Additional details

Software