FairCAT-generated datasets for benchmarking fairness-aware GNNs
Authors/Creators
Description
These synthetic datasets are intended for benchmarking the fairness and performance of Graph Neural Networks (GNNs). They were created for the following experiments: balancing, correlation strength, data scaling, synthetic German Credit, and synthetic Pokec_n. Within each experiment only one variable changes, as named by the experiment, while all other settings stay fixed.
Balancing experiment: the ratio between the sensitive groups varies across graphs, from Balanced (50/50 split) through Mild Imbalance (70/30) to Strong Imbalance (90/10).
Correlation strength experiment: the graphs differ in how strongly the sensitive attribute correlates with a non-sensitive attribute, with levels Low (0.05), Medium (0.50), and High (0.95).
Scaling experiment: the graphs increase in size, from Small (2^15 nodes) through Medium (2^20 nodes) to Large (2^23 nodes; called 2^25 due to an error).
German FairCAT: a synthetic dataset that aims to reproduce the real-world German Credit dataset. It can be used to compare GNN predictions against those obtained by learning on the original dataset.
Pokec_n FairCAT: a synthetic dataset that aims to reproduce the real-world Pokec_n dataset.
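The experiment settings above can be written down as plain data, which may help when scripting a benchmark run. The following is a minimal sketch, not part of FairCAT: the dictionary names and the `group_sizes` helper are hypothetical, and applying the balancing splits to the Small graph size is only an illustration, since the description does not state the node count used in the balancing experiment.

```python
# Settings copied from the experiment descriptions; all names are illustrative.
BALANCE_SPLITS = {
    "Balanced": (50, 50),
    "Mild Imbalance": (70, 30),
    "Strong Imbalance": (90, 10),
}
CORRELATION_LEVELS = {"Low": 0.05, "Medium": 0.50, "High": 0.95}
SCALE_NODE_COUNTS = {"Small": 2**15, "Medium": 2**20, "Large": 2**23}


def group_sizes(num_nodes: int, split: tuple[int, int]) -> tuple[int, int]:
    """Divide num_nodes into two sensitive groups by a percentage split."""
    first_pct, second_pct = split
    if first_pct + second_pct != 100:
        raise ValueError("split percentages must sum to 100")
    first = round(num_nodes * first_pct / 100)
    # The second group takes the remainder so the sizes always sum to num_nodes.
    return first, num_nodes - first


if __name__ == "__main__":
    for label, count in SCALE_NODE_COUNTS.items():
        print(f"{label}: {count} nodes")
    # Assumption: the Small graph size is used here purely as an example.
    for label, split in BALANCE_SPLITS.items():
        print(label, group_sizes(SCALE_NODE_COUNTS["Small"], split))
```

Note that 2^23 is 8,388,608 nodes, whereas 2^25 would be 33,554,432, so the Large graph is a quarter of the size its label suggests.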
Files
balancing_tests.zip
Total size: 54.5 MB

| Checksum | Size |
|---|---|
| md5:4b45e636a490391931a2a862af92d6a6 | 11.7 MB |
| md5:af6cc98e3f61a8eeae50a91ce1b04c8e | 11.8 MB |
| md5:81b1865906c1648a047efb2cb749e475 | 30.4 MB |
| md5:fa1886b6ff5db16f0514c43ef0f74f6b | 229.9 kB |
| md5:ef7028e4ed301f978593898a27d83c12 | 443.3 kB |
Additional details
Software
- Repository URL: https://github.com/skas08/FairCAT