PG-SB: A Benchmark for Schema Discovery in Property Graphs
Authors/Creators
Contributors
Description
PG-SB unifies ten datasets (five real, five synthetic) covering the domains of social networks, neuroscience connectomes, biomedicine, finance/leaks, communications, stream analytics, and internet measurements.
For each dataset, we provide the ground-truth schema and enumerate the corresponding type patterns (node/edge patterns) observed in the data, capturing the structural variability of label and property co-occurrence.
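As a rough illustration of what a node type pattern might look like, the sketch below counts the distinct combinations of label set and property-key set in a small in-memory list of nodes. The dictionary layout (`labels`, `properties`), the function name `node_patterns`, and the sample nodes are all assumptions made for this example; they are not the benchmark's actual file format or code.

```python
from collections import Counter


def node_patterns(nodes):
    """Count distinct (label set, property-key set) combinations.

    Each node is assumed to be a dict with a "labels" list and a
    "properties" dict; this layout is hypothetical.
    """
    counts = Counter()
    for node in nodes:
        labels = frozenset(node.get("labels", ()))
        property_keys = frozenset(node.get("properties", {}))
        counts[(labels, property_keys)] += 1
    return counts


# Hypothetical sample data, not taken from any PG-SB dataset.
nodes = [
    {"labels": ["Person"], "properties": {"name": "Ada", "age": 36}},
    {"labels": ["Person"], "properties": {"name": "Bob"}},
    {"labels": ["Person"], "properties": {"name": "Cy", "age": 41}},
    {"labels": ["Person", "Student"], "properties": {"name": "Di"}},
]

patterns = node_patterns(nodes)
for (labels, keys), count in patterns.items():
    print(sorted(labels), sorted(keys), count)
```

Here a single label (`Person`) appears with two different property-key sets, which is the kind of label and property co-occurrence variability the patterns are meant to capture. Edge patterns could be counted the same way by also keying on the source and target labels.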
The benchmark includes a noise injection framework that (i) randomly removes 0% to 40% of node/edge properties and (ii) varies label availability across three settings: 100% (all labels retained), 50% (half retained), and 0% (no labels), for a total of 150 test cases.
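The noise settings described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark's own implementation: the function `inject_noise`, the element layout, and the placeholder dataset names are hypothetical. The sketch also assumes that the 50% setting keeps all labels on roughly half of the elements and that property removal proceeds in steps of 10 percentage points. The source states neither. The step size is simply one reading that yields 150 cases from ten datasets and three label settings.

```python
import random
from itertools import product


def inject_noise(elements, property_removal_rate, label_retention_rate, seed=0):
    """Return a noisy copy of the elements (nodes or edges).

    Each property is dropped independently with probability
    property_removal_rate; each element keeps its labels with
    probability label_retention_rate (assumed per-element behavior).
    """
    rng = random.Random(seed)
    noisy = []
    for element in elements:
        kept_properties = {
            key: value
            for key, value in element["properties"].items()
            if rng.random() >= property_removal_rate
        }
        keep_labels = rng.random() < label_retention_rate
        kept_labels = list(element["labels"]) if keep_labels else []
        noisy.append({"labels": kept_labels, "properties": kept_properties})
    return noisy


# Hypothetical sample elements, not taken from any PG-SB dataset.
sample = [
    {"labels": ["Person"], "properties": {"name": "Ada", "age": 36}},
    {"labels": ["City"], "properties": {"name": "Paris"}},
]

# Placeholder dataset names; the real names are in the benchmark resources.
DATASETS = [f"dataset_{i}" for i in range(1, 11)]
REMOVAL_RATES = [0.0, 0.1, 0.2, 0.3, 0.4]  # assumed 10-point steps
LABEL_RETENTION = [1.0, 0.5, 0.0]

test_cases = list(product(DATASETS, REMOVAL_RATES, LABEL_RETENTION))
print(len(test_cases))
```

Fixing the seed keeps each noisy variant reproducible, which matters when several schema discovery methods are compared on the same test case.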
All dataset resources are contained in datasets.zip.
Files (11.8 GB)
dataset_info.pdf
| MD5 checksum | Size |
|---|---|
| 578151033be0d7662b0fdee17940797e | 15.1 kB |
| 833f68cab27269d7d37e033dc0b21710 | 97 Bytes |
| 742698a2a8306f14fa4dced4d6d7f332 | 385 Bytes |
| 09377d74b933428c56b3da22aab3f94b | 4.3 MB |
| a01b8367ac87a0e83274dcf3eb4ab93b | 11.8 GB |
| 7067c697cda6e3eb516cc1e8666359be | 14.7 kB |
Additional details
Related works
- Has part
  - Preprint: arXiv:2512.01092 (arXiv)
Software
- Repository URL: https://github.com/sophisid/PG-SB
- Programming language: Python