There is a newer version of the record available.

Published December 3, 2025 | Version v1
Dataset Open

PG-SB: A Benchmark for Schema Discovery in Property Graphs

  • 1. Foundation for Research and Technology Hellas, Institute of Computer Science
  • 2. Centre National de la Recherche Scientifique
  • 3. Laboratoire d'Informatique de Grenoble
  • 4. Harokopio University of Athens
  • 5. Foundation for Research and Technology Hellas
  • 6. University of Crete

Description

PG-SB unifies ten datasets (five real, five synthetic) covering the domains of social networks, neuroscience connectomes, biomedicine, finance/leaks, communications, stream analytics, and internet measurements. 
For each dataset, we provide the ground-truth schema and enumerate the corresponding type patterns (node/edge patterns) observed in the data, capturing the structural variability of label and property co-occurrence. 
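
As a rough illustration of what enumerating node patterns can look like, the sketch below counts distinct combinations of label set and property-key set. The in-memory node representation (dicts with `labels` and `properties`) and the helper name `node_patterns` are assumptions made for this example, not the format or API shipped with PG-SB; the benchmark's exact pattern definition may differ.

```python
from collections import Counter


def node_patterns(nodes):
    """Count distinct (label set, property-key set) combinations.

    `nodes` is a hypothetical list of dicts with "labels" and "properties";
    it is not the on-disk format used by the benchmark.
    """
    return Counter(
        # frozenset over a dict iterates its keys, i.e. the property names.
        (frozenset(node["labels"]), frozenset(node["properties"]))
        for node in nodes
    )


# Toy data, invented for illustration only.
nodes = [
    {"labels": ["Person"], "properties": {"name": "Ada", "age": 36}},
    {"labels": ["Person"], "properties": {"name": "Bob"}},
    {"labels": ["Person"], "properties": {"name": "Eve", "age": 41}},
    {"labels": ["Person", "Student"], "properties": {"name": "Cy", "age": 20}},
]

patterns = node_patterns(nodes)
for (labels, keys), count in patterns.items():
    print(sorted(labels), sorted(keys), count)
```

Edge patterns could be counted the same way by adding the source and target label sets to the key.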
The benchmark includes a noise injection framework that (i) randomly removes between 0% and 40% of node/edge properties and (ii) varies label availability across three settings: 100% (all labels retained), 50% (half retained), and 0% (no labels), for a total of 150 test cases.
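
The following is a minimal sketch of such a noise injection step, not the benchmark's actual implementation. It assumes that each property and each element's label set is dropped independently at random, and that the 0% to 40% range is sampled at five levels in steps of ten points; the latter is consistent with 10 datasets × 5 noise levels × 3 label settings = 150, but the record itself only states the range. The function `inject_noise` and the element layout are hypothetical.

```python
import random
from itertools import product


def inject_noise(elements, prop_removal, label_retention, seed=0):
    """Return a noisy copy of nodes or edges.

    prop_removal:    probability of dropping each property (assumed per-property).
    label_retention: probability of keeping an element's labels (assumed per-element).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    noisy = []
    for element in elements:
        props = {
            key: value
            for key, value in element["properties"].items()
            if rng.random() >= prop_removal
        }
        labels = list(element["labels"]) if rng.random() < label_retention else []
        noisy.append({"labels": labels, "properties": props})
    return noisy


# Assumed test-case grid: the five noise levels are a guess that matches 150.
NUM_DATASETS = 10
NOISE_LEVELS = [0.0, 0.1, 0.2, 0.3, 0.4]
LABEL_SETTINGS = [1.0, 0.5, 0.0]
grid = list(product(range(NUM_DATASETS), NOISE_LEVELS, LABEL_SETTINGS))

# Toy element, invented for illustration only.
sample = [{"labels": ["Person"], "properties": {"name": "Ada", "age": 36}}]
clean = inject_noise(sample, prop_removal=0.0, label_retention=1.0)
stripped = inject_noise(sample, prop_removal=1.0, label_retention=0.0)
```

With `prop_removal=0.0` and `label_retention=1.0` the input comes back unchanged, while `prop_removal=1.0` and `label_retention=0.0` strip every property and label, which makes the two extremes easy to sanity-check.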

(All dataset resources are inside datasets.zip.)

Files

dataset_info.pdf

Files (11.8 GB)

Checksum (MD5)                            Size
md5:578151033be0d7662b0fdee17940797e      15.1 kB
md5:833f68cab27269d7d37e033dc0b21710      97 Bytes
md5:742698a2a8306f14fa4dced4d6d7f332      385 Bytes
md5:09377d74b933428c56b3da22aab3f94b      4.3 MB
md5:a01b8367ac87a0e83274dcf3eb4ab93b      11.8 GB
md5:7067c697cda6e3eb516cc1e8666359be      14.7 kB
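
The listed checksums can be used to verify a download. The sketch below streams a file through Python's standard `hashlib.md5` so that a multi-gigabyte archive does not need to fit in memory. The helper names `md5_of` and `verify` are illustrative, and pairing the 11.8 GB checksum with `datasets.zip` is an assumption, since the listing above does not show which file name belongs to which checksum.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].lower()
    return md5_of(path) == expected_hex


# Example call (assumes the 11.8 GB entry is datasets.zip):
# verify("datasets.zip", "md5:a01b8367ac87a0e83274dcf3eb4ab93b")
```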

Additional details

Related works

Has part
Preprint: arXiv:2512.01092 (arXiv)

Software

Repository URL
https://github.com/sophisid/PG-SB
Programming language
Python