Published June 4, 2025 | Version v3
Dataset Open

Synthetic RDF Data

Description

This dataset contains synthetic RDF data produced as part of my master's thesis research. The data was generated from SHACL (Shapes Constraint Language) shapes, which define the structure and constraints of RDF graphs.

Two different generative models were used:

  • GAN (Generative Adversarial Network): Used to model and sample property values by learning the distribution of entities and relationships.

  • VAE (Variational Autoencoder): Used to capture the latent distribution of data features and generate new, realistic RDF instances while preserving SHACL constraints.

The primary objective was to produce high-quality, diverse synthetic knowledge graph data that:

  • Adheres to SHACL constraints

  • Represents realistic distributions

  • Is suitable for testing RDF-based systems and knowledge graph pipelines
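To make the first objective concrete, here is a minimal sketch of what checking a cardinality constraint on generated instances involves. It is not a SHACL engine and not the method used in the thesis: the triples, the `http://example.org/` namespace, the shape dictionary, and `check_cardinality` are all hypothetical stand-ins, and only the idea behind `sh:minCount` / `sh:maxCount` is imitated in plain Python. Validating the actual files would normally be done with a real SHACL processor.

```python
from collections import defaultdict

# Hypothetical namespace and triples standing in for a parsed synthetic graph.
EX = "http://example.org/"
triples = [
    (EX + "p1", "rdf:type", EX + "Protein"),
    (EX + "p1", EX + "name", "Hemoglobin"),
    (EX + "p2", "rdf:type", EX + "Protein"),  # p2 has no name
]

# Hypothetical shape: every ex:Protein must have exactly one ex:name.
shape = {
    "target_class": EX + "Protein",
    "path": EX + "name",
    "min_count": 1,
    "max_count": 1,
}


def check_cardinality(triples, shape):
    """Return (focus_node, value_count) pairs that violate the count bounds."""
    # Focus nodes are all subjects typed with the shape's target class.
    targets = sorted(
        s for s, p, o in triples
        if p == "rdf:type" and o == shape["target_class"]
    )
    # Count how many values each subject has for the constrained property.
    counts = defaultdict(int)
    for s, p, _ in triples:
        if p == shape["path"]:
            counts[s] += 1
    return [
        (node, counts[node])
        for node in targets
        if not shape["min_count"] <= counts[node] <= shape["max_count"]
    ]


violations = check_cardinality(triples, shape)
print(violations)
```

An empty result would mean every targeted node satisfies the bounds; here the second node is reported because it has no value for the constrained property.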

Files

ProteinOntologyShapes.ttl.txt

Files (9.3 MB)

Checksum                               Size
md5:6e770cc1f7364e97bff7a6555a2d172e   4.5 MB
md5:cae780da2059349f265efa2188c1e1fa   41.5 kB
md5:1c10b43fbc8a3d47e37b3081aa05b844   2.8 kB
md5:0d8be880bd190dd6de0182e42f7c77a8   4.8 MB
md5:f1c037717efd71bdc944be84f271075f   3.0 kB
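The listing gives an MD5 checksum for each file, so a download can be verified before use. The sketch below uses only the Python standard library; the helper names `md5_of_file` and `matches_listing` are illustrative, and the demonstration runs on a temporary file because the listing does not show which file name belongs to which checksum.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:6e77...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demonstration on a temporary file (stand-in for a downloaded file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

demo_listed = "md5:" + hashlib.md5(b"example payload").hexdigest()
print(matches_listing(demo_path, demo_listed))
os.remove(demo_path)
```

For a real check, pass the local path of a downloaded file together with the matching value from the listing, for example `md5:6e770cc1f7364e97bff7a6555a2d172e` for the 4.5 MB file.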

Additional details

Dates

Available
2025-06-04