Dataset Open Access

DBpedia RDF2Vec Graph Embeddings

Christensen, Martin Pekár; Lissandrini, Matteo; Hose, Katja

DBpedia graph embeddings using RDF2Vec. RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].

The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).

A figure of cosine similarities between a selected set of DBpedia entities is provided in the dataset here.
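
As a minimal sketch of how such cosine similarities could be computed from the embeddings, the Python below parses vector lines and compares two entities. It assumes each line of vectors.txt has the form `<entity> <v1> <v2> ... <vn>` separated by whitespace; that layout, the function names `parse_vectors` and `cosine_similarity`, and the toy entity URIs are assumptions for illustration and are not taken from the dataset.

```python
import math
from typing import Dict, Iterable, List, Optional, Set


def parse_vectors(lines: Iterable[str],
                  wanted: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Parse lines of the ASSUMED form '<entity> <v1> <v2> ... <vn>'.

    If `wanted` is given, only those entities are kept, which avoids
    holding the whole file in memory.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if wanted is not None and parts[0] not in wanted:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either is a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional stand-in for the real 200-dimensional vectors
# (hypothetical entities, not real data).
sample = [
    "http://dbpedia.org/resource/A 1.0 0.0 0.0",
    "http://dbpedia.org/resource/B 0.0 1.0 0.0",
    "http://dbpedia.org/resource/C 2.0 0.0 0.0",
]
vecs = parse_vectors(sample)
sim_ac = cosine_similarity(vecs["http://dbpedia.org/resource/A"],
                           vecs["http://dbpedia.org/resource/C"])
sim_ab = cosine_similarity(vecs["http://dbpedia.org/resource/A"],
                           vecs["http://dbpedia.org/resource/B"])
```

For the real file, an open file handle can be passed as `lines`. Given the file size, passing a `wanted` set of entity URIs is likely preferable to loading every vector.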


Generating Embeddings

The code for generating these embeddings can be found here.

Run the run.sh script, which wraps all the commands necessary to generate the embeddings:

bash run.sh

The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container of that image that generates the embeddings for the DBpedia graph defined by the DBpedia files.

A folder files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.
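
A small sketch for checking that a run produced the layout described above. The folder names files and embeddings/dbpedia and the file name vectors.txt come from the description; the function name `summarize_output` is hypothetical, and because the names of the random walk files are not stated here, the sketch simply lists whatever else is in the output folder.

```python
from pathlib import Path
from typing import Dict, List, Union


def summarize_output(root: Path) -> Dict[str, Union[bool, List[str]]]:
    """Report whether the expected output folders and vectors.txt exist under root."""
    out_dir = root / "embeddings" / "dbpedia"
    other_files: List[str] = []
    if out_dir.is_dir():
        # Everything except vectors.txt; presumably the random walk files.
        other_files = sorted(
            p.name for p in out_dir.iterdir()
            if p.is_file() and p.name != "vectors.txt"
        )
    return {
        "downloads_present": (root / "files").is_dir(),
        "vectors_present": (out_dir / "vectors.txt").is_file(),
        "other_files": other_files,
    }


# Check the current working directory, assumed to be where run.sh was executed.
summary = summarize_output(Path("."))
```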


Run Time of Embeddings Generation

Generating the embeddings can take more than a day, depending on how many DBpedia files are downloaded. The following run time statistics were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD.

  • Total: 1 day, 8 hours, 52 minutes, 41 seconds
  • Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
  • Training: 1 day, 1 hour, 28 minutes, 5 seconds
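
The reported total is consistent with the two phases when the walk generation time is read as 7 hours, 24 minutes, 36 seconds. The short check below does the arithmetic and derives each phase's share of the run; the variable names are illustrative only.

```python
from datetime import timedelta

# Durations as reported in the list above.
walks = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)
total = timedelta(days=1, hours=8, minutes=52, seconds=41)

# The two phases add up to the reported total.
phases_match_total = (walks + training == total)

# Fraction of the run spent in each phase.
walk_share = walks / total
training_share = training / total
```

On this machine, walk generation accounts for roughly 23% of the run and training for roughly 77%.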


Parameters Used

The parameters used to generate the embeddings provided here are listed below:

  • Number of walks per entity: 100
  • Depth (hops) per walk: 4
  • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
  • Threads: # of processors / 2
  • Training mode: sg
  • Embeddings vector dimension: 200
  • Minimum word2vec word count: 1
  • Sample rate: 0.0
  • Training window size: 5
  • Training epochs: 5
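
To illustrate what the walk parameters above control, here is a toy sketch of duplicate-free random walks. It is not the implementation used to produce this dataset. It assumes that "duplicate free" means no two walks starting from the same entity are identical, and that depth counts hops (edges followed). The graph representation, the function name `duplicate_free_walks`, and the `max_attempts` cap are all hypothetical.

```python
import os
import random
from typing import Dict, List, Optional, Set, Tuple

# Toy graph type: subject -> list of (predicate, object) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def duplicate_free_walks(graph: Graph, start: str, num_walks: int = 100,
                         depth: int = 4, rng: Optional[random.Random] = None,
                         max_attempts: Optional[int] = None) -> List[Tuple[str, ...]]:
    """Sample up to num_walks distinct random walks of at most `depth` hops."""
    rng = rng or random.Random()
    attempts = max_attempts if max_attempts is not None else num_walks * 10
    walks: Set[Tuple[str, ...]] = set()  # a set discards repeated walks
    for _ in range(attempts):
        if len(walks) >= num_walks:
            break
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: the walk ends early
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        walks.add(tuple(walk))
    return sorted(walks)


# "Threads: # of processors / 2", with a floor of one thread.
threads = max(1, (os.cpu_count() or 2) // 2)

# Tiny example graph with only two possible walks from "A".
toy: Graph = {
    "A": [("p", "B"), ("q", "C")],
    "B": [("r", "D")],
}
walks = duplicate_free_walks(toy, "A", rng=random.Random(0))
```

In the toy graph only two distinct walks exist from "A", so fewer than the requested 100 are returned. Entities with few outgoing paths can therefore yield fewer walks than the configured number.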


Files (32.5 GB)

Two files are provided: one of 32.5 GB and one of 38.5 kB.

[1] Portisch, J., Hladik, M. and Paulheim, H., 2020. RDF2Vec Light - A Lightweight Approach for Knowledge Graph Embeddings. arXiv preprint arXiv:2009.07659.