Published October 20, 2022 | Version 2.0
Dataset · Open Access

InductiveQE Datasets

  • Mila, McGill University

Description

InductiveQE datasets

UPD 2.0: Regenerated datasets free of potential test set leakages

UPD 1.1: Added train_answers_val.pkl files to all freebase-derived datasets - answers of training queries on larger validation graphs

This repository contains 10 inductive complex query answering datasets published in "Inductive Logical Query Answering in Knowledge Graphs" (NeurIPS 2022). Nine datasets (106-550) were created from FB15k-237, and the wikikg dataset was created from the OGB WikiKG 2 graph. In all datasets, the inference graphs extend the training graphs with new nodes and edges. Dataset numbers indicate the relative size of the inference graph compared to the training graph: e.g., in 175, the number of nodes in the inference graph is 175% of the number of nodes in the training graph. The higher the ratio, the more new unseen nodes appear at inference time and the harder the task. The wikikg split has a fixed 133% ratio.
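As a concrete illustration of the ratio convention, a small hypothetical helper (not part of the datasets or the accompanying code) that converts a training-graph node count and a dataset ratio into the corresponding inference-graph node count:

```python
# Hypothetical helper illustrating the dataset naming convention:
# a "175" dataset has an inference graph whose node count is 175%
# of the training graph's node count.
def inference_nodes(num_train_nodes: int, ratio_percent: int) -> int:
    """Node count of the inference graph for a given percentage ratio."""
    return num_train_nodes * ratio_percent // 100

# e.g. a training graph with 10,000 nodes at the 175% ratio:
print(inference_nodes(10_000, 175))  # → 17500
```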

Each dataset is a zip archive containing 17 files:

  • train_graph.txt (pt for wikikg) - original training graph
  • val_inference.txt (pt) - inference graph (validation split), new nodes in validation are disjoint with the test inference graph
  • val_predict.txt (pt) - missing edges in the validation inference graph to be predicted. 
  • test_inference.txt (pt) - inference graph (test split), new nodes in test are disjoint with the validation inference graph
  • test_predict.txt (pt) - missing edges in the test inference graph to be predicted.
  • train/valid/test_queries.pkl - queries of the respective split, 14 query types for fb-derived datasets, 9 types for Wikikg (EPFO-only)
  • *_answers_easy.pkl - easy answers to respective queries that do not require predicting missing links but only edge traversal
  • *_answers_hard.pkl - hard answers to respective queries that DO require predicting missing links and against which the final metrics will be computed
  • train_answers_val.pkl - the extended set of answers for training queries on the larger validation graph; most training queries have at least one new answer. This is an inference-only set for measuring the faithfulness of trained models
  • train_answers_test.pkl - the extended set of answers for training queries on the larger test graph; most training queries have at least one new answer. This is an inference-only set for measuring the faithfulness of trained models
  • og_mappings.pkl - contains entity2id / relation2id dictionaries mapping local node/relation IDs from a respective dataset to the original fb15k237 / wikikg2
  • stats.txt - a small file with dataset stats
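The text and pickle files in the listing above can be read with standard tooling. A minimal loading sketch, assuming whitespace-separated integer (head, relation, tail) triples in the .txt graph files; the exact dictionary layouts of the query and answer pickles are defined by the linked code repository, not by this sketch:

```python
import pickle

def load_triples(path):
    """Read whitespace-separated (head, relation, tail) integer triples,
    one triple per line (assumed format for the *.txt graph files)."""
    triples = []
    with open(path) as f:
        for line in f:
            h, r, t = line.strip().split()
            triples.append((int(h), int(r), int(t)))
    return triples

def load_pickle(path):
    """Load a query/answer/mapping pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Example usage (paths assume an unzipped "106" dataset):
# train_graph = load_triples("106/train_graph.txt")
# queries = load_pickle("106/train_queries.pkl")
# easy = load_pickle("106/train_answers_easy.pkl")
# hard = load_pickle("106/train_answers_hard.pkl")
```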

The overall unzipped size of all datasets combined is about 10 GB. Please refer to the paper for the sizes of the graphs and the number of queries per graph.
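Per the file listing above, final metrics are computed against the hard answers only, with easy answers filtered out of the candidate ranking. A minimal sketch of a filtered mean reciprocal rank for a single query, illustrating the standard filtered-ranking protocol (the repository's actual evaluation code may differ in details):

```python
def filtered_mrr(scores, easy_answers, hard_answers):
    """Filtered MRR for one query.

    scores: dict entity_id -> score (higher is better).
    When ranking each hard answer, both the easy answers and the other
    hard answers are removed from the candidate set (filtered setting).
    """
    reciprocal_ranks = []
    for ans in hard_answers:
        better = sum(
            1
            for entity, score in scores.items()
            if score > scores[ans]
            and entity not in easy_answers
            and entity not in hard_answers
        )
        reciprocal_ranks.append(1.0 / (better + 1))
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Toy example: entity 0 is an easy answer and is filtered out, so the
# hard answer (entity 2) is ranked only against entities 1 and 3.
print(filtered_mrr({0: 0.9, 1: 0.8, 2: 0.7, 3: 0.6}, {0}, {2}))  # → 0.5
```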

The wikikg dataset is meant to be evaluated in the inference-only regime after pre-training solely on simple link prediction; the number of training complex queries is not sufficient for such a large dataset.

Paper pre-print: https://arxiv.org/abs/2210.08008

The full source code of training/inference models is available at https://github.com/DeepGraphLearning/InductiveQE

Files

Ten zip archives (including 106.zip), 5.2 GB in total; individual archives range from 219.8 MB to 741.5 MB.

Additional details

References

  • Galkin, Mikhail, et al. "Inductive Logical Query Answering in Knowledge Graphs." NeurIPS 2022.