There is a newer version of the record available.

Published October 20, 2022 | Version 1.1

InductiveQE Datasets

Authors/Creators

  • 1. Mila, McGill University

Description

InductiveQE datasets

This repository contains 10 inductive complex query answering datasets published in "Inductive Logical Query Answering in Knowledge Graphs" (NeurIPS 2022). Nine datasets (106-550) were created from FB15k-237; the wikikg dataset was created from the OGB WikiKG 2 graph. In every dataset, the inference graph extends the training graph with new nodes and edges. The dataset number indicates the size of the inference graph relative to the training graph: in 175, for example, the inference graph has 175% as many nodes as the training graph. The higher the ratio, the more unseen nodes appear at inference time and the harder the task. The Wikikg split has a fixed 133% ratio.
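As a quick illustration of the naming scheme described above, the ratio can be computed from two node counts. This is a minimal sketch: the function name `inference_ratio` and the node counts are hypothetical and not taken from any of the datasets.

```python
def inference_ratio(num_train_nodes: int, num_inference_nodes: int) -> int:
    """Return the inference-graph node count as a rounded percentage of the training-graph node count."""
    if num_train_nodes <= 0:
        raise ValueError("num_train_nodes must be positive")
    return round(100 * num_inference_nodes / num_train_nodes)


# Hypothetical counts chosen only to illustrate the arithmetic.
example_ratio = inference_ratio(num_train_nodes=1000, num_inference_nodes=1750)
print(example_ratio)  # 175
```

Under this reading, a dataset named 175 has 75 new nodes at inference time for every 100 training nodes.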

Each dataset is a zip archive containing 17 files:

  • train_graph.txt (pt for wikikg) - original training graph
  • val_inference.txt (pt) - inference graph (validation split); the new validation nodes are disjoint from the test inference graph
  • val_predict.txt (pt) - missing edges in the validation inference graph to be predicted
  • test_inference.txt (pt) - inference graph (test split); the new test nodes are disjoint from the validation inference graph
  • test_predict.txt (pt) - missing edges in the test inference graph to be predicted
  • train/valid/test_queries.pkl - queries of the respective split, 14 query types for fb-derived datasets, 9 types for Wikikg (EPFO-only)
  • *_answers_easy.pkl - easy answers to respective queries that do not require predicting missing links but only edge traversal
  • *_answers_hard.pkl - hard answers to respective queries that DO require predicting missing links and against which the final metrics will be computed
  • train_answers_val.pkl - the extended set of answers to training queries on the bigger validation graph; most training queries have at least one new answer. This is intended as an inference-only dataset for measuring the faithfulness of trained models
  • train_answers_test.pkl - the extended set of answers to training queries on the bigger test graph; most training queries have at least one new answer. This is intended as an inference-only dataset for measuring the faithfulness of trained models
  • og_mappings.pkl - contains entity2id / relation2id dictionaries mapping local node/relation IDs from a respective dataset to the original fb15k237 / wikikg2
  • stats.txt - a small file with dataset stats
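The files listed above can be read straight from a downloaded archive without unpacking it. The sketch below is one possible way to do that, under stated assumptions: the helper `read_member` is hypothetical, the .pkl files are assumed to be standard Python pickles, and the internal layout of each file (and any folder prefix inside the zip) is not specified here, so the demo builds a small in-memory archive with placeholder contents instead of assuming a real one. The .pt files used by wikikg are not handled.

```python
import io
import pickle
import zipfile


def read_member(archive: zipfile.ZipFile, name: str):
    """Load one archive member: unpickle .pkl files, return a list of lines for .txt files."""
    with archive.open(name) as handle:
        if name.endswith(".pkl"):
            # Assumption: .pkl members are ordinary pickles. Only unpickle trusted files.
            return pickle.load(handle)
        if name.endswith(".txt"):
            return io.TextIOWrapper(handle, encoding="utf-8").read().splitlines()
    raise ValueError(f"unsupported file type: {name}")


# Build a tiny in-memory archive with placeholder contents (not real dataset data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("stats.txt", "placeholder line\n")
    demo.writestr("og_mappings.pkl", pickle.dumps({"entity2id": {}, "relation2id": {}}))

with zipfile.ZipFile(buffer) as demo:
    print(demo.namelist())  # inspect member names before loading anything
    stats_lines = read_member(demo, "stats.txt")
    mappings = read_member(demo, "og_mappings.pkl")
```

For a real download, open the archive with `zipfile.ZipFile(path_to_zip)` and check `namelist()` first, since member names may carry a folder prefix.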

Overall unzipped size of all datasets combined is about 10 GB. Please refer to the paper for the sizes of graphs and the number of queries per graph.

The Wikikg dataset is intended to be evaluated in the inference-only regime, with models pre-trained solely on simple link prediction, because the number of complex training queries is not sufficient for such a large dataset.

Paper pre-print: https://arxiv.org/abs/2210.08008

The full source code of training/inference models is available at https://github.com/DeepGraphLearning/InductiveQE

UPD 1.1: Added train_answers_val.pkl files to all freebase-derived datasets - answers of training queries on larger validation graphs

Files

Files (4.6 GB)

Checksum                              Size
md5:a2f5b9791ef954e03297abcdf18c918d  565.5 MB
md5:b6d0d74898c572d3c0135782d183bc32  510.3 MB
md5:6da27515ee6596719e7128920b8ed89c  511.8 MB
md5:b78a6594d7736a1d896518571127bcb3  609.5 MB
md5:08a8002d48f9431c10455578e0010dd1  654.1 MB
md5:b8385353b23e19d1edc4303ed89c37b8  576.9 MB
md5:8dd07646d22deaae7a6886396b27e665  441.0 MB
md5:3cca70cde46b6f5259692d2c51728504  350.3 MB
md5:b8b3bfbca38fa4fea541df3239658917  221.8 MB
md5:6fa28e09a349075926c4b0b72a1eb5e6  162.2 MB
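
A downloaded archive can be checked against the MD5 values listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical. The listing does not show which archive each checksum belongs to, so the demo runs on a throwaway temporary file instead of a real archive.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file against a listed value such as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Demo on a temporary file, since no real archive is assumed to be present.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_ok = matches_listed_checksum(demo_path, "md5:" + hashlib.md5(b"hello").hexdigest())
os.remove(demo_path)
print(demo_ok)  # True
```

For a real download, pass the local path of the zip file and the matching `md5:...` string from the record page.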

Additional details

References

  • Galkin, Mikhail et al. Inductive Logical Query Answering in Knowledge Graphs. NeurIPS 2022