Published December 19, 2020 | Version 0.1
Dataset Open

kgbench: amplus

  • 1. Vrije Universiteit Amsterdam

Description

Message passing models for machine learning on knowledge graphs offer a promising direction for interpretable machine learning on relational and multimodal data. Until now, however, progress in this area has been difficult to gauge. This is primarily due to the limited number of datasets with (a) enough labeled nodes in the test set for precise measurement of performance, and (b) a rich enough variety of multimodal information to learn from. Here, we introduce a set of new benchmark tasks for node classification and node regression on knowledge graphs. We focus primarily on node labeling tasks, since this setting cannot be solved purely by node embedding and requires the model to pool information from several steps away in the graph. However, each dataset may also be used for link prediction. For each dataset, we provide test and validation sets of at least 1000 instances, with some containing more than 10,000 instances. Each task can be performed in a purely relational manner, to evaluate the performance of a relational graph model in isolation, or with multimodal information, to evaluate the performance of multimodal relational graph models. All datasets are packaged in a CSV format that is easily consumable in any machine learning environment, together with the original source data in RDF and pre-processing code for full provenance. We provide code for loading the data into numpy and pytorch. We compute performance for several baseline models.
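To illustrate what consuming a CSV-packaged knowledge graph might look like, here is a minimal sketch using only the Python standard library. The column names (`subject`, `predicate`, `object`), the integer indexing, and the in-memory sample are assumptions made for illustration; they are not taken from the archive, whose actual file names and layout should be checked after unpacking. The `neighbors` helper is likewise hypothetical and only shows the kind of one-step pooling a message passing model repeats over several hops.

```python
import csv
import io

# Hypothetical sample data: the column layout (integer-indexed subject,
# predicate, object) is an assumption, not the documented archive format.
SAMPLE_TRIPLES_CSV = """subject,predicate,object
0,0,1
1,1,2
2,0,0
"""


def load_triples(handle):
    """Read (subject, predicate, object) integer triples from a CSV handle."""
    reader = csv.DictReader(handle)
    return [
        (int(row["subject"]), int(row["predicate"]), int(row["object"]))
        for row in reader
    ]


def neighbors(triples, node):
    """Return the nodes one step away from `node`, ignoring edge direction."""
    found = set()
    for subj, _, obj in triples:
        if subj == node:
            found.add(obj)
        if obj == node:
            found.add(subj)
    return found


triples = load_triples(io.StringIO(SAMPLE_TRIPLES_CSV))
one_hop = neighbors(triples, 0)
```

A list of integer tuples like this can be converted to an array with `numpy.asarray(triples)` if array-based processing is preferred; for a real file, pass an open file handle to `load_triples` instead of the `io.StringIO` sample.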

Files

amplus.zip

Files (1.8 GB)

Size      Checksum
1.8 GB    md5:a78696918e7a3c0642ec9ad76fdc9d35
20.1 kB   md5:c84e2c7bb60b13f7878276100f6f83ba
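
The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and any local path passed to them is a placeholder for wherever the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads avoid holding a multi-gigabyte file in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 digest against a value such as 'md5:<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5(local_path, "md5:a78696918e7a3c0642ec9ad76fdc9d35")` should return `True` for an intact copy of the file that this checksum belongs to, where `local_path` is the location of the downloaded file.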