Published December 19, 2020 | Version v1
Dataset Open

kgbench: dblp

  • 1. Vrije Universiteit Amsterdam

Description

Graph neural networks and other machine learning models offer a promising direction for interpretable machine learning on relational and multimodal data. Until now, however, progress in this area has been difficult to gauge. This is primarily due to a limited number of datasets with (a) a high enough number of labeled nodes in the test set for precise measurement of performance, and (b) a rich enough variety of multimodal information to learn from. Here, we introduce a set of new benchmark tasks for node classification on knowledge graphs. We focus primarily on node classification, since this setting cannot be solved purely by node embedding models, instead requiring the model to pool information from several steps away in the graph. However, the datasets may also be used for link prediction. For each dataset, we provide test and validation sets of at least 1,000 instances, with some containing more than 10,000 instances. Each task can be performed in a purely relational manner, to evaluate the performance of a relational graph model in isolation, or with multimodal information, to evaluate the performance of multimodal relational graph models. All datasets are packaged in a CSV format that is easily consumable in any machine learning environment, together with the original source data in RDF and pre-processing code for full provenance. We provide code for loading the data into numpy and pytorch. We compute performance for several baseline models.
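As a rough illustration of how CSV-packaged triples can be consumed, the sketch below parses integer (subject, predicate, object) rows and collects the nodes within k steps of a start node, which is the kind of multi-step neighbourhood a node classifier pools over. It is not the loading code shipped with the dataset. The three-column integer layout, the absence of a header row, and the names read_triples, k_hop and SAMPLE are assumptions made for this example; check the files inside dblp.zip for the actual column layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a triples CSV file (assumed layout:
# one "subject,predicate,object" row of integer indices per line, no header).
SAMPLE = "0,0,1\n1,1,2\n2,0,3\n"


def read_triples(handle):
    """Parse integer (subject, predicate, object) triples from a CSV handle."""
    triples = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        s, p, o = (int(value) for value in row[:3])
        triples.append((s, p, o))
    return triples


def k_hop(triples, start, k):
    """Return the set of nodes reachable from `start` in at most k steps,
    treating every edge as undirected and ignoring the relation type."""
    adjacency = defaultdict(set)
    for s, _, o in triples:
        adjacency[s].add(o)
        adjacency[o].add(s)
    seen = {start}
    frontier = {start}
    for _ in range(k):
        # Expand one step and keep only nodes not visited before.
        frontier = {n for node in frontier for n in adjacency[node]} - seen
        seen |= frontier
    return seen


triples = read_triples(io.StringIO(SAMPLE))
two_hop = k_hop(triples, start=0, k=2)
```

For a file on disk, the same function can be given an opened handle, for example open(path, newline=""), and the resulting list can be passed to numpy.array to obtain an integer matrix with one triple per row.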

Files

dblp.zip

Files (332.8 MB)

Name: dblp.zip
Size: 332.8 MB
Checksum: md5:513232f6aec6d44104a2fb21c25445de