Published September 18, 2020
| Version 1.0
Dataset
Open
WD50K
- 1. TU Dresden; Fraunhofer IAIS
- 2. Fraunhofer IAIS
- 3. Fraunhofer IAIS; University of Bonn
Description
WD50K dataset: An hyper-relational dataset derived from Wikidata statements. The dataset is constructed by the following procedure based on the [Wikidata RDF dump](https://dumps.wikimedia.org/wikidatawiki/20190801/) of August 2019: - A set of seed nodes corresponding to entities from FB15K-237 having a direct mapping in Wikidata (P646 "Freebase ID") is extracted from the dump. - For each seed node, all statements whose main object and qualifier values corresponding to wikibase:Item are extracted from the dump. - All literals are filtered out from the qualifiers of the above obtained statements. - All the entities from the dataset which have less than two mentions are dropped. The statements corresponding to the dropped entities are also dropped. - The remaining statements are randomly split into the train, test, and validation sets. - All statements from train and validation sets are removed which share the same main triple (s,p,o) with test statements. - WD50k_33, WD50k_66, WD50k_100 are then sampled from the above statements. Here 33, 66, 100 represents the amount of hyper-relational facts (statements with qualifiers) in the dataset. The table below provides some basic statistics of our dataset and its three further variations: | Dataset | Statements | w/Quals (%) | Entities | Relations | E only in Quals | R only in Quals | Train | Valid | Test | |-------------|------------|----------------|----------|-----------|-----------------|-----------------|---------|--------|--------| | WD50K | 236,507 | 32,167 (13.6%) | 47,156 | 532 | 5460 | 45 | 166,435 | 23,913 | 46,159 | | WD50K (33) | 102,107 | 31,866 (31.2%) | 38,124 | 475 | 6463 | 47 | 73,406 | 10,668 | 18,133 | | WD50K (66) | 49,167 | 31,696 (64.5%) | 27,347 | 494 | 7167 | 53 | 35,968 | 5,154 | 8,045 | | WD50K (100) | 31,314 | 31,314 (100%) | 18,792 | 279 | 7862 | 75 | 22,738 | 3,279 | 5,297 |
When using the dataset please cite:
@inproceedings{StarE,
title={Message Passing for Hyper-Relational Knowledge Graphs},
author={Galkin, Mikhail and Trivedi, Priyansh and Maheshwari, Gaurav and Usbeck, Ricardo and Lehmann, Jens},
booktitle={EMNLP},
year={2020}
}
For any further questions, please contact: mikhail.galkin@iais.fraunhofer.de
Notes
Files
WD50K.zip
Files
(7.1 MB)
Name | Size | Download all |
---|---|---|
md5:5e46e4630a4b425924efc0402d09696e
|
7.1 MB | Preview Download |