Gollum: A Gold Standard for Large Scale Multi Source Knowledge Graph Matching
Description
The set of Knowledge Graphs (KGs) generated with automatic and manual approaches is constantly growing.
For an integrated view and usage, these KGs need to be aligned at both the schema and the instance level.
There are already approaches which try to tackle this multi source knowledge graph matching problem,
but large gold standards are missing to evaluate their effectiveness and scalability.
In particular, most existing gold standards are fairly small and can be solved by matchers that align exactly two KGs (1:1), which account for the majority of existing matching systems.
We close this gap by presenting Gollum -- a gold standard for large-scale multi source knowledge graph matching with over 275,000 correspondences between 4,149 different KGs.
They originate from knowledge graphs derived by applying the DBpedia extraction framework to a large wiki farm.
Three variations of the gold standard are made available:
(1) a version with all correspondences for evaluating unsupervised matching approaches, and two versions for evaluating supervised matching: (2) one where each KG is contained both in the train and test set, and (3) one where each KG is exclusively contained in the train or the test set.
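The difference between variations (2) and (3) can be illustrated with a minimal sketch. It is not the procedure used to build Gollum, which this description does not spell out. The `Correspondence` tuple, the `kg_disjoint_split` function, the `test_fraction` parameter, and the choice to drop correspondences that cross the train/test boundary are all assumptions made for illustration.

```python
import random
from typing import NamedTuple


class Correspondence(NamedTuple):
    # Hypothetical record layout; the real gold standard files may differ.
    source_kg: str
    source_entity: str
    target_kg: str
    target_entity: str


def kgs_of(correspondences):
    """Collect every KG that appears on either side of a correspondence."""
    return {kg for c in correspondences for kg in (c.source_kg, c.target_kg)}


def kg_disjoint_split(correspondences, test_fraction=0.2, seed=0):
    """Variation (3) style split: every KG ends up in train or in test, never both."""
    kgs = sorted(kgs_of(correspondences))
    random.Random(seed).shuffle(kgs)
    n_test = max(1, round(len(kgs) * test_fraction))
    test_kgs = set(kgs[:n_test])

    train, test = [], []
    for c in correspondences:
        sides_in_test = (c.source_kg in test_kgs, c.target_kg in test_kgs)
        if all(sides_in_test):
            test.append(c)
        elif not any(sides_in_test):
            train.append(c)
        # Assumption: correspondences linking a train KG to a test KG are dropped.
    return train, test
```

A variation (2) style split would instead divide the correspondences themselves, so the same KG can contribute examples to both sets.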
We plan to extend our KG track at the Ontology Alignment Evaluation Initiative (OAEI) to allow for matching systems
which are specifically designed to solve the multi KG matching problem.
As a first step in this direction, we evaluate multi source matching approaches which reuse two-KG (1:1) matchers from the past OAEI.
Due to the size of the KG files, they are hosted at the institute:
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/40K.tar (50.3 GB)
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/all.tar (74.7 GB)
http://data.dws.informatik.uni-mannheim.de/dbkwik/gollum/gold.tar (25.3 GB)
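Given the size of these archives, it may help to inspect a downloaded tar file before unpacking it. The following is a minimal sketch using Python's standard `tarfile` module; the function name `list_members` and the `limit` parameter are illustrative, and nothing is assumed here about the internal layout of the Gollum archives.

```python
import tarfile


def list_members(tar_path, limit=10):
    """Return the names of the first `limit` archive members without extracting anything."""
    names = []
    with tarfile.open(tar_path, mode="r:*") as archive:
        # Iterating the archive reads member headers one at a time.
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

For example, `list_members("40K.tar", limit=5)` would return the first five member names of a locally downloaded copy.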
Files (382.3 MB)
Checksum | Size |
---|---|
md5:b5e3ff6cf796ec91e3d3b9947dac2a4c | 1.1 MB |
md5:775e38536eea62c0af6a64ca71944a7a | 10.4 MB |
md5:b171acaaa22f0fa068c5d6c41f1bd56b | 124.1 kB |
md5:60ba789347cea187e4cb2225001c4854 | 15.0 MB |
md5:91636c51984e214e0bc65a5def770765 | 77.8 MB |
md5:654fa9ce54f600c870a11116ea908bfc | 17.6 MB |
md5:a80c8f462d032b2fdc9d480550820ae0 | 75.2 MB |
md5:a9489882b796df0bbc35a61d2831c5ea | 48.5 kB |
md5:21aebfc1403656c7b5c1ea66373517b2 | 39.7 kB |
md5:0f080c3d343ce2bf9a73bb40ef436589 | 16.8 kB |
md5:f5fbbaad17f10ece13ecf225acdf8143 | 43.5 MB |
md5:53224850517c0793d083cb32a4d0d8f1 | 92.1 MB |
md5:ed2c89212d6ac67b686055156f5f6682 | 183.2 kB |
md5:83106845bd90733a6a9aeb44ab695082 | 682.4 kB |
md5:54f7b71fb4c2802b99018becfde90de4 | 48.6 MB |
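A downloaded file can be checked against its listed MD5 checksum. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of` and `matches_listing` are illustrative, and the file path passed in is whatever local name the download was saved under.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' (the prefix is optional)."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listing(local_path, "md5:b5e3ff6cf796ec91e3d3b9947dac2a4c")` returns `True` only if the local copy is byte-identical to the 1.1 MB file listed first.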