MetaLink - Closure and Error Degree of 556M owl:sameAs statements
MetaLink is a dataset that contains metadata for a very large set of owl:sameAs links crawled from the LOD Cloud. MetaLink encodes a previously published error metric for each of these links [Raad et al., 2018]. This error degree ranges from 0.0 (most likely correct) to 1.0 (most likely incorrect). The idea is that the more isolated an owl:sameAs link is in the network of all owl:sameAs links, the higher its error degree. Experiments show that discarding the 1M owl:sameAs links with an error degree > 0.99 can significantly increase the quality of the transitive closure. In addition, when only the 400M owl:sameAs links with an error degree <= 0.4 are kept, the resulting closure is 100% precise in several manually evaluated cases. The equivalence classes resulting from these different closures are publicly available online.
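To illustrate how an error-degree threshold changes the transitive closure, here is a minimal sketch in Python. It is not the procedure used to build MetaLink: the function name `closure`, the `(subject, object, error_degree)` tuple layout, and the `ex:` sample links are all assumptions made for this example. It groups IRIs into equivalence classes with a simple union-find structure, using only the links whose error degree is at or below the chosen threshold.

```python
from collections import defaultdict

# Hypothetical sample links: (subject IRI, object IRI, error degree).
SAMPLE_LINKS = [
    ("ex:a", "ex:b", 0.10),
    ("ex:b", "ex:c", 0.35),
    ("ex:c", "ex:d", 0.995),  # isolated link with a high error degree
    ("ex:d", "ex:e", 0.20),
]


def closure(links, max_error):
    """Return the equivalence classes induced by the links whose
    error degree is <= max_error. IRIs that occur only in discarded
    links do not appear in the result."""
    parent = {}

    def find(iri):
        # Register unseen IRIs as their own root, then walk to the root.
        parent.setdefault(iri, iri)
        while parent[iri] != iri:
            parent[iri] = parent[parent[iri]]  # path halving
            iri = parent[iri]
        return iri

    for subj, obj, error in links:
        if error <= max_error:
            root_s, root_o = find(subj), find(obj)
            if root_s != root_o:
                parent[root_s] = root_o  # merge the two classes

    classes = defaultdict(set)
    for iri in list(parent):
        classes[find(iri)].add(iri)
    # Sort members and classes so the output is deterministic.
    return sorted((sorted(members) for members in classes.values()),
                  key=lambda members: members[0])


if __name__ == "__main__":
    print(closure(SAMPLE_LINKS, 1.0))  # all links kept
    print(closure(SAMPLE_LINKS, 0.4))  # only low-error links kept
```

With all links kept, the five sample IRIs collapse into a single class. With the threshold at 0.4, the high-error link between `ex:c` and `ex:d` is dropped and two separate classes remain.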
MetaLink is published in combination with LOD-a-lot, a dataset that is based on a very large crawl of a subset of the LOD Cloud. By combining MetaLink and LOD-a-lot, applications are able to make informed decisions about whether or not to follow specific links on the LOD Cloud. This dataset contains 4,352,602,452 unique triples, and is available in HDT (Header Dictionary Triples) format. It can be navigated online using the TriplyDB Linked Data hosting platform: https://krr.triply.cc/krr/metalink.
A figure describing the vocabulary of the MetaLink dataset can be found here. Classes are displayed as circles and properties as arcs. The MetaLink-specific classes and properties are displayed in red; the classes and properties shown in blue are reused from existing vocabularies.