Dataset Open Access

Equivalence Classes of the LOD Cloud

Raad, Joe; Beek, Wouter; Van Harmelen, Frank; Wielemaker, Jan; Pernelle, Nathalie; Saïs, Fatiha


Citation Style Language JSON Export

{
  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.3345674", 
  "title": "Equivalence Classes of the LOD Cloud", 
  "issued": {
    "date-parts": [
      [
        2019, 
        1, 
        28
      ]
    ]
  }, 
  "abstract": "<p>This data set contains all the 49 million&nbsp;non-singleton equivalence classes resulting from the transitive closure of over 556 million owl:sameAs statements extracted from the LOD Cloud in the&nbsp;2015 LOD Laundromat crawl. These equivalence classes are the result of the transitive closure&nbsp;of the owl:sameAs links available in the <a href=\"http://sameas.cc\">sameAs.cc</a> data set.</p>\n\n<p>We represent these non-singleton equivalence classes using two CSV files:</p>\n\n<p>1. id2terms.csv: contains in the first column the&nbsp;equivalence class identifier (a randomly generated number) and in the remaining columns all IRIs belonging to this equivalence class, which theoretically should refer to the same real-world entity. The following example shows one row of this file, where &quot;42467584&quot; in the first column is the ID of the equivalence class, and the 4 other columns are the IRIs that are identical after transitive closure:&nbsp;</p>\n\n<blockquote>\n<p>42467584 &lt;http://nl.dbpedia.org/resource/Cnodocentron_trilineatum&gt; &lt;http://sv.dbpedia.org/resource/Cnodocentron_trilineatum&gt; &lt;http://vi.dbpedia.org/resource/Cnodocentron_trilineatum&gt; &lt;http://www.wikidata.org/entity/Q2304468&gt;</p>\n</blockquote>\n\n<p>2.&nbsp;terms2id.csv: contains two columns, mapping each IRI in the sameAs.cc data set that is involved in an owl:sameAs link to the equivalence class it belongs to. The following example shows one&nbsp;row of this file:</p>\n\n<blockquote>\n<p>&lt;http://nl.dbpedia.org/resource/Cnodocentron_trilineatum&gt; 42467584</p>\n</blockquote>\n\n<p>In addition to the closure of all owl:sameAs links (available in the folder <em><strong>closure_all.zip</strong></em>), this data set contains two additional closures, each also&nbsp;containing two CSV files with the same structure as presented above. These two additional closures are the following:</p>\n\n<p><strong>-&nbsp; <em>closure_099.zip</em>&nbsp;</strong>represents the closure of all&nbsp;owl:sameAs links in the sameAs.cc data set after discarding around 1 million&nbsp;probably erroneous owl:sameAs links (with error degree &gt;0.99). This error degree is computed based on the community structure of the network, as described&nbsp;in the&nbsp;approach of&nbsp;<a href=\"https://www.cs.vu.nl/~frankh/postscript/ISWC2018.pdf\">[Raad et al., 2018]</a>.</p>\n\n<p><strong>-&nbsp; </strong><em><strong>closure_04.zip</strong>&nbsp;</em>represents the closure of all&nbsp;owl:sameAs links in the sameAs.cc data set after discarding around 150 million owl:sameAs links (with error degree &gt;0.4). The evaluation conducted in&nbsp;<a href=\"https://www.cs.vu.nl/~frankh/postscript/ISWC2018.pdf\">[Raad et al., 2018]</a>&nbsp;shows that the 400M owl:sameAs links with an error degree &lt;= 0.4 have a higher probability of correctness than the other links.</p>\n\n<p>The availability of these 3 different closures allows&nbsp;Linked Data practitioners, for the first time, to control in practice the trade-off between (a) using more identity links, possibly not all correct, and benefiting from more contextual information from the LOD Cloud, and (b) using a smaller subset of higher-quality&nbsp;identity links to limit the risk of propagating erroneous identity links and information through the application of owl:sameAs semantics, i.e. transitivity, symmetry, reflexivity and property sharing.</p>", 
  "author": [
    {
      "family": "Raad, Joe"
    }, 
    {
      "family": "Beek, Wouter"
    }, 
    {
      "family": "Van Harmelen, Frank"
    }, 
    {
      "family": "Wielemaker, Jan"
    }, 
    {
      "family": "Pernelle, Nathalie"
    }, 
    {
      "family": "Sa\u00efs, Fatiha"
    }
  ], 
  "type": "dataset", 
  "id": "3345674"
}
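
The abstract above describes the row layout of id2terms.csv and terms2id.csv. The following is a minimal Python sketch of how one id2terms row could be parsed and inverted into a terms2id-style mapping. It assumes that fields are separated by whitespace, as in the example row shown in the abstract; the actual delimiter in the distributed files may differ. The function names parse_id2terms_row and build_term_index are hypothetical and are not part of the data set.

```python
# Sketch only: assumes whitespace-separated fields, as displayed in the
# abstract's example row. The real files may use another delimiter.

EXAMPLE_ROW = (
    "42467584 "
    "<http://nl.dbpedia.org/resource/Cnodocentron_trilineatum> "
    "<http://sv.dbpedia.org/resource/Cnodocentron_trilineatum> "
    "<http://vi.dbpedia.org/resource/Cnodocentron_trilineatum> "
    "<http://www.wikidata.org/entity/Q2304468>"
)


def strip_brackets(term):
    """Remove the surrounding angle brackets from an IRI, if present."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def parse_id2terms_row(line):
    """Split one id2terms row into (class_id, list_of_iris).

    Hypothetical helper: the first field is the equivalence class ID,
    every remaining field is an IRI belonging to that class.
    """
    fields = line.split()
    class_id = fields[0]
    iris = [strip_brackets(f) for f in fields[1:]]
    return class_id, iris


def build_term_index(lines):
    """Invert id2terms rows into an IRI -> class ID dictionary,
    mirroring the two-column layout described for terms2id.csv."""
    index = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        class_id, iris = parse_id2terms_row(line)
        for iri in iris:
            index[iri] = class_id
    return index


class_id, iris = parse_id2terms_row(EXAMPLE_ROW)
term_index = build_term_index([EXAMPLE_ROW])
```

For a full closure, an in-memory dictionary of this kind may be too large, since the data set covers 49 million classes; streaming the file line by line or loading it into a key-value store would be a more realistic choice.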