Published May 7, 2023 | Version v1
Dataset Open

VOYAGE: A Large Collection of Vocabulary Usage in Open RDF Datasets

  • 1. Nanjing University
  • 2. University of Edinburgh

Description

List of files:

  • odps.json: for each of the accessed open data portals (ODPs), its name, URL, API type, API URL, and the IDs of RDF datasets collected from it
    • JSON structure: a list of objects, where each object contains the following attributes - 'name' (string), 'URL' (string), 'API type' (string), 'API URL' (string), and 'collected datasets IDs' (list of integers)

  • datasets.json: for each of the crawled RDF datasets, its ID, title, description, author, license, dump file URLs, and pay-level domains (PLDs)
    • JSON structure: a list of objects, where each object contains the following attributes - 'ID' (integer), 'title' (string), 'description' (string), 'author' (string), 'license' (string), 'dump file URLs' (list of strings), and 'PLDs' (list of strings)

  • deduplicated_datasets.json: the IDs of the deduplicated RDF datasets and whether they are in the LOD Cloud
    • JSON structure: a list of objects, where each object contains the following attributes - 'ID' (integer) and 'in LOD Cloud' (boolean)

  • terms.json: the extracted classes, properties, and the IDs of RDF datasets using each term
    • JSON structure: a list of objects, where each object contains the following attributes - 'term' (string), 'is class' (boolean), 'is property' (boolean), and 'used in dataset IDs' (list of integers)

  • vocabularies.json: the extracted vocabularies, the classes and properties in each vocabulary, and the IDs of RDF datasets using each vocabulary
    • JSON structure: a list of objects, where each object contains the following attributes - 'vocabulary' (string), 'classes' (list of strings), 'properties' (list of strings), and 'used in dataset IDs' (list of integers)

  • edps.json: the extracted distinct entity description patterns (EDPs) and the IDs of RDF datasets using each EDP
    • JSON structure: a list of objects, where each object contains the following attributes - 'classes' (list of strings), 'forward properties' (list of strings), 'backward properties' (list of strings), and 'used in dataset IDs' (list of integers)

  • clusters.json: the clusters of vocabularies generated by MV-ITCC and LDA
    • JSON structure: {"LDA": {"vocabularies": {VOCABULARY_CLUSTER_ID_1: [LIST_OF_VOCABULARIES], VOCABULARY_CLUSTER_ID_2: [LIST_OF_VOCABULARIES], ...}}, "MV-ITCC": {"vocabularies": {VOCABULARY_CLUSTER_ID_1: [LIST_OF_VOCABULARIES], VOCABULARY_CLUSTER_ID_2: [LIST_OF_VOCABULARIES], ...}, "dataset IDs": {DATASET_CLUSTER_ID_1: [LIST_OF_DATASET_IDS], DATASET_CLUSTER_ID_2: [LIST_OF_DATASET_IDS], ...}}}
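The structures listed above can be combined through the shared dataset IDs. The following is a minimal Python sketch, not part of the dataset itself: the helper names (`load_json`, `lod_cloud_ids`, `term_usage`, `vocabulary_to_cluster`) and the sample records are illustrative assumptions, and only the attribute names are taken from the descriptions above. Note that JSON object keys are always strings, so cluster IDs in clusters.json load as strings.

```python
import json
from collections import Counter


def load_json(path):
    """Read one of the JSON files fully into memory (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def lod_cloud_ids(deduplicated):
    """IDs of deduplicated datasets flagged as being in the LOD Cloud."""
    return {d["ID"] for d in deduplicated if d["in LOD Cloud"]}


def term_usage(terms, keep_ids):
    """Count, per term, how many of the datasets in keep_ids use it."""
    counts = Counter()
    for t in terms:
        n = len(keep_ids.intersection(t["used in dataset IDs"]))
        if n:
            counts[t["term"]] = n
    return counts


def vocabulary_to_cluster(clusters, method):
    """Invert clusters[method]['vocabularies'] into vocabulary -> cluster ID."""
    return {
        vocab: cluster_id
        for cluster_id, vocabs in clusters[method]["vocabularies"].items()
        for vocab in vocabs
    }


# Tiny made-up records that only mimic the documented structures.
sample_dedup = [
    {"ID": 1, "in LOD Cloud": True},
    {"ID": 2, "in LOD Cloud": False},
    {"ID": 3, "in LOD Cloud": True},
]
sample_terms = [
    {"term": "http://xmlns.com/foaf/0.1/Person", "is class": True,
     "is property": False, "used in dataset IDs": [1, 2, 3]},
    {"term": "http://purl.org/dc/terms/title", "is class": False,
     "is property": True, "used in dataset IDs": [2]},
]
sample_clusters = {
    "LDA": {"vocabularies": {"0": ["http://xmlns.com/foaf/0.1/"],
                             "1": ["http://purl.org/dc/terms/"]}},
    "MV-ITCC": {"vocabularies": {"0": ["http://xmlns.com/foaf/0.1/",
                                        "http://purl.org/dc/terms/"]},
                "dataset IDs": {"0": [1, 2, 3]}},
}

ids = lod_cloud_ids(sample_dedup)
usage = term_usage(sample_terms, ids)
vocab_cluster = vocabulary_to_cluster(sample_clusters, "LDA")
print(sorted(ids))
print(usage.most_common())
print(vocab_cluster)
```

With the real files, the sample records would be replaced by calls such as `load_json("deduplicated_datasets.json")`. Some of the files are large, so loading them whole may need a lot of memory; a streaming JSON parser may be a better fit in that case.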

Files (4.1 GB)

  • md5:b4182e60ac956e760f6821382eacbd5a (42.2 kB)
  • md5:23d648e273d65709743900cb7daef6cb (42.6 MB)
  • md5:b1574619a57da24c9574f6fbaea56325 (2.6 MB)
  • md5:81ef7df63bd7f9d23a418a356eee685c (3.1 GB)
  • md5:24737105ab6a8c9bc23a60484d452a2b (628.5 kB)
  • md5:d951de0f77c58ef7911dd222eb4b7172 (507.7 MB)
  • md5:67bb0f1ff79a218525c7cfd4f1d89bc6 (446.1 MB)
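
A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses Python's standard `hashlib`; the function names `md5_of_file` and `matches_listed_md5` are illustrative, and the listing above does not show which checksum belongs to which file, so the pairing in the usage note is only a placeholder.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("clusters.json", "md5:<checksum shown for that file>")` returns `True` only if the local copy is byte-identical to the published one.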