VOYAGE: A Large Collection of Vocabulary Usage in Open RDF Datasets

Shi, Qing; Wang, Junrui; Pan, Jeff Z.; Cheng, Gong

doi:10.5281/zenodo.7902675

Published May 7, 2023 | Version v1

Resource type: Dataset | Access: Open

1. Nanjing University
2. University of Edinburgh

List of files:

odps.json: for each accessed open data portal (ODP), its name, URL, API type, API URL, and the IDs of the RDF datasets collected from it
- JSON structure: a list of objects, where each object contains the following attributes - 'name' (string), 'URL' (string), 'API type' (string), 'API URL' (string), and 'collected datasets IDs' (list of integers)
datasets.json: for each crawled RDF dataset, its ID, title, description, author, license, dump file URLs, and pay-level domains (PLDs)
- JSON structure: a list of objects, where each object contains the following attributes - 'ID' (integer), 'title' (string), 'description' (string), 'author' (string), 'license' (string), 'dump file URLs' (list of strings), and 'PLDs' (list of strings)
deduplicated_datasets.json: the IDs of the deduplicated RDF datasets and whether each one is in the LOD Cloud
- JSON structure: a list of objects, where each object contains the following attributes - 'ID' (integer) and 'in LOD Cloud' (boolean)
terms.json: the extracted terms (classes and properties) and the IDs of the RDF datasets using each term
- JSON structure: a list of objects, where each object contains the following attributes - 'term' (string), 'is class' (boolean), 'is property' (boolean), and 'used in dataset IDs' (list of integers)
vocabularies.json: the extracted vocabularies, the classes and properties in each vocabulary, and the IDs of RDF datasets using each vocabulary
- JSON structure: a list of objects, where each object contains the following attributes - 'vocabulary' (string), 'classes' (list of strings), 'properties' (list of strings), and 'used in dataset IDs' (list of integers)
edps.json: the distinct entity description patterns (EDPs) extracted from the datasets and the IDs of the RDF datasets using each EDP
- JSON structure: a list of objects, where each object contains the following attributes - 'classes' (list of strings), 'forward properties' (list of strings), 'backward properties' (list of strings), and 'used in dataset IDs' (list of integers)
clusters.json: the clusters of vocabularies generated by MV-ITCC and by LDA (Latent Dirichlet Allocation), plus the clusters of dataset IDs generated by MV-ITCC
- JSON structure: {"LDA": {"vocabularies": {VOCABULARY_CLUSTER_ID_1: [LIST_OF_VOCABULARIES], VOCABULARY_CLUSTER_ID_2: [LIST_OF_VOCABULARIES], ...}}, "MV-ITCC": {"vocabularies": {VOCABULARY_CLUSTER_ID_1: [LIST_OF_VOCABULARIES], VOCABULARY_CLUSTER_ID_2: [LIST_OF_VOCABULARIES], ...}, "dataset IDs": {DATASET_CLUSTER_ID_1: [LIST_OF_DATASET_IDS], DATASET_CLUSTER_ID_2: [LIST_OF_DATASET_IDS], ...}}}
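The files above are linked through integer dataset IDs. Below is a minimal sketch of one way to combine them using only Python's standard library. It assumes the files sit in one directory and use exactly the attribute names listed above. The helper names (load_json, dedup_ids, vocabulary_usage, vocabulary_to_cluster) are hypothetical and are not part of VOYAGE. Note that json.load reads a whole file into memory, which may be impractical for the larger files such as edps.json; a streaming JSON parser would be safer there.

```python
import json


def load_json(path):
    """Read one VOYAGE JSON file (e.g. vocabularies.json) into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dedup_ids(dedup_records, lod_only=False):
    """Collect dataset IDs from deduplicated_datasets.json records.

    With lod_only=True, keep only datasets whose 'in LOD Cloud' flag is true.
    """
    return {r["ID"] for r in dedup_records if r["in LOD Cloud"] or not lod_only}


def vocabulary_usage(vocab_records, keep_ids):
    """Map each vocabulary (from vocabularies.json) to the number of
    datasets in keep_ids that use it."""
    return {
        v["vocabulary"]: len(set(v["used in dataset IDs"]) & keep_ids)
        for v in vocab_records
    }


def vocabulary_to_cluster(clusters, method="MV-ITCC"):
    """Invert clusters.json into vocabulary -> cluster ID for 'MV-ITCC' or 'LDA'.

    Cluster IDs are JSON object keys, so they are returned as strings.
    """
    mapping = {}
    for cluster_id, vocabularies in clusters[method]["vocabularies"].items():
        for vocabulary in vocabularies:
            mapping[vocabulary] = cluster_id
    return mapping
```

For example, passing the output of dedup_ids(load_json("deduplicated_datasets.json"), lod_only=True) to vocabulary_usage(load_json("vocabularies.json"), ...) would count vocabulary usage among deduplicated datasets in the LOD Cloud only.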

Files (4.1 GB)

Name	MD5 checksum	Size
clusters.json	b4182e60ac956e760f6821382eacbd5a	42.2 kB
datasets.json	23d648e273d65709743900cb7daef6cb	42.6 MB
deduplicated_datasets.json	b1574619a57da24c9574f6fbaea56325	2.6 MB
edps.json	81ef7df63bd7f9d23a418a356eee685c	3.1 GB
odps.json	24737105ab6a8c9bc23a60484d452a2b	628.5 kB
terms.json	d951de0f77c58ef7911dd222eb4b7172	507.7 MB
vocabularies.json	67bb0f1ff79a218525c7cfd4f1d89bc6	446.1 MB
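Each file in this record is published with an MD5 checksum. Below is a minimal sketch for checking downloaded copies with Python's standard hashlib module. It assumes the files were saved under their original names in a single directory. The names md5_of, verify, and EXPECTED_MD5 are hypothetical helpers, and the checksum values are copied from this record.

```python
import hashlib
import os

# MD5 checksums as listed for each file in this record.
EXPECTED_MD5 = {
    "clusters.json": "b4182e60ac956e760f6821382eacbd5a",
    "datasets.json": "23d648e273d65709743900cb7daef6cb",
    "deduplicated_datasets.json": "b1574619a57da24c9574f6fbaea56325",
    "edps.json": "81ef7df63bd7f9d23a418a356eee685c",
    "odps.json": "24737105ab6a8c9bc23a60484d452a2b",
    "terms.json": "d951de0f77c58ef7911dd222eb4b7172",
    "vocabularies.json": "67bb0f1ff79a218525c7cfd4f1d89bc6",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks (1 MiB by default) so that multi-GB files
    such as edps.json are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return {file name: True/False}; a missing file counts as a failure."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5_of(path) == expected
    return results
```

Calling verify("path/to/download/dir") returns one boolean per file. Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters for edps.json at 3.1 GB.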
