Published October 14, 2021 | Version v2
Dataset | Open Access

OC-782K: Knowledge Graph of "Scientometrics" modelled according to the OpenCitations Data Model

  • FIZ-Karlsruhe

Description

This dataset is a knowledge graph (KG) extracted from a triplestore that covers the journal Scientometrics and is modelled according to the OpenCitations Data Model. The original triplestore is available here. The KG was extracted for a research project on knowledge graph embeddings (KGEs) for author name disambiguation (AND).

The structural triples of the KG are split into training, validation and test sets for applying representation learning methods. Textual and numeric literals are stored separately so that multimodal KGE approaches can be implemented (see arXiv:1802.00934). For the same reason, the textual literals are already encoded as sentence embeddings in textual_literals.npy, and the numeric literals as a numeric matrix in numeric_literals.npy. The file and_eval.json contains the evaluation dataset used to evaluate our AND architecture. The script used to build this dataset is available in the GitHub repository: https://github.com/sntcristian/and-kge/tree/main/open-citations.
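The precomputed literal files are standard NumPy arrays, so they can be read directly with numpy.load. The sketch below is a minimal loader, assuming (this record does not confirm it) that each file holds a single 2-D array whose rows correspond to entities of the graph; the function name load_literals is hypothetical and not part of the dataset or its repository.

```python
from pathlib import Path

import numpy as np


def load_literals(directory):
    """Load textual_literals.npy and numeric_literals.npy from a directory.

    Assumption (not stated in the dataset record): each file stores one
    2-D array, with one row per entity of the knowledge graph.
    """
    directory = Path(directory)
    textual = np.load(directory / "textual_literals.npy")  # sentence embeddings
    numeric = np.load(directory / "numeric_literals.npy")  # numeric literal matrix
    # Fail early if the arrays do not have the assumed 2-D layout.
    for name, array in (("textual", textual), ("numeric", numeric)):
        if array.ndim != 2:
            raise ValueError(
                f"{name} literals: expected a 2-D array, got shape {array.shape}"
            )
    return textual, numeric
```

After unzipping OC-782K.zip, calling load_literals on the extracted folder returns both matrices; inspecting their shapes is a quick way to check how rows line up with the entity list before feeding them into a multimodal KGE model.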

Files

OC-782K.zip

Files (314.5 MB)

Size       MD5 checksum
230.7 MB   0eadc668c9584c9a1031aefdc07b041c
14.4 MB    9569339946d814c37659289912aa8701
55.6 MB    3bffa2611641ab032dda4540d40b29b1
13.8 MB    bb75b9668301c5b50d22fee09f35db47
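The MD5 checksums listed for the files can be used to confirm that a download is complete and uncorrupted. Because this listing does not show which file name belongs to which checksum, the sketch below only checks whether a downloaded file's digest is one of the four listed values; the helper names md5sum and is_listed are illustrative, not part of the dataset.

```python
import hashlib
import sys

# Checksums copied from the file listing of this record.
KNOWN_CHECKSUMS = {
    "0eadc668c9584c9a1031aefdc07b041c",
    "9569339946d814c37659289912aa8701",
    "3bffa2611641ab032dda4540d40b29b1",
    "bb75b9668301c5b50d22fee09f35db47",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_listed(path):
    """True if the file's MD5 digest matches one of the listed checksums."""
    return md5sum(path) in KNOWN_CHECKSUMS


if __name__ == "__main__":
    # Usage: python check_md5.py OC-782K.zip [more files ...]
    for path in sys.argv[1:]:
        status = "OK" if is_listed(path) else "NOT LISTED"
        print(f"{path}: {md5sum(path)} {status}")
```

Reading in fixed-size chunks keeps memory use constant, which matters for the 230.7 MB file.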