Published January 14, 2022 | Version v1
Dataset Open

Dataset - Clustering Semantic Predicates in the Open Research Knowledge Graph

Description

This dataset was created to implement a content-based recommender system for the Open Research Knowledge Graph (ORKG). The recommender system accepts a research paper's title and abstract as input and recommends existing ORKG predicates that are semantically relevant to that paper.

 

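The paper instances in the dataset are grouped by ORKG comparison. The data.json file keeps this grouping together with the full paper and predicate metadata, so it is more comprehensive than training_set.json and test_set.json, which contain only identifiers and text.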

 

data.json

The main JSON object consists of a list of comparisons. Each comparison object has an ID, a label, a list of papers and a list of predicates. Each paper object has an ID, a label (the title), a DOI, a research field, a list of research problems and an abstract. Each predicate object has an ID and a label. See an example instance below.

{
    "comparisons": [
        {
            "id": "R108331",
            "label": "Analysis of approaches based on required elements in way of modeling",
            "papers": [
                {
                    "id": "R108312",
                    "label": "Rapid knowledge work visualization for organizations",
                    "doi": "10.1108/13673270710762747",
                    "research_field": {
                        "id": "R134",
                        "label": "Computer and Systems Architecture"
                    },
                    "research_problems": [
                        {
                            "id": "R108294",
                            "label": "Enterprise engineering"
                        }
                    ],
                    "abstract": "Purpose \u2013 The purpose of this contribution is to motivate a new, rapid approach to modeling knowledge work in organizational settings and to introduce a software tool that demonstrates the viability of the envisioned concept.Design/methodology/approach \u2013 Based on existing modeling structures, the KnowFlow toolset that aids knowledge analysts in rapidly conducting interviews and in conducting multi\u2010perspective analysis of organizational knowledge work is introduced.Findings \u2013 This article demonstrates how rapid knowledge work visualization can be conducted largely without human modelers by developing an interview structure that allows for self\u2010service interviews. Two application scenarios illustrate the pressing need for and the potentials of rapid knowledge work visualizations in organizational settings.Research limitations/implications \u2013 The efforts necessary for traditional modeling approaches in the area of knowledge management are often prohibitive. This contribution argues that future research needs ..."
                },
          ....
          ],
          "predicates": [
                {
                    "id": "P37126",
                    "label": "activities, behaviours, means [for knowledge development and/or for knowledge conveyance and transformation"
                },
                {
                    "id": "P36081",
                    "label": "approach name"
                },
          ....
          ]
      },
  ....
  ]
}
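As a minimal sketch of how this structure could be read, the following Python flattens the nested object into one row per (comparison, paper) pair. The function names and the row layout are illustrative assumptions and are not shipped with the dataset; only the JSON keys come from the example above.

```python
import json
from typing import Iterator


def load_comparisons(path: str) -> list[dict]:
    """Read data.json and return the list stored under "comparisons"."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["comparisons"]


def iter_paper_rows(comparisons: list[dict]) -> Iterator[dict]:
    """Yield one flat row per (comparison, paper) pair.

    The row keys are a hypothetical layout chosen for this sketch.
    """
    for comparison in comparisons:
        predicate_ids = [p["id"] for p in comparison.get("predicates", [])]
        for paper in comparison.get("papers", []):
            field = paper.get("research_field") or {}
            problems = paper.get("research_problems") or []
            yield {
                "comparison_id": comparison["id"],
                "paper_id": paper["id"],
                "title": paper.get("label", ""),
                "doi": paper.get("doi"),
                "research_field": field.get("label"),
                "research_problems": [rp["label"] for rp in problems],
                "abstract": paper.get("abstract", ""),
                "predicate_ids": predicate_ids,
            }
```

A typical call would be `rows = list(iter_paper_rows(load_comparisons("data.json")))`, assuming the file sits in the working directory.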

 

training_set.json and test_set.json

The main JSON object consists of a list of training/test instances. Each instance has an instance_id of the form comparison_id x paper_id (for example, R108331xR108301), the comparison_id and paper_id as separate fields, and a text. The text is a concatenation of the paper's label (title) and abstract. See an example instance below.

Note that test instances are not duplicated and do not occur in the training set. Training instances are not duplicated either, but the same paper can appear in several training instances, each paired with a different comparison.

{
    "instances": [
        {
            "instance_id": "R108331xR108301",
            "comparison_id": "R108331",
            "paper_id": "R108301",
            "text": "A notation for Knowledge-Intensive Processes Business process modeling has become essential for managing organizational knowledge artifacts. However, this is not an easy task, especially when it comes to the so-called Knowledge-Intensive Processes (KIPs). A KIP comprises activities based on acquisition, sharing, storage, and (re)use of knowledge, as well as collaboration among participants, so that the amount of value added to the organization depends on process agents' knowledge. The previously developed Knowledge Intensive Process Ontology (KIPO) structures all the concepts (and relationships among them) to make a KIP explicit. Nevertheless, KIPO does not include a graphical notation, which is crucial for KIP stakeholders to reach a common understanding about it. This paper proposes the Knowledge Intensive Process Notation (KIPN), a notation for building knowledge-intensive processes graphical models."
        },
     ...
     ]
}
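The instance_id format and the duplication note above can be checked with a short Python sketch. It assumes the two IDs are joined by a lowercase "x", as in the example instance, and that ORKG IDs themselves never contain that character; the helper names are hypothetical.

```python
from collections import defaultdict


def split_instance_id(instance_id: str) -> tuple[str, str]:
    """Split "R108331xR108301" into ("R108331", "R108301")."""
    comparison_id, separator, paper_id = instance_id.partition("x")
    if not (comparison_id and separator and paper_id):
        raise ValueError(f"unexpected instance_id: {instance_id!r}")
    return comparison_id, paper_id


def papers_in_several_comparisons(instances: list[dict]) -> dict[str, set[str]]:
    """Map each paper that occurs with more than one comparison to those comparisons."""
    seen: dict[str, set[str]] = defaultdict(set)
    for instance in instances:
        comparison_id, paper_id = split_instance_id(instance["instance_id"])
        seen[paper_id].add(comparison_id)
    return {paper: comps for paper, comps in seen.items() if len(comps) > 1}


def shared_instance_ids(train: list[dict], test: list[dict]) -> set[str]:
    """Return instance IDs present in both splits (expected to be empty)."""
    return {i["instance_id"] for i in train} & {i["instance_id"] for i in test}
```

Both lists are the values stored under "instances" in training_set.json and test_set.json. A non-empty result from `shared_instance_ids` would contradict the note above and would be worth investigating before training.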

 

Dataset Statistics:

-                 Papers  Predicates  Research Fields  Research Problems
Min/Comparison         2           2                1                  0
Max/Comparison       202         112                5                 23
Avg./Comparison    21.54       12.79             1.20               1.09
Total               4060        1816               46                178
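Figures of this kind could be recomputed from data.json along the following lines. This is a sketch under assumptions: it counts distinct IDs per comparison, and it treats "Total" as the number of distinct IDs across all comparisons. The description does not state the counting conventions actually used, so the output may differ from the table.

```python
from statistics import mean

COLUMNS = ("papers", "predicates", "research_fields", "research_problems")


def ids_per_comparison(comparison: dict) -> dict[str, set[str]]:
    """Collect the distinct IDs of each kind found in one comparison."""
    papers = comparison.get("papers", [])
    return {
        "papers": {p["id"] for p in papers},
        "predicates": {p["id"] for p in comparison.get("predicates", [])},
        "research_fields": {
            p["research_field"]["id"] for p in papers if p.get("research_field")
        },
        "research_problems": {
            rp["id"] for p in papers for rp in (p.get("research_problems") or [])
        },
    }


def summarize(comparisons: list[dict]) -> dict[str, dict[str, float]]:
    """Return min, max and average per comparison plus the distinct total."""
    if not comparisons:
        raise ValueError("no comparisons to summarize")
    per_comparison = [ids_per_comparison(c) for c in comparisons]
    summary = {}
    for column in COLUMNS:
        sizes = [len(row[column]) for row in per_comparison]
        distinct = set().union(*(row[column] for row in per_comparison))
        summary[column] = {
            "min": min(sizes),
            "max": max(sizes),
            "avg": round(mean(sizes), 2),
            "total": len(distinct),
        }
    return summary
```

The input is the list stored under "comparisons" in data.json.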

Dataset Splits:

-             Papers  Comparisons
Training Set    2857          214
Test Set        1203          180
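A split file could be summarized with the sketch below. The table does not say whether "Papers" counts instances or distinct papers, so the hypothetical `split_summary` helper reports both; it relies only on the comparison_id and paper_id fields shown in the example instance.

```python
import json


def load_instances(path: str) -> list[dict]:
    """Read training_set.json or test_set.json and return its "instances" list."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)["instances"]


def split_summary(instances: list[dict]) -> dict[str, int]:
    """Count instances, distinct papers and distinct comparisons in one split."""
    return {
        "instances": len(instances),
        "unique_papers": len({i["paper_id"] for i in instances}),
        "unique_comparisons": len({i["comparison_id"] for i in instances}),
    }
```

For example, `split_summary(load_instances("training_set.json"))` gives the three counts for the training split, assuming the file is in the working directory.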

 

Files (15.7 MB)

Checksum                              Size
md5:5d6130fcec50dfb7ced63da5666a4eac  9.2 MB
md5:5e53429fab7f0837acfd5c25f42ea499  1.8 MB
md5:ba8f9ca8afe39f47f3df6fc2f95d9a65  4.7 MB
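A downloaded file can be checked against its MD5 checksum with Python's standard hashlib module. This is a minimal sketch; the helper names are hypothetical, and the listing here does not show which checksum belongs to which file, so that pairing should be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as "md5:<hex>" or plain hex."""
    return md5_of_file(path) == expected.removeprefix("md5:").lower()
```

Usage would look like `matches_checksum("path/to/downloaded_file.json", "md5:...")`, with the path and the checksum filled in for the file in question. `str.removeprefix` requires Python 3.9 or later.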