Dataset - Clustering Semantic Predicates in the Open Research Knowledge Graph

Arab Oghli, Omar

doi:10.5281/zenodo.6513499

Published January 14, 2022 | Version v1

Dataset Open

This dataset was created to implement a content-based recommender system for the Open Research Knowledge Graph (ORKG). The recommender system accepts a research paper's title and abstract as input and recommends existing ORKG predicates that are semantically relevant to the given paper.

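The paper instances in the dataset are grouped by ORKG comparison. The data.json file holds the full metadata of every comparison and is therefore more comprehensive than training_set.json and test_set.json, which keep only the identifiers and the text of each instance.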

data.json

The main JSON object consists of a list of comparisons. Each comparison object has an ID, a label, a list of papers and a list of predicates. Each paper object has an ID, a label (the title), a DOI, a research field, a list of research problems and an abstract. Each predicate object has an ID and a label. See an example instance below.

{
    "comparisons": [
        {
            "id": "R108331",
            "label": "Analysis of approaches based on required elements in way of modeling",
            "papers": [
                {
                    "id": "R108312",
                    "label": "Rapid knowledge work visualization for organizations",
                    "doi": "10.1108/13673270710762747",
                    "research_field": {
                        "id": "R134",
                        "label": "Computer and Systems Architecture"
                    },
                    "research_problems": [
                        {
                            "id": "R108294",
                            "label": "Enterprise engineering"
                        }
                    ],
                    "abstract": "Purpose \u2013 The purpose of this contribution is to motivate a new, rapid approach to modeling knowledge work in organizational settings and to introduce a software tool that demonstrates the viability of the envisioned concept.Design/methodology/approach \u2013 Based on existing modeling structures, the KnowFlow toolset that aids knowledge analysts in rapidly conducting interviews and in conducting multi\u2010perspective analysis of organizational knowledge work is introduced.Findings \u2013 This article demonstrates how rapid knowledge work visualization can be conducted largely without human modelers by developing an interview structure that allows for self\u2010service interviews. Two application scenarios illustrate the pressing need for and the potentials of rapid knowledge work visualizations in organizational settings.Research limitations/implications \u2013 The efforts necessary for traditional modeling approaches in the area of knowledge management are often prohibitive. This contribution argues that future research needs ..."
                },
                ....
            ],
            "predicates": [
                {
                    "id": "P37126",
                    "label": "activities, behaviours, means [for knowledge development and/or for knowledge conveyance and transformation"
                },
                {
                    "id": "P36081",
                    "label": "approach name"
                },
                ....
            ]
        },
        ....
    ]
}
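
As a minimal sketch of how this structure can be traversed, the Python snippet below pairs every paper with the predicates of its comparison. The helper name `papers_with_predicates` and the inline `sample` are illustrative assumptions, not part of the dataset; the sample only reuses identifiers and labels from the example above and omits the fields it does not need.

```python
import json

# Tiny inline stand-in for data.json, trimmed to the fields used below.
sample = json.loads("""
{
  "comparisons": [
    {
      "id": "R108331",
      "label": "Analysis of approaches based on required elements in way of modeling",
      "papers": [
        {"id": "R108312", "label": "Rapid knowledge work visualization for organizations"}
      ],
      "predicates": [
        {"id": "P36081", "label": "approach name"}
      ]
    }
  ]
}
""")


def papers_with_predicates(data):
    """Yield (comparison_id, paper_id, predicate_ids) for every paper.

    Hypothetical helper: every paper is paired with all predicates
    of the comparison it belongs to.
    """
    for comparison in data["comparisons"]:
        predicate_ids = [p["id"] for p in comparison["predicates"]]
        for paper in comparison["papers"]:
            yield comparison["id"], paper["id"], predicate_ids


rows = list(papers_with_predicates(sample))
for comparison_id, paper_id, predicate_ids in rows:
    print(comparison_id, paper_id, predicate_ids)
```

To run this against the published file, `sample` could be replaced with the result of `json.load` on an opened data.json, assuming the file follows the layout described here.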

training_set.json and test_set.json

The main JSON object consists of a list of training or test instances. Each instance has an instance_id of the form comparison_id x paper_id (for example R108331xR108301), the comparison_id and paper_id themselves, and a text. The text is a concatenation of the paper's label (title) and abstract. See an example instance below.

Note that test instances are not duplicated and do not occur in the training set. Training instances are not duplicated either. However, the same paper can appear in several training instances, each time paired with a different comparison.

{
    "instances": [
        {
            "instance_id": "R108331xR108301",
            "comparison_id": "R108331",
            "paper_id": "R108301",
            "text": "A notation for Knowledge-Intensive Processes Business process modeling has become essential for managing organizational knowledge artifacts. However, this is not an easy task, especially when it comes to the so-called Knowledge-Intensive Processes (KIPs). A KIP comprises activities based on acquisition, sharing, storage, and (re)use of knowledge, as well as collaboration among participants, so that the amount of value added to the organization depends on process agents' knowledge. The previously developed Knowledge Intensive Process Ontology (KIPO) structures all the concepts (and relationships among them) to make a KIP explicit. Nevertheless, KIPO does not include a graphical notation, which is crucial for KIP stakeholders to reach a common understanding about it. This paper proposes the Knowledge Intensive Process Notation (KIPN), a notation for building knowledge-intensive processes graphical models."
        },
        ...
    ]
}
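
The instance layout above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the script used to build the dataset: `make_instance` is a hypothetical helper, and joining title and abstract with a single space is inferred from the example text, not documented.

```python
def make_instance(comparison_id, paper):
    """Build one training/test instance from a data.json paper object.

    Assumption: title and abstract are joined with a single space,
    as the example instance suggests.
    """
    abstract = paper.get("abstract") or ""
    text = f"{paper['label']} {abstract}".strip()
    return {
        "instance_id": f"{comparison_id}x{paper['id']}",
        "comparison_id": comparison_id,
        "paper_id": paper["id"],
        "text": text,
    }


# Shortened copy of the example paper; the abstract is cut after one sentence.
paper = {
    "id": "R108301",
    "label": "A notation for Knowledge-Intensive Processes",
    "abstract": "Business process modeling has become essential for "
                "managing organizational knowledge artifacts.",
}

instance = make_instance("R108331", paper)
print(instance["instance_id"])
print(instance["text"])
```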

Dataset Statistics:

-	Papers	Predicates	Research Fields	Research Problems
Min/Comparison	2	2	1	0
Max/Comparison	202	112	5	23
Avg./Comparison	21.54	12.79	1.20	1.09
Total	4060	1816	46	178
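
Per-comparison figures of this kind can be derived from the data.json layout with a few lines of Python. The sketch below uses made-up toy data and a hypothetical helper, `per_comparison_stats`; whether the published numbers were computed in exactly this way is an assumption.

```python
from statistics import mean

# Toy stand-in for data.json with made-up identifiers.
toy = {
    "comparisons": [
        {"id": "C1",
         "papers": [{"id": "A"}, {"id": "B"}],
         "predicates": [{"id": "P1"}, {"id": "P2"}]},
        {"id": "C2",
         "papers": [{"id": "A"}, {"id": "C"}, {"id": "D"}],
         "predicates": [{"id": "P1"}, {"id": "P2"}, {"id": "P3"}, {"id": "P4"}]},
    ]
}


def per_comparison_stats(data, key):
    """Return min, max and average list length of `key` over all comparisons."""
    counts = [len(comparison[key]) for comparison in data["comparisons"]]
    return {"min": min(counts), "max": max(counts), "avg": round(mean(counts), 2)}


paper_stats = per_comparison_stats(toy, "papers")
predicate_stats = per_comparison_stats(toy, "predicates")
print("papers:", paper_stats)
print("predicates:", predicate_stats)
```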

Dataset Splits:

-	Papers	Comparisons
Training Set	2857	214
Test Set	1203	180
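
The properties stated for the two splits (no duplicated instances, no test instance in the training set) can be checked with a short script. The following is a sketch on toy data; `check_split` is a hypothetical helper, and it compares instances by instance_id, which is one possible reading of "do not occur in the training set".

```python
def check_split(train, test):
    """Summarise uniqueness and overlap of two instance files."""
    train_ids = [inst["instance_id"] for inst in train["instances"]]
    test_ids = [inst["instance_id"] for inst in test["instances"]]
    return {
        # No instance_id appears twice within a split.
        "train_unique": len(train_ids) == len(set(train_ids)),
        "test_unique": len(test_ids) == len(set(test_ids)),
        # No instance_id is shared between the splits.
        "disjoint": set(train_ids).isdisjoint(test_ids),
        # Number of distinct comparisons covered by each split.
        "train_comparisons": len({inst["comparison_id"] for inst in train["instances"]}),
        "test_comparisons": len({inst["comparison_id"] for inst in test["instances"]}),
    }


# Toy data: paper A occurs twice in training, under two comparisons.
toy_train = {"instances": [
    {"instance_id": "R1xA", "comparison_id": "R1", "paper_id": "A", "text": "..."},
    {"instance_id": "R2xA", "comparison_id": "R2", "paper_id": "A", "text": "..."},
]}
toy_test = {"instances": [
    {"instance_id": "R1xB", "comparison_id": "R1", "paper_id": "B", "text": "..."},
]}

report = check_split(toy_train, toy_test)
print(report)
```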

Files (15.7 MB)

Name	Size	MD5
data.json	9.2 MB	5d6130fcec50dfb7ced63da5666a4eac
test_set.json	1.8 MB	5e53429fab7f0837acfd5c25f42ea499
training_set.json	4.7 MB	ba8f9ca8afe39f47f3df6fc2f95d9a65
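
After downloading, the files can be checked against the MD5 checksums listed above. The snippet below is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify`, and the assumption that the files sit in one local directory under their original names, are illustrative.

```python
import hashlib
from pathlib import Path

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "data.json": "5d6130fcec50dfb7ced63da5666a4eac",
    "test_set.json": "5e53429fab7f0837acfd5c25f42ea499",
    "training_set.json": "ba8f9ca8afe39f47f3df6fc2f95d9a65",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each file name to True if it exists and its checksum matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(name, "OK" if ok else "missing or checksum mismatch")
```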
