Published December 11, 2019 | Version 1.0
Dataset Open

SoftwareKG

  • 1. University of Rostock
  • 2. GESIS - Leibniz Institute for the Social Sciences

Description

SoftwareKG is a knowledge graph that contains software mentions from 51,165 PLoS articles tagged with the keyword "Social Science". The software mentions were extracted using an automated pipeline, which identified more than 133,000 software mentions. The mentions were then linked using their potential abbreviations and DBpedia. The identified software mentions are structured in SoftwareKG together with metadata about the articles. The data is represented in an RDF/S model using established W3C standards and vocabularies.

More information about SoftwareKG is provided at https://data.gesis.org/softwarekg/site/.

This dataset contains:

  • N-Triples file for the final SoftwareKG: software_kg.zip
  • Reference to the source code necessary to reproduce the results: softwareKG
  • SoSciSoCi corpus used for training and evaluation of the NER model: SoSciSoCi
  • SoSciSoCi-SSC silver standard corpus used for pre-training of the NER model: SoSciSoCi-SSC
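Since the knowledge graph is distributed as N-Triples, a first look at its contents needs no RDF library. The following is a minimal sketch, not part of the dataset's own tooling, that counts how often each predicate occurs. The function name count_predicates and the sample triples (example.org IRIs) are made up for illustration and do not reflect the actual SoftwareKG vocabulary.

```python
from collections import Counter
from typing import Iterable


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count how often each predicate IRI occurs in N-Triples lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and comments, both of which N-Triples permits.
        if not line or line.startswith("#"):
            continue
        # Subject and predicate never contain whitespace in N-Triples,
        # so the first two whitespace-separated tokens can be split off.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # malformed line; ignored in this sketch
        counts[parts[1]] += 1
    return counts


# Made-up sample triples; the example.org IRIs are placeholders,
# not actual SoftwareKG data.
sample = [
    '<http://example.org/article/1> <http://example.org/mentions> <http://example.org/software/R> .',
    '<http://example.org/article/2> <http://example.org/mentions> <http://example.org/software/SPSS> .',
    '<http://example.org/software/R> <http://www.w3.org/2000/01/rdf-schema#label> "R" .',
]
print(count_predicates(sample).most_common())
```

To apply this to the real data, pass an open file handle for the extracted N-Triples file (its name inside the archive is not stated here) instead of the sample list; the file is then read line by line rather than loaded into memory at once.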

The work is described and used in the following publication:

David Schindler, Benjamin Zapilko, and Frank Krüger: Investigating Software Usage in the Social Sciences: A Knowledge Graph Approach. In Proceedings of the 17th Extended Semantic Web Conference, Heraklion, Crete, Greece, May 31 - June 4, 2020.

Please cite this publication when using the corpus.

The code and all data are also available on GitHub at: https://github.com/f-krueger/ESWC-SoftwareKG/releases/tag/v1.0

Files

ESWC-SoftwareKG.zip (457.0 MB)
md5:e9bc52dd584fb9dba8c975924000e9eb
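The MD5 checksum listed for the archive can be used to confirm that a download is complete. The following is a minimal sketch using Python's standard hashlib module. The local path ESWC-SoftwareKG.zip is an assumption about where the file was saved, and the helper names md5_of_file and verify are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum as given in the file listing for ESWC-SoftwareKG.zip.
EXPECTED_MD5 = "e9bc52dd584fb9dba8c975924000e9eb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so the 457 MB archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("ESWC-SoftwareKG.zip")  # assumed download location
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; adjust the path to your download")
```

MD5 is suitable here only as an integrity check against transfer errors, not as protection against deliberate tampering.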