There is a newer version of the record available.

Published January 8, 2024 | Version v1
Dataset Open

SOMD - SOftware Mention Detection

Description

The dataset contains the training and test data for the SOftware Mention Detection challenge. The data is derived from the SoMeSci Knowledge Graph of software mentions.

  • Subtask 1 deals with recognizing software mentions while simultaneously classifying the mention type (e.g. Usage, Creation, ...) and the software type (e.g. Application, PlugIn, ...).
  • Subtask 2 requires recognizing additional metadata of software mentions (e.g. Version, Developer, URL, ...).
  • Subtask 3 deals with extracting the relations between the different entities of interest (e.g. Version_of, License_of, ...).
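
To illustrate what joint recognition and classification in Subtask 1 can look like, the following is a minimal sketch that decodes token-level BIO tags into software mentions. It assumes a combined label of the form `B-<SoftwareType>_<MentionType>` (for example `B-Application_Usage`); that label layout, the function name `decode_bio`, and the example sentence are illustrative assumptions and are not taken from this record, so check the files in the archive for the actual format.

```python
def decode_bio(tokens, tags):
    """Collect (text, software_type, mention_type) spans from BIO tags.

    Assumes labels look like "B-Application_Usage" (hypothetical layout).
    """
    spans = []
    current = None  # (list of tokens, label) of the span being built
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new mention starts; close any open one first.
            if current:
                spans.append(current)
            current = ([token], tag[2:])
        elif tag.startswith("I-") and current and current[1] == tag[2:]:
            # Continuation of the open mention with the same label.
            current[0].append(token)
        else:
            # "O" tag or an inconsistent "I-" tag ends the open mention.
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)

    result = []
    for words, label in spans:
        # Split the joint label into software type and mention type.
        software_type, _, mention_type = label.partition("_")
        result.append((" ".join(words), software_type, mention_type))
    return result


# Invented example sentence, not drawn from the dataset.
tokens = ["We", "used", "SPSS", "Statistics", "for", "the", "analysis", "."]
tags = ["O", "O", "B-Application_Usage", "I-Application_Usage",
        "O", "O", "O", "O"]
print(decode_bio(tokens, tags))
```

Under the same assumption, the metadata entities of Subtask 2 (Version, Developer, URL, ...) could be decoded the same way, and a Subtask 3 relation such as Version_of would then link two of the decoded spans.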

A detailed description of the dataset, including its creation and a baseline for each subtask, can be found in the following article:

D. Schindler, F. Bensmann, S. Dietze, and F. Krüger, “SoMeSci—A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles,” in Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), Virtual Event, QLD, Australia: Association for Computing Machinery, Nov. 2021. doi: 10.1145/3459637.3482017.

Files

subtask1.zip

Files (2.7 MB)

Checksum Size
md5:d02a147fc9fcce98edb7775ed4cd0700 2.5 MB
md5:b1a310f489aab605052d5edf8f0d671e 169.6 kB
md5:07fcad7143bda315c10e9271075ff720 85.9 kB
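
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` and the local file path are hypothetical, and which checksum belongs to which file has to be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:d02a...'."""
    # Drop the optional "md5:" prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_checksum("path/to/downloaded_file.zip", "md5:d02a147fc9fcce98edb7775ed4cd0700")` returns `True` only if the local file (a placeholder path here) has exactly that digest.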

Additional details

Related works

Documents
Conference paper: 10.1145/3459637.3482017 (DOI)
Is variant form of
Dataset: 10.5281/zenodo.4701763 (DOI)