Published February 27, 2024 | Version 1.0.0
Dataset Open

NRS EN/SV: Automatically detected non-recorded word senses in English and Swedish

  • 1. University of Stuttgart
  • 2. University of Gothenburg
  • 3. iguanodon.ai
  • 4. University of Geneva

Description

This data collection contains English and Swedish use-sense instances annotated with binary labels. Each instance pairs a use of a target word with a sense (gloss), and annotators were asked to judge whether that sense describes the meaning of the target word in that use well. We provide the following files:

  • data/: uses, senses, instances and judgments for randomly sampled uses (phase 1) and for uses whose sense was predicted to be missing from the respective dictionary (phase 2). Instances for phase 2 are not included, but can be reconstructed by pairing each use with every sense of that use's lemma. We further provide assigned and unassigned uses aggregated over the three annotators, as described in the paper below. The tutorial used for training annotators is available in the annotation_standardization repository.
  • guidelines/: the guidelines used for annotator training.
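
The reconstruction of phase 2 instances described above amounts to a per-lemma cross product of uses and senses. The following is a minimal sketch of that idea, not the authors' code. The field names `lemma`, `use_id` and `sense_id`, as well as the toy records, are assumptions made for illustration; the actual column names in data/ may differ and should be checked against the files.

```python
from collections import defaultdict


def reconstruct_instances(uses, senses):
    """Pair every use with every sense that shares its lemma.

    `uses` and `senses` are lists of dicts. The keys used here
    ("lemma", "use_id", "sense_id") are hypothetical placeholders.
    """
    # Group sense identifiers by lemma for quick lookup.
    senses_by_lemma = defaultdict(list)
    for sense in senses:
        senses_by_lemma[sense["lemma"]].append(sense["sense_id"])

    # One instance per (use, sense) combination within the same lemma.
    instances = []
    for use in uses:
        for sense_id in senses_by_lemma.get(use["lemma"], []):
            instances.append(
                {
                    "lemma": use["lemma"],
                    "use_id": use["use_id"],
                    "sense_id": sense_id,
                }
            )
    return instances


# Toy records, invented for illustration only.
uses = [
    {"lemma": "bank", "use_id": "u1"},
    {"lemma": "bank", "use_id": "u2"},
    {"lemma": "mouse", "use_id": "u3"},
]
senses = [
    {"lemma": "bank", "sense_id": "bank_1"},
    {"lemma": "bank", "sense_id": "bank_2"},
    {"lemma": "mouse", "sense_id": "mouse_1"},
]

instances = reconstruct_instances(uses, senses)
```

With these toy records, each of the two "bank" uses is paired with both "bank" senses and the single "mouse" use with its one sense, giving five instances in total.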

For more information, including limitations of the data, please see the paper referenced below.

Version: 1.0.0, 27.02.2024.

Reference

Jonathan Lautenschlager, Emma Sköldberg, Simon Hengchen, Dominik Schlechtweg. 2024. Detection of non-recorded word senses in English and Swedish.

Files

nrs_en_sv.zip (821.9 kB)
md5:ce0ca6b4798b772ec53f6f33bc775c81
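
The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below shows one way to do this with Python's standard library; the helper name `md5_of_file` and the assumption that the archive sits in the current working directory are illustrative choices, not part of the record.

```python
import hashlib

# Checksum as listed on this record for nrs_en_sv.zip.
EXPECTED_MD5 = "ce0ca6b4798b772ec53f6f33bc775c81"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

After downloading, calling `is_intact("nrs_en_sv.zip")` should return `True` if the file matches the listed checksum.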

Additional details

Funding

Stiftelsen Riksbankens Jubileumsfond
Change is Key! M21-0021