Published October 26, 2018 | Version v.1.0
Dataset | Open Access

FrameNet Semantic Frame Disambiguation with CrowdTruth

  • 1. Vrije Universiteit Amsterdam
  • 2. Google

Description

This repository contains a ground truth corpus for semantic frame disambiguation, acquired with crowdsourcing and processed with CrowdTruth metrics that capture ambiguity in annotations by measuring inter-annotator disagreement.
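To make disagreement-based scoring concrete, here is a simplified sketch in Python. It is loosely modeled on the CrowdTruth unit-annotation score, worker-unit agreement, and unit quality score, but it is not the CrowdTruth package's implementation. The real metrics weight workers, units, and frames by quality scores that are computed iteratively, whereas this sketch fixes every weight at 1. The function names and the binary vector layout (one position per candidate frame) are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence

# One binary vector per worker for a single unit (sentence-word pair):
# position k is 1 if the worker selected candidate frame k.
WorkerVectors = Dict[str, List[int]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, taken as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def unit_vector(votes: WorkerVectors) -> List[int]:
    """Sum of all worker vectors: how many workers picked each frame."""
    return [sum(column) for column in zip(*votes.values())]


def unit_annotation_scores(votes: WorkerVectors) -> List[float]:
    """Unweighted unit-annotation score: fraction of workers picking each frame."""
    return [count / len(votes) for count in unit_vector(votes)]


def worker_unit_agreement(votes: WorkerVectors) -> Dict[str, float]:
    """Cosine between each worker's vector and the summed vectors of the other workers."""
    totals = unit_vector(votes)
    return {
        worker: cosine(vec, [t - v for t, v in zip(totals, vec)])
        for worker, vec in votes.items()
    }


def unit_clarity(votes: WorkerVectors) -> float:
    """Mean pairwise cosine between worker vectors; 1.0 means full agreement."""
    if len(votes) < 2:
        raise ValueError("need at least two workers")
    pairs = list(combinations(votes.values(), 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Three workers choosing among three candidate frames for one sentence-word pair.
votes = {"w1": [1, 0, 0], "w2": [1, 0, 0], "w3": [0, 1, 0]}
print(unit_annotation_scores(votes))  # roughly [0.667, 0.333, 0.0]
print(unit_clarity(votes))            # roughly 0.333
```

A low clarity score, as in the example, indicates that workers split between frames. In the CrowdTruth approach such a split is read as a possible signal of ambiguity in the sentence or the frame set, not only as worker error.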

The dataset contains annotations for 433 sentence-word pairs from the FrameNet 1.7 corpus; each pair was annotated for frame disambiguation by 15 workers. The crowdsourced data was collected on Amazon Mechanical Turk.
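If every sentence-word pair received exactly 15 judgments, the raw data should hold 433 × 15 = 6,495 worker judgments. The sketch below checks per-pair coverage. It assumes, hypothetically, a CSV with one row per judgment and columns named `unit_id` and `worker_id`; the actual column names in the raw files under data/input/ may differ.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Mapping

# Figures stated in the dataset description.
EXPECTED_UNITS = 433
WORKERS_PER_UNIT = 15


def judgments_per_unit(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count distinct workers per unit; repeated (unit, worker) rows count once."""
    seen = {(row["unit_id"], row["worker_id"]) for row in rows}
    return Counter(unit for unit, _ in seen)


def units_with_unexpected_counts(
    counts: Counter, expected: int = WORKERS_PER_UNIT
) -> Dict[str, int]:
    """Return the units whose number of distinct workers differs from `expected`."""
    return {unit: n for unit, n in counts.items() if n != expected}


# Total judgments if every pair has exactly WORKERS_PER_UNIT workers.
print(EXPECTED_UNITS * WORKERS_PER_UNIT)  # 6495

# Usage on a tiny in-memory CSV (hypothetical `unit_id` / `worker_id` columns).
sample = io.StringIO("unit_id,worker_id\nu1,w1\nu1,w2\nu2,w1\n")
counts = judgments_per_unit(csv.DictReader(sample))
print(units_with_unexpected_counts(counts, expected=2))  # {'u2': 1}
```

On the real data, `len(counts)` can also be compared against `EXPECTED_UNITS` to confirm that all 433 pairs are present.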

The corpus has been referenced in the paper listed under References at the end of this page.

To replicate the data processing from the paper, run the Jupyter notebook "CrowdTruth metrics.ipynb". It requires the CrowdTruth metrics Python package, version 2.0 or later.

The data aggregated with CrowdTruth metrics is available in the folder data/output/.

The raw crowdsourcing data is available in the folder data/input/.

If you find this data useful in your research, please consider citing:

@inproceedings{dumitrache2018frames,
  Author = {Anca Dumitrache and Lora Aroyo and Chris Welty},
  Title = {Capturing Ambiguity in Crowdsourcing Frame Disambiguation},
  Booktitle = {The Sixth AAAI Conference on Human Computation and Crowdsourcing},
  Year = {2018}
}

Files

CrowdTruth/FrameDisambiguation-v.1.0.zip (3.2 MB, md5:469e846309e70004ef985444e5a99b02)
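After downloading, the archive can be checked against the MD5 checksum listed for it. A minimal sketch using Python's standard `hashlib`; the local filename `FrameDisambiguation-v.1.0.zip` is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum listed for CrowdTruth/FrameDisambiguation-v.1.0.zip on this page.
EXPECTED_MD5 = "469e846309e70004ef985444e5a99b02"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local filename; adjust to wherever the archive was saved.
    archive = Path("FrameDisambiguation-v.1.0.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size, although at 3.2 MB the archive would also fit in memory comfortably.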

References

  • Anca Dumitrache, Lora Aroyo and Chris Welty: Capturing and Interpreting Ambiguity in Crowdsourcing Frame Disambiguation. HCOMP 2018. arXiv:1805.00270