Published August 3, 2020 | Version v2
Dataset Open

Reddit Entity Linking

  • 1. University of Notre Dame

Description

An entity linking dataset built from the social media website Reddit. The dataset contains 619 posts and 1,243 corresponding comments that were selected and given to human annotators. Each grouping of text was annotated by three different human annotators. The resulting mentions and entities are included, along with a breakdown of the inter-annotator agreement for the mention-entity pairs.

The mention-entity pairs collected are broken into groups based on the level of inter-annotator agreement. 

Gold annotations - all three annotators agree

Silver annotations - two of the three annotators agree

Bronze annotations - an annotation made by only one annotator, with no match from the other two
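
The grouping above can be sketched as a vote count over mention-entity pairs. This is a minimal illustration, not the authors' procedure: it assumes that "agreement" means two annotators produced an identical (mention, entity) pair, which this description does not specify (span offsets may also matter). The function name, the tier lookup table, and the example pairs are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from number of agreeing annotators to tier name.
TIER_BY_VOTES = {3: "gold", 2: "silver", 1: "bronze"}


def tier_annotations(annotator_pairs):
    """Assign a tier to every (mention, entity) pair.

    annotator_pairs: a list of three collections, one per annotator,
    each holding (mention, entity) tuples for the same grouping of text.
    Assumption: two annotations agree only if the tuples are identical.
    """
    votes = Counter()
    for pairs in annotator_pairs:
        # set() ensures each annotator contributes at most one vote per pair.
        votes.update(set(pairs))
    return {pair: TIER_BY_VOTES[count] for pair, count in votes.items()}


# Invented example data, not taken from the dataset.
example = [
    {("Notre Dame", "University_of_Notre_Dame"), ("Reddit", "Reddit")},
    {("Notre Dame", "University_of_Notre_Dame"), ("Reddit", "Reddit")},
    {("Notre Dame", "University_of_Notre_Dame"), ("ND", "North_Dakota")},
]
tiers = tier_annotations(example)
```

In this invented example, the pair marked by all three annotators is gold, the pair marked by two is silver, and the pair marked by one is bronze.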

In total, the dataset contains 1,342 gold annotations, 2,723 silver annotations, and 7,038 bronze annotations.
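
As a quick check on these figures, the short sketch below sums the three tiers and computes each tier's share of all annotations. The counts are the ones stated above; the variable names are illustrative only.

```python
# Annotation counts as stated in the dataset description.
counts = {"gold": 1342, "silver": 2723, "bronze": 7038}

total = sum(counts.values())  # 11103 annotations across all tiers

# Fraction of all annotations that falls in each tier.
shares = {tier: n / total for tier, n in counts.items()}

for tier, share in shares.items():
    print(f"{tier}: {counts[tier]} annotations ({share:.1%} of {total})")
```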

A readme file is provided that describes the structure of the files and the information within each one.

Files

reddit_el.zip (343.0 kB)
md5:eea345cc7574a5c9c376748d1871d557
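
The listed MD5 checksum can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names and the assumption that the file was saved locally as `reddit_el.zip` are illustrative.

```python
import hashlib

# Checksum listed for reddit_el.zip on this record.
EXPECTED_MD5 = "eea345cc7574a5c9c376748d1871d557"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected
```

For example, `verify_download("reddit_el.zip")` would return `True` for an uncorrupted download, assuming the archive sits in the current directory under that name.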