Dataset Open Access

Reddit Entity Linking

Nicholas Botzer; Yifan Ding; Tim Weninger

An entity linking dataset created from the social media website Reddit. The dataset contains 619 posts and 1,243 corresponding comments that were selected and given to human annotators. Each grouping of text was annotated by three different human annotators. The resulting mentions and entities are included, along with a breakdown of the inter-annotator agreement for the mention-entity pairs.

The mention-entity pairs collected are broken into groups based on the level of inter-annotator agreement. 

Gold annotations - all three annotators agree

Silver annotations - two out of three annotators agree

Bronze annotations - an annotation made by a single annotator that neither of the other two made

In total the dataset contains 1,342 gold annotations, 2,723 silver annotations, and 7,038 bronze annotations.
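
The agreement tiers can be illustrated with a minimal sketch. It assumes each annotator's output is available as a set of (mention, entity) pairs for one grouping of text; the function name, the data layout, and the example pairs are hypothetical and are not taken from the dataset files, whose actual structure is described in the readme.

```python
from collections import Counter


def tier_annotations(annotator_pairs):
    """Group (mention, entity) pairs by how many of three annotators produced them.

    annotator_pairs: a sequence of three collections, one per annotator,
    each holding (mention, entity) tuples. This layout is an assumption
    made for illustration only.
    """
    if len(annotator_pairs) != 3:
        raise ValueError("expected annotations from exactly three annotators")

    counts = Counter()
    for pairs in annotator_pairs:
        # set() ensures an annotator is counted at most once per pair
        counts.update(set(pairs))

    tier_names = {3: "gold", 2: "silver", 1: "bronze"}
    tiers = {"gold": set(), "silver": set(), "bronze": set()}
    for pair, n_agree in counts.items():
        tiers[tier_names[n_agree]].add(pair)
    return tiers


# Made-up example pairs, not drawn from the dataset
example = [
    {("NYC", "New York City"), ("Apple", "Apple Inc.")},
    {("NYC", "New York City"), ("Apple", "Apple Inc.")},
    {("NYC", "New York City"), ("Apple", "Apple (fruit)")},
]
tiers = tier_annotations(example)
```

In this example the pair chosen by all three annotators lands in the gold tier, the pair chosen by two lands in silver, and the pair chosen by one lands in bronze.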

A readme file is provided that describes the structure of the files and the information within each one.

Files (343.0 kB)


Cite as