Dataset Open Access

Reddit Entity Linking

Nicholas Botzer; Yifan Ding; Tim Weninger

An entity linking dataset created from the social media website Reddit. The dataset contains 619 posts and 1,243 corresponding comments that were selected and given to human annotators. Each grouping of text was annotated by three different human annotators. The resulting mentions and entities are included, along with a breakdown of the inter-annotator agreement for each mention-entity pair.

The collected mention-entity pairs are grouped by their level of inter-annotator agreement:

Gold annotations - all three annotators agree

Silver annotations - two out of three annotators agree

Bronze annotations - an annotation made by a single annotator that the other two did not make

In total the dataset contains 1,342 gold annotations, 2,723 silver annotations, and 7,038 bronze annotations.
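
The agreement tiers above can be illustrated with a minimal sketch. It assumes each annotator's output for one grouping of text is available as a set of (mention, entity) pairs and that two annotators "agree" when they produce an identical pair. The function name `tier_annotations`, the data layout, and the sample pairs are hypothetical and are not taken from the dataset or its readme, which may define agreement differently (for example, by character offsets).

```python
from collections import Counter

# Number of annotators who produced a pair -> agreement tier.
TIER_BY_VOTES = {3: "gold", 2: "silver", 1: "bronze"}


def tier_annotations(annotator_pairs):
    """Group (mention, entity) pairs by how many of three annotators produced them.

    annotator_pairs: a list of exactly three iterables of (mention, entity)
    tuples, one per annotator (hypothetical layout, see the lead-in).
    """
    if len(annotator_pairs) != 3:
        raise ValueError("expected annotations from exactly three annotators")

    votes = Counter()
    for pairs in annotator_pairs:
        # set() ensures one annotator cannot vote twice for the same pair.
        votes.update(set(pairs))

    tiers = {"gold": [], "silver": [], "bronze": []}
    for pair, count in votes.items():
        tiers[TIER_BY_VOTES[count]].append(pair)

    # Sort for deterministic output.
    return {name: sorted(pairs) for name, pairs in tiers.items()}


# Made-up example annotations for a single grouping of text.
annotator_a = {("Apple", "Apple_Inc."), ("Jobs", "Steve_Jobs"), ("Mac", "Macintosh")}
annotator_b = {("Apple", "Apple_Inc."), ("Jobs", "Steve_Jobs")}
annotator_c = {("Apple", "Apple_Inc."), ("Mac", "Mac_(computer)")}

example_tiers = tier_annotations([annotator_a, annotator_b, annotator_c])
```

In this made-up example, the "Apple" pair lands in the gold tier, the "Jobs" pair in silver, and the two conflicting "Mac" pairs each land in bronze, since no other annotator produced the same pair.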

A readme file is provided that describes the structure of the files and the information within each one.

Files (343.0 kB)

Name           Size      Checksum
reddit_el.zip  343.0 kB  md5:eea345cc7574a5c9c376748d1871d557
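
The listed md5 checksum can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_md5` are illustrative, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed for reddit_el.zip on this page.
EXPECTED_MD5 = "eea345cc7574a5c9c376748d1871d557"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_md5("reddit_el.zip")` should return `True` for an uncorrupted download saved under that name in the current directory.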
                  All versions  This version
Views             867           579
Downloads         421           268
Data volume       107.5 MB      91.9 MB
Unique views      712           532
Unique downloads  252           213