Dataset Open Access
Nicholas Botzer; Yifan Ding; Tim Weninger
An entity linking dataset created from the social media website Reddit. The dataset contains 619 posts and 1,243 corresponding comments that were selected and given to human annotators. Each grouping of text was annotated by three different human annotators. The resulting mentions and entities are included, along with a breakdown of inter-annotator agreement on each mention-entity pair.
The collected mention-entity pairs are grouped by level of inter-annotator agreement:
Gold annotations - all three annotators agree
Silver annotations - exactly two of the three annotators agree
Bronze annotations - produced by a single annotator and by neither of the other two
In total, the dataset contains 1,342 gold annotations, 2,723 silver annotations, and 7,038 bronze annotations.
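The gold/silver/bronze tiers follow from counting how many of the three annotators produced each mention-entity pair. The sketch below shows one way to reproduce that grouping. It assumes, hypothetically, that each annotator's output is a set of `(text_id, mention, entity)` tuples and that two annotations agree only when the tuples are exactly equal. The dataset's real file layout is documented in its readme and may differ, and the names `tier_annotations` and `Pair` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one mention-entity pair:
# (text_id, mention surface text, linked entity). Not the dataset's actual format.
Pair = Tuple[str, str, str]


def tier_annotations(annotators: List[Set[Pair]]) -> Dict[str, Set[Pair]]:
    """Group pairs by how many of the three annotators produced them.

    Assumes agreement means exact tuple equality (an illustrative assumption).
    """
    if len(annotators) != 3:
        raise ValueError("expected exactly three annotators")
    # Each annotator contributes at most one vote per pair because each input is a set.
    votes = Counter(pair for annotations in annotators for pair in annotations)
    tiers: Dict[str, Set[Pair]] = {"gold": set(), "silver": set(), "bronze": set()}
    for pair, n in votes.items():
        if n == 3:
            tiers["gold"].add(pair)  # all three agree
        elif n == 2:
            tiers["silver"].add(pair)  # exactly two of three agree
        else:
            tiers["bronze"].add(pair)  # only one annotator produced it
    return tiers


if __name__ == "__main__":
    # Toy, made-up annotations for a single hypothetical comment "c1".
    p = ("c1", "Python", "Python_(programming_language)")
    g = ("c1", "Guido", "Guido_van_Rossum")
    s = ("c1", "snake", "Pythonidae")
    result = tier_annotations([{p, g}, {p, g}, {p, s}])
    print({name: len(pairs) for name, pairs in result.items()})
    # prints {'gold': 1, 'silver': 1, 'bronze': 1}
```

Because the three tiers are disjoint, summing their sizes gives the number of distinct mention-entity pairs proposed by at least one annotator.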
A readme file describes the structure of each file and the information it contains.