There is a newer version of the record available.

Published May 13, 2020 | Version v1
Dataset Open

Reddit Entity Linking

  • 1. University of Notre Dame

Description

An entity linking dataset created from the social media website Reddit. The dataset contains 619 posts and 1,243 corresponding comments that were selected and given to human annotators. Each grouping of text was annotated by three different human annotators. The resulting mentions and entities are included, along with a breakdown of the inter-annotator agreement for each mention-entity pair.

The collected mention-entity pairs are divided into three groups based on the level of inter-annotator agreement:

Gold annotations - all three annotators agree

Silver annotations - two out of three annotators agree

Bronze annotations - an annotation made by a single annotator that the other two did not make

In total, the dataset contains 1,343 gold annotations, 2,725 silver annotations, and 7,036 bronze annotations.
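The gold, silver, and bronze grouping described above can be illustrated with a short Python sketch. It assumes each annotator's output is a set of (mention, entity) tuples and that two annotators "agree" only when their tuples match exactly. The record does not state how matches were determined, so the function name `tier_annotations`, the tuple layout, and the sample pairs are hypothetical and do not come from the dataset.

```python
from collections import Counter

# Number of annotators who produced a pair -> agreement tier.
TIER_BY_VOTES = {3: "gold", 2: "silver", 1: "bronze"}


def tier_annotations(annotations_by_annotator):
    """Group mention-entity pairs by how many of the three annotators produced them.

    Assumption: exact tuple equality stands in for annotator agreement.
    """
    if len(annotations_by_annotator) != 3:
        raise ValueError("expected annotations from exactly three annotators")

    votes = Counter()
    for pairs in annotations_by_annotator:
        # set() ensures an annotator is counted at most once per pair.
        votes.update(set(pairs))

    tiers = {"gold": [], "silver": [], "bronze": []}
    for pair, count in votes.items():
        tiers[TIER_BY_VOTES[count]].append(pair)
    return tiers


# Invented sample data, for illustration only.
example = [
    {("Notre Dame", "University_of_Notre_Dame"), ("Reddit", "Reddit")},
    {("Notre Dame", "University_of_Notre_Dame"), ("Reddit", "Reddit")},
    {("Notre Dame", "University_of_Notre_Dame"), ("ND", "North_Dakota")},
]
tiers = tier_annotations(example)
```

In this invented example the pair found by all three annotators lands in the gold group, the pair found by two lands in silver, and the pair found by one lands in bronze.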

A readme file is provided that describes the structure of the files and the information within each one.

Files

readme.txt

Files (954.3 kB)

MD5 checksum                          Size
39562621c5b29732e634fc5f02930012      250.2 kB
bd090d8a14b912e3b5031a435ee9e542      143.7 kB
c36900285f27f711fbe389d8ff0631cf      253.5 kB
ae885f41deb3f1bcda9cff8596acc0a9      35.0 kB
133647b133d96246980b371856e7a5dd      37.8 kB
5574b4dc23de1f5c7b2a566a89bb8700      3.5 kB
788a31cc47f6757d17c3b9fc809dcfc2      87.0 kB
2741e3263a857d42f54b241c97234e62      64.4 kB
4dea279bc4b3e466cac582bf58bbeb16      79.1 kB
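A downloaded file can be checked against the MD5 checksums listed above. The following is a minimal Python sketch using only the standard library. The helper names `file_md5` and `md5_matches` are hypothetical, and the listing here does not show which checksum belongs to which file name, so the pairing has to be taken from the record page itself.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected):
    """Compare a file's MD5 digest with an expected value.

    `expected` may be a bare hex digest or carry an "md5:" prefix.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return file_md5(path) == expected_hex
```

A call would look like `md5_matches("some_downloaded_file", "md5:<checksum from the listing>")`, where the path is a placeholder for wherever the file was saved.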