Published November 3, 2014 | Version v1
Preprint Open

Manual Annotation of Semi-Structured Documents for Entity-Linking

  • 1. ISTI-CNR
  • 2. Ca' Foscari University of Venice

Description

The Entity Linking (EL) problem consists in automatically linking short fragments of text within a document to entities in a given Knowledge Base such as Wikipedia. Due to its impact on several text-understanding tasks, EL is a hot research topic. The related problem of identifying the most relevant entities mentioned in the document, a.k.a. salient entities (SE), is also attracting increasing interest. Unfortunately, publicly available evaluation datasets that contain accurate, supervised knowledge about mentioned entities and their relevance ranking are currently few in number and poor in quality. This shortage makes it very difficult to compare different EL and SE solutions on a fair basis, as well as to devise innovative techniques that rely on these datasets to train machine learning models, which are in turn used to automatically link and rank entities.

In this demo paper we propose a Web-deployed tool that enables crowdsourcing the creation of these datasets by supporting the collaborative human annotation of semi-structured documents. The tool, called Elianto, is an open-source framework that provides a user-friendly and reactive Web interface to support both EL and SE labelling tasks through a guided two-step process.

Files

paper.pdf (502.6 kB)
md5:7f5b424da1cfc2ae89fd987943cb8473