Using Context Awareness to Improve Domain-Specific Named Entity Disambiguation

Filipponi, Matteo

doi:10.5281/zenodo.2540682

Published January 15, 2019 | Version v1

Other Open


Filipponi, Matteo¹

1. EPFL

Contributors

Supervisor (2):

1. EPFL

In this project we designed and implemented a system based on the Learning to Rank framework to perform Named Entity Disambiguation (NED) of ancient author names and work titles that appear in canonical bibliographic citations. The data consists of abstracts extracted from modern publications in Classical Studies.
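As an illustration of how a Learning to Rank approach can be applied to candidate selection, here is a minimal sketch using a pairwise perceptron. The abstract does not state which ranking algorithm or features the project used, so the algorithm choice, the two features, the function names and the toy data are all assumptions made for this example.

```python
def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_perceptron(queries, n_features, epochs=10, learning_rate=0.1):
    """Learn weights so that the correct candidate outscores every incorrect one.

    Each query is a list of (features, is_correct) tuples for one mention.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for candidates in queries:
            positives = [f for f, ok in candidates if ok]
            negatives = [f for f, ok in candidates if not ok]
            for pos in positives:
                for neg in negatives:
                    # Update only when the pair is ranked in the wrong order (or tied).
                    if dot(weights, pos) <= dot(weights, neg):
                        for i in range(n_features):
                            weights[i] += learning_rate * (pos[i] - neg[i])
    return weights


def rank(weights, candidates):
    """Return (name, features) candidates sorted from best to worst score."""
    return sorted(candidates, key=lambda c: dot(weights, c[1]), reverse=True)


# Hypothetical features: [string similarity to the mention, candidate prior]
training_queries = [
    [([0.9, 0.2], True), ([0.4, 0.8], False)],
    [([0.8, 0.1], True), ([0.3, 0.6], False)],
]
weights = train_pairwise_perceptron(training_queries, n_features=2)
ranked = rank(weights, [("work_a", [0.85, 0.3]), ("work_b", [0.35, 0.7])])
```

In this toy setting the candidate at the head of `ranked` would be taken as the disambiguated entity for the mention.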

We had to deal with domain-specific challenges: the small amount of available annotated data, the high ambiguity of the citations, and a specialized knowledge base that lacks the properties commonly found in the knowledge bases used by state-of-the-art NED systems, such as Wikipedia.

Our system improved on the previously implemented baseline, reaching an F1 score of 77.62% (+7.1%) and an accuracy of 71.88% (+10.2%). We also showed that disambiguation can be improved further by exploiting the co-occurrence probability of entities extracted from the corpus. This method improved our system's accuracy by 6.8% on a subset of 59 documents.
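The co-occurrence idea can be sketched as follows. This is only one possible formulation, assuming document-level co-occurrence counts and a linear mix of the base ranking score with an averaged conditional probability; the abstract does not specify the estimator or how the scores are combined. The function names, the `weight` parameter, the entity identifiers and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count in how many documents each entity and each unordered entity pair appears."""
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in documents:
        unique = sorted(set(entities))
        entity_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return entity_counts, pair_counts


def cooccurrence_probability(a, b, entity_counts, pair_counts):
    """Estimate P(a | b): the share of documents containing b that also contain a."""
    if entity_counts[b] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))
    return pair_counts[pair] / entity_counts[b]


def rerank(candidates, context_entities, entity_counts, pair_counts, weight=0.5):
    """Re-rank (entity, base_score) candidates using average co-occurrence with the context."""
    rescored = []
    for entity, base_score in candidates:
        if context_entities:
            context_score = sum(
                cooccurrence_probability(entity, c, entity_counts, pair_counts)
                for c in context_entities
            ) / len(context_entities)
        else:
            context_score = 0.0
        rescored.append((entity, (1 - weight) * base_score + weight * context_score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Toy corpus: each document is the list of entity identifiers it mentions.
corpus = [
    ["homer", "iliad"],
    ["homer", "odyssey"],
    ["virgil", "aeneid"],
    ["homer", "iliad", "virgil"],
]
entity_counts, pair_counts = cooccurrence_counts(corpus)

# An ambiguous citation with two candidate works and nearly tied base scores.
candidates = [("aeneid", 0.51), ("iliad", 0.49)]
ranking = rerank(candidates, ["homer"], entity_counts, pair_counts)
```

Here the candidate that co-occurs more often with the already-resolved context entity moves to the top of `ranking`, even though its base score was slightly lower.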
