Published February 27, 2019 | Version v1
Book chapter Open

Cross-lingual linking of multi-word entities and language-dependent learning of multi-word entity patterns

  • 1. European Commission, Joint Research Centre, Ispra, Italy

Description

We address large-scale multilingual multi-word entity (MWEntity) recognition and
variant matching. First, we recognise MWEntities in 22 different languages,
identify monolingual variant spellings and link equivalent groups of variants
across all languages. We then use the previously recognised MWEntities to learn
new recognition rules based on distributional patterns. Because it requires no
linguistic tools, the method is suitable for our highly multilingual environment.
When the new rules are added to the original rule-based NER system, F1
performance for Spanish increases from 42.4% to 50% (an 18% relative increase)
and for English from 43.4% to 44.5% (a 2.5% relative increase). Besides turning
free text into semi-structured data for search and machine processing, we use
the system to link related news over time and across languages, as well as to
detect trends.
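
The percentages in parentheses are relative gains over the baseline F1 score, not differences in percentage points. The short check below reproduces them from the four F1 values quoted in the abstract. The function name `relative_increase` is an illustrative choice and does not come from the chapter.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100.0


# F1 scores (in percent) as quoted in the abstract.
spanish = relative_increase(42.4, 50.0)
english = relative_increase(43.4, 44.5)

print(f"Spanish: {spanish:.1f}% relative increase")
print(f"English: {english:.1f}% relative increase")
```

In absolute terms the gains are 7.6 F1 points for Spanish and 1.1 F1 points for English. The abstract rounds the relative figures to 18% and 2.5%.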

Files

11.pdf (425.4 kB)
md5:0128a0236108503734a6fe886496ea2d