Cross-lingual linking of multi-word entities and language-dependent learning of multi-word entity patterns
- 1. European Commission, Joint Research Centre, Ispra, Italy
Description
We address large-scale multilingual multi-word entity (MWEntity) recognition and
variant matching. First, we recognise MWEntities in 22 different languages, identify
monolingual variant spellings and link equivalent groups of variants across all
languages. We then use the previously recognised MWEntities to learn new recognition
rules based on distributional patterns. Because it requires no linguistic tools, the
method is suitable for our highly multilingual environment. When the new rules are
added to the original rule-based NER system, F1 performance for Spanish increases
from 42.4% to 50% (an 18% relative increase) and for English from 43.4% to 44.5%
(a 2.5% relative increase). Besides turning free text into semi-structured data for
search and for machine-processing purposes, we use the system to link related news
over time and across languages, as well as to detect trends.
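The percentages in parentheses are relative gains over the baseline F1, not differences in percentage points. The short sketch below reproduces that arithmetic from the figures reported in the description. The helper name `relative_increase` is introduced here for illustration only and is not part of the described system.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100.0


# F1 scores (in %) as reported in the description: (baseline, with new rules).
reported_f1 = {"Spanish": (42.4, 50.0), "English": (43.4, 44.5)}

for language, (f1_before, f1_after) in reported_f1.items():
    gain = relative_increase(f1_before, f1_after)
    points = f1_after - f1_before
    print(f"{language}: +{points:.1f} points, {gain:.1f}% relative increase")
```

For Spanish, the 7.6-point absolute gain corresponds to roughly 18% relative to the 42.4% baseline. For English, the 1.1-point gain corresponds to roughly 2.5% relative to 43.4%.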
Files
Name | Size | Checksum |
---|---|---|
11.pdf | 425.4 kB | md5:0128a0236108503734a6fe886496ea2d |