Exploiting multilingual lexical resources to predict MWE compositionality
Description
Semantic idiomaticity is the extent to which the meaning of a multiword expression (MWE) cannot be predicted from the meanings of its component words. Much
work in natural language processing on semantic idiomaticity has focused on compositionality prediction, wherein a binary or continuous-valued compositionality
score is predicted for an MWE as a whole or for each of its component words. One
source of information for making compositionality predictions is the translation
of an MWE into other languages. This chapter extends two previously presented
studies – Salehi & Cook (2013) and Salehi et al. (2014) – that propose methods for
predicting compositionality that exploit translation information provided by multilingual lexical resources, and that are applicable to many kinds of MWEs in a
wide range of languages. These methods make use of distributional similarity of
an MWE and its component words under translation into many languages, as well
as string similarity measures applied to definitions of translations of an MWE and
its component words. We evaluate these methods over English noun compounds,
English verb-particle constructions, and German noun compounds. We show that
the estimation of compositionality is improved when using translations into multiple languages, as compared to simply using distributional similarity in the source
language. We further find that string similarity complements distributional similarity.
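
As a rough illustration of the string-similarity idea summarized above, the sketch below scores an MWE by how closely its translation resembles the translations of its component words, averaged over several target languages. It is a minimal sketch under stated assumptions, not the method of the cited studies: `string_sim` uses Python's `difflib` ratio as a stand-in for whichever string similarity measures the chapter uses, the function names are hypothetical, and the small translation dictionaries are hand-written toy inputs rather than entries from any particular lexical resource.

```python
from difflib import SequenceMatcher
from statistics import mean


def string_sim(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is an assumed stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compositionality(mwe_trans: dict, component_trans: list) -> float:
    """Average over languages of the mean similarity between the MWE's
    translation and each component word's translation in that language."""
    per_language = []
    for lang, mwe_t in mwe_trans.items():
        sims = [string_sim(mwe_t, c[lang]) for c in component_trans if lang in c]
        if sims:
            per_language.append(mean(sims))
    return mean(per_language) if per_language else 0.0


# Toy inputs (illustrative only): language code -> translation.
football = {"de": "Fußball", "nl": "voetbal"}
foot = {"de": "Fuß", "nl": "voet"}
ball = {"de": "Ball", "nl": "bal"}

hot_dog = {"de": "Hotdog", "nl": "hotdog"}
hot = {"de": "heiß", "nl": "heet"}
dog = {"de": "Hund", "nl": "hond"}

score_football = compositionality(football, [foot, ball])
score_hot_dog = compositionality(hot_dog, [hot, dog])
print(f"football: {score_football:.2f}  hot dog: {score_hot_dog:.2f}")
```

On these toy inputs the compositional compound scores higher than the idiomatic one, because its translations are built from the translations of its parts. Averaging over languages is one simple way to combine evidence; other ways of weighting or selecting languages are equally plausible.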
Files
Name | Size |
---|---|
13.pdf (md5:86cc79dff36b98bcab48011daf70c38f) | 303.9 kB |