Quantifying Synthesis and Fusion and their Impact on Machine Translation
Description
Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, the Natural Language Processing (NLP) literature typically labels a whole language with a strict morphological type, e.g. fusional or agglutinative. In this work, we propose to reduce the rigidity of such claims by quantifying morphological typology at the word and segment level. We follow the approach of Payne (2017), which classifies morphology using two indices: synthesis (from analytic to polysynthetic) and fusion (from agglutinative to fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. We then analyse the relationship between machine translation quality and the degree of synthesis and fusion at the word level (nouns and verbs for English-Turkish, and verbs for English-Spanish) and at the segment level (the previous language pairs plus English-German, in both directions). We complement the word-level analysis with human evaluation, and overall, we observe a consistent impact of both indices on machine translation quality.
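As an illustration of how a synthesis score might be derived from segmented text, the following is a minimal sketch that assumes the common typological reading of synthesis as a morphemes-per-word ratio. The function names, the `+` boundary marker and the toy segmentations are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
from statistics import mean


def word_synthesis(segmented_word: str, sep: str = "+") -> int:
    """Count the morphs in one segmented word, e.g. 'ev+ler+de' has 3."""
    # Empty pieces (from a stray leading or trailing separator) are ignored.
    return len([morph for morph in segmented_word.split(sep) if morph])


def synthesis_index(segmented_words, sep: str = "+"):
    """Average number of morphs per word over a segment (e.g. a sentence)."""
    counts = [word_synthesis(word, sep) for word in segmented_words]
    if not counts:
        raise ValueError("at least one word is required")
    return mean(counts)


# Toy, hand-segmented examples (assumed output of some segmentation tool).
english = ["the", "house+s", "walk+ed"]
turkish = ["ev+ler+de", "gel+iyor+um"]

print("English:", synthesis_index(english))
print("Turkish:", synthesis_index(turkish))
```

Under this assumption, a word-level score is simply the morph count of a single word, and a segment-level score is the average over the words in that segment, so a higher value indicates a more synthetic word or segment.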
Files
| Name | Size |
|---|---|
| _ACL__Quantifying_Morphological_Typology_for_NLP.pdf (md5:74bb04a9ea2c7bf1f0c0c28fceba7700) | 364.5 kB |