What can Euclidean distance do for translation evaluations?
Description
We describe an empirical method for screening informational translation shifts in parallel segment pairs extracted from a bilingual or multilingual translation corpus, using two linguistic features that are independent of the languages paired in the translation. The method applies to most known languages and to either translation direction (direct or inverse). The features measured for each source and target segment are character count and lexical word count (or information volume). Information volume is computed by an algorithm coded in Python using spaCy v2.1.3 core linguistic models. The source and target feature values and the translation precision ratio of each segment pair are averaged over the text to which the pair belongs, and all segment values are standardized relative to their textual average. The deviation between the standardized values of the two segments in a pair, measured by the weighted Euclidean distance, makes it possible to screen for and identify target segments that are atypical, or heteromorphic, in comparison with their source segment. Our hypothesis is that heteromorphic segment pairs, as opposed to isomorphic ones, are more likely to contain informational translation shifts. The method is objective and reproducible; it allows semi-automatic identification of problematic translations and uncovers textual and linguistic facts that reveal translation processes, contingencies, and determinism.
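The screening step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the standardization formula, the weights, or a cut-off, so the z-score standardization, the equal default weights, the `threshold` parameter, and all function names below are assumptions made for illustration only. Each segment is represented by a hypothetical `(character_count, lexical_word_count)` tuple.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Z-score values against their text-level mean (assumed standardization)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All segments share the same value: no deviation from the average.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def weighted_euclidean(src, tgt, weights):
    """Weighted Euclidean distance between two standardized feature vectors."""
    return math.sqrt(sum(w * (s - t) ** 2 for s, t, w in zip(src, tgt, weights)))


def screen_pairs(src_features, tgt_features, weights=(1.0, 1.0), threshold=1.0):
    """Return per-pair distances and the indices whose distance exceeds threshold.

    src_features / tgt_features: one (char_count, lexical_word_count) tuple
    per segment, aligned by position. Weights and threshold are hypothetical.
    """
    n_feat = len(weights)
    # Standardize each feature separately within the source and the target text.
    src_z = [standardize([seg[k] for seg in src_features]) for k in range(n_feat)]
    tgt_z = [standardize([seg[k] for seg in tgt_features]) for k in range(n_feat)]

    distances = []
    for i in range(len(src_features)):
        s = [src_z[k][i] for k in range(n_feat)]
        t = [tgt_z[k][i] for k in range(n_feat)]
        distances.append(weighted_euclidean(s, t, weights))

    flagged = [i for i, d in enumerate(distances) if d > threshold]
    return distances, flagged


# Invented toy data: the last target segment is much shorter than its source.
source = [(100, 10), (200, 20), (300, 30), (400, 40)]
target = [(110, 11), (220, 22), (330, 33), (150, 12)]
distances, flagged = screen_pairs(source, target)
print(distances, flagged)
```

Pairs with a large distance would be the candidates for heteromorphic segments. In a short text a single atypical pair shifts the text-level averages, so any fixed cut-off should be treated as a tuning choice rather than a property of the method.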
Files
296-Bisiada-2021-7.pdf (182.4 kB)
md5:5e039bf8fec9fa7f9cb046e802cec283
Additional details
Related works
- Is part of
- 978-3-96110-300-3 (ISBN)
- 10.5281/zenodo.4450014 (DOI)