Conference paper Open Access
Scarton, Scarton; Forcada, Mikel L.; Esplà-Gomis, Miquel; Specia, Lucia
Title: Estimating post-editing effort: a study on human judgements, task-based and reference-based metrics of MT quality
Publication date: 2019-11-02
DOI: 10.5281/zenodo.3525003 (https://doi.org/10.5281/zenodo.3525003); concept DOI for all versions: 10.5281/zenodo.3525002
Community: iwslt2019
License: CC-BY-4.0
Language: English
File: IWSLT2019_paper_18.pdf (PDF, 428,595 bytes; md5:92181f24bdccacf2a7f0f42c7b27fee9)

Affiliations:
- Scarton, Scarton: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
- Forcada, Mikel L.: Dept. Llenguatges i Sist. Inform., Universitat d'Alacant, 03690 St. Vicent del Raspeig, Spain
- Esplà-Gomis, Miquel: Dept. Llenguatges i Sist. Inform., Universitat d'Alacant, 03690 St. Vicent del Raspeig, Spain
- Specia, Lucia: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK & Department of Computing, Imperial College London, London SW7 2AZ, UK

Abstract:

Devising metrics to assess translation quality has always been at the core of machine translation (MT) research. Traditional automatic reference-based metrics, such as BLEU, have shown correlations with human judgements of adequacy and fluency and have been paramount for the advancement of MT system development. Crowd-sourcing has popularised and enabled the scalability of metrics based on human judgments, such as subjective direct assessments (DA) of adequacy, that are believed to be more reliable than reference-based automatic metrics. Finally, task-based measurements, such as post-editing time, are expected to provide a more detailed evaluation of the usefulness of translations for a specific task. Therefore, while DA averages adequacy judgements to obtain an appraisal of (perceived) quality independently of the task, and reference-based automatic metrics try to objectively estimate quality also in a task-independent way, task-based metrics are measurements obtained either during or after performing a specific task.

In this paper we argue that, although expensive, task-based measurements are the most reliable when estimating MT quality in a specific task; in our case, this task is post-editing. To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort. Our results show that task-based metrics comparing machine-translated and post-edited versions are the best at tracking post-editing effort, as expected. These metrics are followed by DA, and then by metrics comparing the machine-translated version and independent references. We suggest that MT practitioners should be aware of these differences and acknowledge their implications when deciding how to evaluate MT for post-editing purposes.
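The abstract ranks metrics by how well they track post-editing effort, with task-based metrics that compare the machine-translated and post-edited versions coming out on top. As a minimal sketch, and not the authors' implementation, the Python below assumes a word-level, shift-free variant of HTER (edits needed to turn the MT output into its post-edited version, normalised by post-edit length) as the task-based metric, and Pearson's r as the measure of "tracking" an effort indicator such as post-editing time. The function names, whitespace tokenisation, and toy segments and timings are hypothetical illustrations.

```python
from math import sqrt
from typing import Sequence


def word_edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from an empty hyp to each ref prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,              # delete h from the MT output
                curr[j - 1] + 1,          # insert r into the MT output
                prev[j - 1] + (h != r),   # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def hter_no_shifts(mt: str, post_edited: str) -> float:
    """Hypothetical HTER-style score: edits from MT to its post-edited version,
    divided by the post-edited length. Unlike real TER, block shifts are ignored."""
    mt_tokens, pe_tokens = mt.split(), post_edited.split()
    return word_edit_distance(mt_tokens, pe_tokens) / max(len(pe_tokens), 1)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between a metric and an effort indicator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical toy segments: (MT output, post-edited version, post-editing time in s).
    segments = [
        ("the cat sat in mat", "the cat sat on the mat", 12.0),
        ("he go to school", "he goes to school", 8.0),
        ("good morning", "good morning", 2.0),
    ]
    scores = [hter_no_shifts(mt, pe) for mt, pe, _ in segments]
    times = [t for _, _, t in segments]
    print("HTER-style scores:", [round(s, 3) for s in scores])
    print("Pearson r with post-editing time:", round(pearson(scores, times), 3))
```

Dropping TER's block shifts keeps the metric a plain dynamic program; a full TER implementation (for example the one in sacreBLEU) also allows shifts, so its scores can be lower than this sketch's on reordered segments.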
| Statistic | All versions | This version |
|---|---|---|
| Views | 165 | 165 |
| Downloads | 133 | 133 |
| Data volume | 57.0 MB | 57.0 MB |
| Unique views | 145 | 145 |
| Unique downloads | 117 | 117 |