Practical Perspectives on Black-Box Critical Error Detection for Machine Translation
Description
The performance of machine translation systems has improved significantly in recent years, but they are not immune to errors. This motivates the need to identify translations requiring human review and post-editing. Most current research focuses on quality estimation, that is, predicting translation quality on a continuous numerical scale, which can be difficult to interpret in applied contexts. In contrast, this project focused on predicting binary labels indicating whether translations contain errors that alter the meaning of the original text, also referred to as critical errors. Such errors are likely to be a high priority for human review and post-editing in the vast majority of use cases.
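One way to relate the two framings described above is to derive binary critical-error labels from continuous quality-estimation scores via a tuned decision threshold. The sketch below is a minimal illustration of that idea; the function name, score scale, and threshold value are assumptions for illustration, not taken from the report.

```python
def label_critical(qe_scores, threshold=0.5):
    """Hypothetical baseline: map continuous QE scores (higher = better
    quality) to binary labels, where 1 flags a likely critical error
    that should be routed to human review and post-editing."""
    return [1 if score < threshold else 0 for score in qe_scores]

# Three translations with QE scores in [0, 1]; the middle one
# falls below the threshold and is flagged for review.
labels = label_critical([0.92, 0.31, 0.55], threshold=0.5)
print(labels)  # [0, 1, 0]
```

In practice the threshold would be chosen on held-out data to balance missed critical errors against unnecessary review load, which is exactly where a binary framing is easier to act on than a raw score.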
Files
| Name | Size | MD5 checksum |
|---|---|---|
| knight-et-al-2025.pdf | 1.3 MB | ece5f129f24f7ecac584146f94ceaa6d |
Additional details
- Catalog number: Turing Technical Report No. 4