Published January 2025 | Version v1
Technical note | Open Access

Practical Perspectives on Black-Box Critical Error Detection for Machine Translation

Description

The performance of machine translation systems has improved significantly in recent years, but they are not immune to errors. This motivates the need to identify translations that require human review and post-editing. Most current research focuses on quality estimation, that is, predicting translation quality on a continuous numerical scale, which can be difficult to interpret in applied contexts. In contrast, this project focused on predicting binary labels indicating whether a translation contains errors that alter the meaning of the original text, also referred to as critical errors. Such errors are a high priority for human review and post-editing in the vast majority of use cases.
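To make the task formulation concrete, the sketch below (not the method from the note) frames black-box critical error detection as binary classification over (source, translation) pairs, using a multilingual sentence encoder as a feature extractor and a logistic regression classifier. The encoder name, feature combination, and toy training examples are illustrative assumptions, not details taken from the technical note.

```python
# Minimal sketch: critical error detection as binary classification.
# Assumptions (not from the note): sentence-transformers encoder,
# concatenation + absolute-difference features, toy labelled pairs.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def pair_features(sources, translations):
    """Embed source and translation, then combine the embeddings so the
    classifier can compare them."""
    src = encoder.encode(sources, convert_to_numpy=True)
    tgt = encoder.encode(translations, convert_to_numpy=True)
    return np.concatenate([src, tgt, np.abs(src - tgt)], axis=1)

# Toy data: 1 = critical error (meaning changed), 0 = no critical error.
train_src = ["The patient must not eat before surgery.",
             "Turn off the power before opening the panel."]
train_tgt = ["El paciente debe comer antes de la cirugía.",   # negation dropped
             "Apague la corriente antes de abrir el panel."]  # meaning preserved
train_labels = [1, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(pair_features(train_src, train_tgt), train_labels)

# Binary decision for a new pair: flag for human review if predicted 1.
prob = clf.predict_proba(pair_features(
    ["Do not exceed the stated dose."],
    ["No exceda la dosis indicada."]))[0, 1]
print("critical-error probability:", prob)
```

The binary output maps directly to the operational decision (send to a human post-editor or not), which is the interpretability advantage the description contrasts with continuous quality scores.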


Files

knight-et-al-2025.pdf (1.3 MB, md5:ece5f129f24f7ecac584146f94ceaa6d)

