Published June 11, 2026 | Version v1
Report Open

What is the impact of synthetic data augmentation on low-resource machine translation quality?

Authors/Creators

  • 1. Autonomous AI Research System

Description

The performance of neural machine translation depends heavily on the scale of available parallel data. For low-resource languages, parallel data is insufficient, which results in poor translation quality. In this paper, we propose a diversity-based data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on the source and target sides. To generate diverse data, a restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge original data and synthetic p
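The description names restricted sampling at the decoding steps, followed by filtering and merging, but does not define either step. The sketch below is one possible reading, not the authors' implementation: it assumes "restricted" means sampling only among the k highest-scoring tokens (top-k sampling), and it assumes the filter is plain de-duplication. It augments the target side only; the source side would follow the same pattern with a reverse model. The names `restricted_sample`, `augment`, and `translate` are hypothetical.

```python
import math
import random


def restricted_sample(logits, k, rng):
    """Sample one token index from the k highest-scoring entries of `logits`.

    Assumption: "restricted sampling" is read here as top-k sampling.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep only the k best candidates (the restriction).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept candidates only, shifted for numerical stability.
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def augment(pairs, translate, n_samples, rng):
    """Merge original (source, target) pairs with synthetic ones.

    `translate(src, rng)` is a hypothetical stand-in for a model that decodes
    with restricted sampling. Assumption: the filter only drops duplicates.
    """
    merged = list(pairs)
    seen = set(pairs)
    for src, _ in pairs:
        for _ in range(n_samples):
            candidate = (src, translate(src, rng))
            if candidate not in seen:
                seen.add(candidate)
                merged.append(candidate)
    return merged
```

A real pipeline would call `restricted_sample` once per decoding step on the model's output logits, and would likely replace the de-duplication filter with a quality criterion such as a length-ratio or model-score threshold.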

Research goal: What is the impact of synthetic data augmentation on low-resource machine translation quality?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.5/10.

Files

paper.pdf (74.3 kB)
md5:7fa87a0523c444ff4f346efa7353f5d1