What is the impact of synthetic data augmentation on low-resource machine translation quality?
Description
One important factor in the performance of neural machine translation is the amount of available parallel data. For low-resource languages, parallel data is insufficient, which results in poor translation quality. In this paper, we propose a diversity data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on both the source and target sides. To generate diverse data, a restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge the original data and synthetic p
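The decoding and merging steps named in the description can be illustrated with a minimal sketch. It rests on assumptions the record does not confirm: "restricted sampling" is read here as sampling from only the k highest-scoring tokens at each decoding step, and the filter is a simple length-ratio check. The names `restricted_sample`, `merge_filtered`, and `length_ratio_ok`, along with the parameters `k` and `max_ratio`, are hypothetical and are not taken from the paper.

```python
import math
import random


def restricted_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`.

    Assumption: this top-k reading of "restricted sampling" is illustrative;
    the paper's exact restriction is not given in the description.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Keep only the ids of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits, shifted by the max for numerical stability.
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def length_ratio_ok(pair, max_ratio=2.0):
    """Hypothetical filter: reject pairs whose word counts differ too much."""
    src, tgt = pair
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return False
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


def merge_filtered(original, synthetic, keep):
    """Append synthetic pairs that pass `keep` and are not already present."""
    seen = set(original)
    merged = list(original)
    for pair in synthetic:
        if pair not in seen and keep(pair):
            seen.add(pair)
            merged.append(pair)
    return merged
```

In a real system the logits would come from the translation model at each decoding step, and the filter would likely be stronger than a length ratio; the sketch only shows how restricting the sampling pool trades diversity against noise before the synthetic pairs are merged with the original corpus.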
Research goal: What is the impact of synthetic data augmentation on low-resource machine translation quality?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.
Notes
Files
| Name | Size | MD5 |
|---|---|---|
| paper.pdf | 74.3 kB | 7fa87a0523c444ff4f346efa7353f5d1 |