What is the impact of synthetic data augmentation on low-resource machine translation quality

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20634467

Published June 11, 2026 | Version v1

Report Open

What is the impact of synthetic data augmentation on low-resource machine translation quality

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is not sufficient, which results in poor translation quality. In this paper, we propose a diversity data augmentation method that does not use extra monolingual data. We expand the training data by generating diversity pseudo parallel data on the source and target sides. To generate diversity data, the restricted sampling strategy is employed at the decoding steps. Finally, we filter and merge origin data and synthetic p

Research goal: What is the impact of synthetic data augmentation on low-resource machine translation quality?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 7.5/10.

Files

paper.pdf

Files (74.3 kB)

Name	Size	Download all
paper.pdf md5:7fa87a0523c444ff4f346efa7353f5d1	74.3 kB	Preview Download

	All versions	This version
Views	2	2
Downloads	2	2
Data volume	148.7 kB	148.7 kB

What is the impact of synthetic data augmentation on low-resource machine translation quality

Authors/Creators

Description

Notes

Files

paper.pdf

Files (74.3 kB)