Published April 26, 2022 | Version v1.0
Project deliverable | Open

D4.6 Guidelines for further use of MT systems in social surveys

  • 1. CLARIN/CUNI
  • 2. WageIndicator
  • 3. UPF

Description

This report presents guidelines for training specialized neural machine translation (NMT) systems for a narrow textual domain, namely social surveys, which requires an MT model that can handle domain-specific terminology. The work presented in this report demonstrates how relatively low-resource in-domain corpora can be used to prepare these specialized models. All described models are compatible with the packaged MT framework described in Deliverable D4.5, and the best-performing models are available at the Lindat repository. The code used in the training pipeline (for experiment reproduction) is available on GitHub, distributed under the Mozilla Public License 2.0.

The partners also describe the full translation pipeline, including file sharing and preprocessing, which was used to help with the automatic translation of the COVID-19 surveys into English. While the description of the pipeline is general enough to be used in future projects, the code published by the partners on GitHub serves only as an example of a task-specific solution.

Notes

Approved by EC - 27 April 2022

Files

D4.6 Guidelines for further use of MT systems in social surveys (Approved 27 Apr 2022).pdf

Additional details

Funding

SSHOC – Social Sciences & Humanities Open Cloud 823782
European Commission