Published September 11, 2025 | Version 1.1
Dataset Open

ParaFarm: English-Ukrainian Multiple-Translation Corpus

  • 1. National Technical University "Kharkiv Polytechnic Institute"
  • 2. Friedrich-Schiller-Universität Jena

Description

Annotation

ParaFarm: English-Ukrainian Multiple-Translation Corpus is a parallel corpus designed to facilitate the study of translation variation and linguistic diversity in Ukrainian. The corpus comprises 1,390 English segments extracted from George Orwell’s Animal Farm, aligned with their corresponding translations from seven published Ukrainian editions of the novel. This resource enables researchers to explore multiple translation choices for identical source material, offering valuable insights into Ukrainian language variability and translator decision-making. The corpus is distributed in TMX format.
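Because the corpus ships as TMX, it can be read with a standard XML parser. The sketch below is a minimal illustration, not the authors' tooling. It assumes that each translation unit (`<tu>`) holds one English variant and several Ukrainian variants, and that languages are marked with `xml:lang` (or the older `lang` attribute); the actual layout of the ParaFarm files may differ. The function name `read_tmx` and the sample sentences are invented for the example and are not taken from the corpus.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Fully qualified name of the xml:lang attribute as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(source):
    """Parse a TMX file (path or binary file object).

    Returns one dict per <tu>, mapping a primary language tag
    (e.g. "en", "uk") to the list of segment texts in that language.
    """
    root = ET.parse(source).getroot()
    units = []
    for tu in root.iter("tu"):
        variants = defaultdict(list)
        for tuv in tu.findall("tuv"):
            code = tuv.get(XML_LANG) or tuv.get("lang") or ""
            # Keep only the primary subtag, so "uk-UA" and "uk" share a key.
            lang = code.lower().split("-")[0]
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                variants[lang].append("".join(seg.itertext()).strip())
        units.append(dict(variants))
    return units

# Made-up toy unit (assumed layout: one source, several translations).
SAMPLE = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en"><seg>Good morning.</seg></tuv>
  <tuv xml:lang="uk"><seg>Доброго ранку.</seg></tuv>
  <tuv xml:lang="uk-UA"><seg>Добрий ранок.</seg></tuv>
</tu>
</body></tmx>"""

units = read_tmx(io.BytesIO(SAMPLE.encode("utf-8")))
for unit in units:
    print(unit.get("en"), "has", len(unit.get("uk", [])), "translations")
```

For the real files, a path to the downloaded TMX file can be passed in place of the in-memory sample.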

Applications

Translation Studies: comparative analysis of translation strategies and decision-making processes
Ukrainian Language Variation: investigation of lexical and grammatical diversity in Ukrainian
Corpus Linguistics: quantitative analysis of translation patterns and linguistic phenomena
Machine Translation Evaluation: reference corpus for assessing MT system output quality
Paraphrase Generation: training data for neural paraphrase generation models
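As an illustration of the paraphrase use case, parallel translations of the same English segment can be paired with one another. The following is a sketch under stated assumptions: the helper `paraphrase_pairs` is hypothetical, the input is a plain list of the Ukrainian translations of one segment, and identical translations are collapsed before pairing. If all seven translations of a segment are present and distinct, this yields 21 unordered pairs, so the corpus would provide at most 1,390 × 21 = 29,190 pairs.

```python
from itertools import combinations

def paraphrase_pairs(translations):
    """Return all unordered pairs of distinct translations of one segment."""
    # Strip whitespace, drop empty strings, and remove exact duplicates
    # while preserving the original order.
    cleaned = (t.strip() for t in translations)
    unique = list(dict.fromkeys(t for t in cleaned if t))
    return list(combinations(unique, 2))

# Placeholder strings standing in for the translations of one segment.
variants = ["варіант А", "варіант Б", "варіант А", "варіант В"]
pairs = paraphrase_pairs(variants)
print(len(pairs))
```

Exact-match deduplication is the simplest possible filter; a real training pipeline would likely also want to handle near-duplicates that differ only in punctuation or orthography.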

Ethical Considerations

This corpus was created exclusively for academic research purposes under the principles of fair use in scholarly analysis. The source material and translations are used in a transformative manner for linguistic research, with proper attribution to the original translators.

Citation

When using this corpus in research, please cite as:

Viktoriia Kalashnyk, Maria Shvedova (2025). ParaFarm: English-Ukrainian Multiple-Translation Corpus. Zenodo. https://doi.org/10.5281/zenodo.17093177

Files

Files (3.3 MB)

Checksum                               Size
md5:69cca47808809e52202610b00748fb96   124.0 kB
md5:b4ca27ef692ee64d3781a3ac94eb33b3   3.2 MB
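The MD5 checksums listed above can be used to confirm that a download is intact. The snippet below is a small sketch using Python's standard `hashlib`; the helper names `md5_of` and `matches_listing` and the file name in the usage comment are hypothetical, since the file names are not shown in this listing.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected

# Usage (hypothetical local file name):
# matches_listing("downloaded_corpus_file.tmx",
#                 "md5:b4ca27ef692ee64d3781a3ac94eb33b3")
```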