ParaFarm: English-Ukrainian Multiple-Translation Corpus
Description
Abstract
ParaFarm: English-Ukrainian Multiple-Translation Corpus is a parallel corpus designed to facilitate the study of translation variation and linguistic diversity in Ukrainian. The corpus comprises 1,390 English segments extracted from George Orwell’s Animal Farm, aligned with their corresponding translations from seven published Ukrainian editions of the novel. This resource enables researchers to explore multiple translation choices for identical source material, offering valuable insights into Ukrainian language variability and translator decision-making. The corpus is distributed in TMX format.
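Because the corpus is distributed in TMX, an XML format, it can be read with a standard XML parser. The following is a minimal sketch, not the authors' tooling. It assumes the usual TMX layout (`tu` units containing `tuv` variants, each with a `seg`), that each translation unit holds one English variant and several Ukrainian ones, and that the language codes start with `en` and `uk`. The function name `read_tmx` and those codes are illustrative assumptions, so check them against the actual file.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(source, src_lang="en", tgt_lang="uk"):
    """Return a list of (source_text, [target_texts]) pairs, one per <tu>.

    `source` is a file path or a binary file object.
    `src_lang` and `tgt_lang` are assumed language-code prefixes.
    """
    tree = ET.parse(source)
    pairs = []
    for tu in tree.getroot().iter("tu"):
        src = None
        targets = []
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            # itertext() also collects text nested inside inline markup.
            text = "".join(seg.itertext()).strip() if seg is not None else ""
            if lang.startswith(src_lang):
                src = text
            elif lang.startswith(tgt_lang):
                targets.append(text)
        if src is not None:
            pairs.append((src, targets))
    return pairs
```

Calling `read_tmx` on the downloaded TMX file would then give one entry per English segment together with whatever Ukrainian variants that unit contains.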
Applications
Translation Studies: comparative analysis of translation strategies and decision-making processes
Ukrainian Language Variation: investigation of lexical and grammatical diversity in Ukrainian
Corpus Linguistics: quantitative analysis of translation patterns and linguistic phenomena
Machine Translation Evaluation: reference corpus for assessing MT system output quality
Paraphrase Generation: training data for neural paraphrase generation models
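As an illustration of the kind of quantitative comparison these applications involve, the sketch below computes the mean pairwise Jaccard overlap between the word sets of several translations of one segment. This is only one simple measure chosen for illustration, not a method described by the corpus authors. The function names are hypothetical, and the tokenizer is deliberately naive: it splits on non-word characters, so Ukrainian words containing an apostrophe are split in two.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased set of word tokens (Unicode-aware, so Cyrillic works)."""
    return set(re.findall(r"\w+", text.lower()))


def mean_pairwise_jaccard(translations):
    """Average Jaccard similarity over all pairs of translations.

    Returns None when fewer than two non-empty translations are given.
    Lower values indicate more lexical variation between translators.
    """
    sets = [tokens(t) for t in translations]
    scores = []
    for a, b in combinations(sets, 2):
        union = a | b
        if union:  # skip pairs where both translations are empty
            scores.append(len(a & b) / len(union))
    return sum(scores) / len(scores) if scores else None
```

A value close to 1 means the translations of a segment share most of their vocabulary, while a value close to 0 means they share almost none.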
Ethical Considerations
This corpus was created exclusively for academic research purposes under the principles of fair use in scholarly analysis. The source material and translations are used in a transformative manner for linguistic research, with proper attribution to the original translators.
Citation
When using this corpus in research, please cite as:
Viktoriia Kalashnyk, Maria Shvedova. (2025). ParaFarm: English-Ukrainian Multiple-Translation Corpus. Zenodo. https://doi.org/10.5281/zenodo.17093177
Files
Total size: 3.3 MB
Checksum | Size |
---|---|
md5:69cca47808809e52202610b00748fb96 | 124.0 kB |
md5:b4ca27ef692ee64d3781a3ac94eb33b3 | 3.2 MB |
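The MD5 checksums listed for the files can be used to confirm that a download is intact. A minimal sketch using Python's standard `hashlib` module follows. The function name `md5_of` is an illustrative choice, and the local file path passed to it is whatever name the file was saved under.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so large files do not need to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string should equal the part of the listed checksum after the `md5:` prefix. A mismatch suggests an incomplete or corrupted download.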