doclevel-MT-benchmark-discoMT2019

Tiedemann, Jörg; Scherrer, Yves

doi:10.5281/zenodo.3525366

Published November 1, 2019 | Version v1

Dataset Open

1. University of Helsinki

This release contains data sets for experiments with document-level machine translation. The data sets were used in previous studies and are provided here for replicability and comparison with other systems. They are taken from the English-German news translation task at WMT 2019 and from the English-German bitext in the OpenSubtitles collection v2016 from OPUS. All data sets are sentence-aligned: each line on one side of the parallel corpus corresponds to the line with the same number on the other side. Document boundaries are marked with empty lines on both sides of the parallel corpus.
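The line-aligned layout with empty-line document boundaries can be read back into documents with a few lines of Python. The following is only a minimal sketch: the function name `read_parallel_documents` and the file names in the comments are hypothetical, and raising an error on a line that is empty on one side only is a choice made for this sketch, not something the release specifies.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

# One document is a list of (source sentence, target sentence) pairs.
Document = List[Tuple[str, str]]


def read_parallel_documents(
    src_lines: Iterable[str], tgt_lines: Iterable[str]
) -> Iterator[Document]:
    """Group line-aligned sentence pairs into documents.

    A line that is empty on both sides is treated as a document boundary.
    """
    doc: Document = []
    pairs = zip_longest(src_lines, tgt_lines, fillvalue=None)
    for line_no, (src, tgt) in enumerate(pairs, start=1):
        if src is None or tgt is None:
            raise ValueError(f"the two sides differ in length at line {line_no}")
        src, tgt = src.strip(), tgt.strip()
        if not src and not tgt:
            # Empty line on both sides: close the current document, if any.
            if doc:
                yield doc
                doc = []
        elif not src or not tgt:
            # Assumption of this sketch: a one-sided empty line is an error.
            raise ValueError(f"line {line_no} is empty on one side only")
        else:
            doc.append((src, tgt))
    if doc:
        yield doc


# Small in-memory demonstration. With real data, pass two open file handles,
# e.g. open("train.en", encoding="utf-8") and open("train.de", encoding="utf-8")
# (hypothetical file names; check the archive for the actual ones).
english = ["Hello.\n", "How are you?\n", "\n", "Good morning.\n"]
german = ["Hallo.\n", "Wie geht es dir?\n", "\n", "Guten Morgen.\n"]
demo_docs = list(read_parallel_documents(english, german))
```

Yielding one document at a time keeps memory use low, which matters for the larger subtitle data.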

The data set has been used in the following publication:

@inproceedings{scherrer-tiedemann-loaiciga-2019,
    title = "Analysing concatenation approaches to document-level NMT in two different domains",
    author = {Scherrer, Yves and Tiedemann, J{\"o}rg and Lo{\'a}iciga, Sharid},
    booktitle = "Proceedings of the Third Workshop on Discourse in Machine Translation",
    month = nov,
    year = "2019",
    address = "Hong-Kong",
    publisher = "Association for Computational Linguistics",
}

Please cite this paper if you use the data set in your own work.

Files

Name	Size
doclevel-MT-benchmark-discomt2019.zip (md5:be5c854fe6db401b2d72fd1b8ab984ae)	1.8 GB
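
After downloading, the archive can be checked against the MD5 checksum listed above. This is a minimal sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `verify_archive` are hypothetical, and the local path in the usage comment assumes the file was saved under its original name in the current directory.

```python
import hashlib

# MD5 checksum as listed for doclevel-MT-benchmark-discomt2019.zip.
EXPECTED_MD5 = "be5c854fe6db401b2d72fd1b8ab984ae"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time so the 1.8 GB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (assumes the archive is in the current directory):
# ok = verify_archive("doclevel-MT-benchmark-discomt2019.zip")
```

A mismatch usually points to an incomplete or corrupted download, in which case downloading the archive again is the simplest fix.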

Additional details

Funding

European Commission: FoTran - Found in Translation – Natural Language Understanding with Cross-Lingual Grounding (771113)
European Commission: MeMAD - Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy (780069)

References

Scherrer, Tiedemann and Loáiciga: "Analysing concatenation approaches to document-level NMT in two different domains", in Proceedings of DiscoMT2019 at EMNLP 2019, Hong-Kong
