4297448
doi
10.5281/zenodo.4297448
oai:zenodo.org:4297448
user-africanlp
Adebayo O. Adeojo
Babunde O. Popoola
Olumide Awokoya
Modupe Olaniyi
Princess Folasade
Tolulope Adelani
Oluyemisi Olaose
Jesujoba O. Alabi
Saarland University
Damilola Adebonojo
Adesina Ayeni
Mofe Adeyemi
Ayodele Awokoya
MENYO-20k: A Multi-domain English - Yorùbá Corpus for Machine Translation
David Ifeoluwa Adelani
Saarland University
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial 4.0 International
https://creativecommons.org/licenses/by-nc/4.0/legalcode
machine translation, yoruba, multi-domain
<p>MENYO-20k is a multi-domain parallel dataset with texts obtained from news articles, TED talks, movie transcripts, radio transcripts, science and technology texts, and other short articles curated from the web and professional translators. The dataset has 20,100 parallel sentences split into 10,070 training sentences, 3,397 development sentences, and 6,633 test sentences (3,419 multi-domain, 1,714 news domain, and 1,500 TED talks speech transcript domain).</p>
<p>The dataset is open, but only for non-commercial use, because some of the data sources, such as <a href="https://www.ted.com/about/our-organization/our-policies-terms/ted-talks-usage-policy">TED talks</a> and <a href="https://www.jw.org/en/terms-of-use/#link0">JW news</a>, require permission for commercial use.</p>
<p><strong>Acknowledgement</strong>: This project was supported by the <a href="https://www.k4all.org/project/language-dataset-fellowship/">AI4D language dataset fellowship</a> through K4All and Zindi Africa.</p>
Zenodo
2020-11-30
info:eu-repo/semantics/other
4297447
user-africanlp
1.0
1606834132.955052
2490852
md5:06e1851230484547e03c8a1036d76bc7
https://zenodo.org/records/4297448/files/train.tsv
2491
md5:c845d69bec68ac19a2f57338a763bbbb
https://zenodo.org/records/4297448/files/readme.txt
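The MD5 values listed for train.tsv and readme.txt can be used to check a downloaded copy. The following is a minimal sketch, assuming the files have already been saved locally; the helper names `md5_of_file` and `matches_record` are hypothetical and not part of the record.

```python
import hashlib

# Checksums copied from this record, with the "md5:" prefix dropped.
EXPECTED_MD5 = {
    "train.tsv": "06e1851230484547e03c8a1036d76bc7",
    "readme.txt": "c845d69bec68ac19a2f57338a763bbbb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, name):
    """Compare a local file (hypothetical path) with the listed checksum."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `matches_record("train.tsv", "train.tsv")` should return `True` if a local file at that path is an unmodified copy of the published one. A mismatch usually points to an incomplete download or a different version of the file.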
public
10.5281/zenodo.4297447
isVersionOf
doi