Dataset Open Access

Neural models for morphological generation, analysis and lemmatization in 22 languages

Hämäläinen, Mika; Partanen, Niko; Rueter, Jack; Alnajjar, Khalid

Neural models for morphological generation, lemmatization, and analysis in 22 languages. The models were trained with OpenNMT-py (https://github.com/OpenNMT/OpenNMT-py). Feed the models one word at a time, with the word split into space-separated characters (for example, kissa -> k i s s a).
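The character splitting described above can be sketched in Python as follows. This is a minimal illustration, not the authors' preprocessing script: the function names `to_model_input` and `write_source_file` and the output file name are hypothetical, and any additional input the generation models may expect (such as morphological tags) is not covered here.

```python
from typing import Iterable


def to_model_input(word: str) -> str:
    """Split one word into space-separated characters (kissa -> k i s s a).

    Splitting is done per Unicode code point, so precomposed letters
    such as 'õ' or 'å' stay intact as single symbols.
    """
    return " ".join(word.strip())


def write_source_file(words: Iterable[str], path: str) -> None:
    """Write one character-split word per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(to_model_input(word) + "\n")


if __name__ == "__main__":
    print(to_model_input("kissa"))
    # 'src.txt' is an assumed file name for the model input.
    write_source_file(["kissa", "koira"], "src.txt")
```

A file written this way has one word per line, which is the shape OpenNMT-py's `onmt_translate` command reads through its `-src` option; the model file to pass with `-model` depends on the contents of the downloaded archive.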

Supported languages: German (deu), Kven (fkv), Komi-Zyrian (kpv), Moksha (mdf), Mansi (mns), Erzya (myv), Norwegian Bokmål (nob), Russian (rus), South Sami (sma), Lule Sami (smj), Skolt Sami (sms), Võro (vro), Finnish (fin), Komi-Permyak (koi), Latvian (lav), Eastern Mari (mhr), Western Mari (mrj), Namonuito (nmt), Olonets-Karelian (olo), Pite Sami (sje), Northern Sami (sme), Inari Sami (smn) and Udmurt (udm).

Cite:

Hämäläinen, M., Partanen, N., Rueter, J., & Alnajjar, K. (2021). Neural Morphology Dataset and Models for Multiple Languages, from the Large to the Endangered. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa 2021)

Files (47.6 GB)

Name               MD5                               Size
deu.zip            0114cc3a6cd69647ee544c412003989a  2.0 GB
fin.zip            a6f6605c42855be9ef84e45d35d2bc47  3.9 GB
fkv.zip            3a856cbe5f0842627222d4ac430e0c32  2.0 GB
koi.zip            232d391b1f1086b847c07788f2b3f756  2.1 GB
kpv.zip            e72b3a7ef10824890b217f78c22470cf  2.3 GB
lav.zip            4e027e3f2ede42527153c01b85191697  2.0 GB
mdf.zip            45726260d34ce9486bbb72d72fd18107  2.4 GB
mhr.zip            52a04f40ef146e24c9f671db99fbd7d8  2.2 GB
mns.zip            9d32cafa8c6ac2bc752ee682930d8b53  2.0 GB
mrj.zip            ec83397fa63b14f54a67993f6ac5b10d  2.0 GB
myv.zip            841b4a8a2daf0e57bc1a624f77dd3e89  2.0 GB
nob.zip            8be159dca0b1f0c990eab9b5ca0052b1  2.0 GB
olo.zip            0c32432cde219350fb3f8861fc7bce3c  2.1 GB
rus.zip            764076cafd78aa16981fc33b0400d050  2.0 GB
sje.zip            9acfd9a42576c20c0afc1ee96c5b8092  2.0 GB
sma.zip            9fa0233618d3bd27aadb0bd88dd7b21c  2.0 GB
sme.zip            7464f055a3d2a4763cc763d40be6c13b  2.1 GB
smj.zip            b5c26d520a0f930596f369613bd4dea2  2.0 GB
smn.zip            8f72eeff3afd6836fedabc0e08449fdc  2.1 GB
sms.zip            97a4db0d83d98d2c424169d6cdd7eb7b  2.3 GB
train_scripts.zip  8f0db15754e43d8bfc11a29d4f57186b  40.1 kB
udm.zip            9eba670c1b3813e02471929504779f8a  2.2 GB
vro.zip            ed401d0852cf880a67ee4944b2474c59  2.0 GB
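
Because each archive is listed with an MD5 checksum, a download can be checked before unpacking. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local path `deu.zip` is assumed to be where the archive was saved.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:0114cc3a...'."""
    # Accept the checksum with or without the 'md5:' prefix.
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    archive = Path("deu.zip")  # assumed local download location
    if archive.exists():
        ok = verify_download(str(archive), "md5:0114cc3a6cd69647ee544c412003989a")
        print("checksum OK" if ok else "checksum mismatch, download again")
```

Reading in chunks matters here because most of the archives are around 2 GB or larger.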