This dataset supplements the research paper Speech Recognition for Endangered and Extinct Samoyedic languages by Niko Partanen, Mika Hämäläinen and Tiina Klooster. In this study, a series of Persephone models was trained for the Nganasan and Kamas languages. The preprocessing scripts, training data and resulting ASR models are all published on Zenodo under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License. The license follows that of the original datasets.
In this study we used the INEL Kamas Corpus 1.0 and the Nganasan Spoken Language Corpus 0.2. Both corpora are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License.
Please cite our paper as follows:
Partanen, Niko; Hämäläinen, Mika; Klooster, Tiina 2020. Speech Recognition for Endangered and Extinct Samoyedic languages. Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation.
Please cite the corpora as follows:
Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2019. "INEL Kamas Corpus." Version 1.0. Publication date 2019-12-15. http://hdl.handle.net/11022/0000-0007-DA6E-9. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.
Brykina, Maria; Gusev, Valentin; Szeverényi, Sándor; Wagner-Nagy, Beáta. 2018. "Nganasan Spoken Language Corpus (NSLC)." Version 0.2. Publication date 2018-06-12. http://hdl.handle.net/11022/0000-0007-C6F2-8. Archived in Hamburger Zentrum für Sprachkorpora.
We recommend using the models either through Christopher Cox's [Persephone-ELAN] extension or through Persephone itself. The experiment numbers in this repository are matched to those in our paper by giving the paper's experiment number in parentheses. When loading a model, the data directory and the model number must correspond.
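The requirement that the data directory and the model number correspond can be checked before loading anything. The following is only a minimal sketch, assuming a hypothetical naming scheme in which both directory names end in the experiment number (for example `data_3` and `model_3`); the actual layout of this repository may differ, and the function names `experiment_number` and `check_pair` are ours, not part of Persephone.

```python
import re
from pathlib import Path

# Assumption: directory names end in the experiment number, e.g. "data_3".
# This naming scheme is hypothetical; adapt the pattern to the real layout.
_EXP_NUMBER = re.compile(r"(\d+)$")


def experiment_number(path):
    """Return the trailing integer of the last path component, or None."""
    match = _EXP_NUMBER.search(Path(path).name)
    return int(match.group(1)) if match else None


def check_pair(data_dir, model_dir):
    """Raise ValueError unless both paths carry the same experiment number."""
    data_n = experiment_number(data_dir)
    model_n = experiment_number(model_dir)
    if data_n is None or model_n is None:
        raise ValueError("could not find an experiment number in both paths")
    if data_n != model_n:
        raise ValueError(
            f"data directory is experiment {data_n}, "
            f"model is experiment {model_n}"
        )
    return data_n
```

Calling `check_pair("data_3", "model_3")` returns the shared number, while a mismatched pair raises `ValueError` before any model is loaded.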
For the model accuracies and exact descriptions, please refer to the publication.