ASR pipeline for low-resourced languages: A case study on Pomak
Description
Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resourced languages can be challenging, not only because of the lack of resources but also because of the work required to prepare a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family of the target language. As a case study, we collected recordings of Pomak, an endangered South-East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.
Files
asr_pipeline_Pomak_case_study_field_matters.pdf (294.3 kB, md5:c52e47c19a62b7f07f89c7cbd8374b3b)