
Published March 22, 2023 | Version v1
Preprint Open

ASR pipeline for low-resourced languages: A case study on Pomak


Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resource languages is challenging, not only because resources are scarce but also because of the work required to prepare a training dataset. We present a pipeline for data processing and ASR model training for low-resource languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.


