There is a newer version of the record available.

Published March 22, 2023 | Version v1
Preprint Open

ASR pipeline for low-resourced languages: A case study on Pomak

Description

Automatic Speech Recognition (ASR) models can aid field linguists by facilitating the creation of text corpora from oral material. Training ASR systems for low-resourced languages is challenging, not only because resources are scarce but also because of the work required to prepare a training dataset. We present a pipeline for data processing and ASR model training for low-resourced languages, based on the language family. As a case study, we collected recordings of Pomak, an endangered South East Slavic language variety spoken in Greece. Using the proposed pipeline, we trained the first Pomak ASR model.

Files

asr_pipeline_Pomak_case_study_field_matters.pdf

Size: 294.3 kB
md5:c52e47c19a62b7f07f89c7cbd8374b3b