CUPE Universal Phoneme Encoder Substitution for Low-Resource Word Error Rate Reduction in Common Voice
Description
Word-piece models (WPMs) are commonly used subword units in state-of-the-art end-to-end automatic speech recognition (ASR) systems. For multilingual ASR, due to the differences in written scripts across languages, multilingual WPMs bring the challenges of having overly large output layers and scaling to more languages. In this work, we propose a universal monolingual output layer (UML) to address such problems. Instead of one output node for only one WPM, UML re-associates each output node with multiple WPMs, one for each language, and results in a smaller monolingual output layer shared acros
Research goal: What is the impact of replacing language-specific output layers with the CUPE universal phoneme encoder on word error rate metrics for low-resource languages in the Common Voice dataset?
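As a point of reference for the metric named in the research goal, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example and are not taken from the report, which may apply different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    no casing or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.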
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.0/10.
Notes
Files
| Name | Size |
|---|---|
| paper.pdf (md5:2a1d9a488a57d093196b140dbf7eee42) | 84.4 kB |