English ESPnet speech recognition model trained on LibriSpeech
Description
This is the baseline Transformer model for English speech recognition, trained with ESPnet v1 on the LibriSpeech database.
Performance on the dev_clean subset is as follows:
write a CER (or TER) result in exp/train_960_lc.rm_pytorch_train_transformer_large_unigram5000_specaug/decode/test/exp/train_960_pytorch_train_transformer_large_specaug/results/model.val5.avg.best/result.txt
| SPKR | # Snt | # Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| Sum/Avg | 2703 | 67572 | 96.2 | 2.9 | 0.9 | 0.6 | 4.4 | 37.5 |
write a WER result in exp/train_960_lc.rm_pytorch_train_transformer_large_unigram5000_specaug/decode/test/exp/train_960_pytorch_train_transformer_large_specaug/results/model.val5.avg.best/result.wrd.txt
| SPKR | # Snt | # Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| Sum/Avg | 2703 | 54402 | 96.8 | 2.9 | 0.3 | 0.5 | 3.6 | 37.4 |
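The summary rows above follow the usual sclite layout, in which the Err column is the sum of the Sub, Del and Ins percentages. As a rough sanity check, the sketch below parses the two Sum/Avg rows and recomputes Err. The function name `parse_sclite_summary` and the field names are illustrative choices, not part of ESPnet or sclite, and a small tolerance is assumed to allow for the one-decimal rounding in the report.

```python
def parse_sclite_summary(row: str) -> dict:
    """Parse a '| Sum/Avg | ...' summary row into named fields (hypothetical helper)."""
    # Treat the pipe separators as whitespace, then split into tokens.
    fields = row.replace("|", " ").split()
    label, counts, rates = fields[0], fields[1:3], fields[3:]
    names = ["corr", "sub", "del", "ins", "err", "s_err"]
    result = {"label": label, "n_snt": int(counts[0]), "n_tok": int(counts[1])}
    result.update({name: float(value) for name, value in zip(names, rates)})
    return result


# Rows copied from the results reported above.
cer_row = "| Sum/Avg | 2703 67572 | 96.2 2.9 0.9 0.6 4.4 37.5 |"
wer_row = "| Sum/Avg | 2703 54402 | 96.8 2.9 0.3 0.5 3.6 37.4 |"

for name, row in [("CER/TER", cer_row), ("WER", wer_row)]:
    r = parse_sclite_summary(row)
    recomputed = r["sub"] + r["del"] + r["ins"]
    # Assumed tolerance: each column is rounded to one decimal place.
    consistent = abs(recomputed - r["err"]) < 0.15
    print(f"{name}: reported Err={r['err']:.1f}, "
          f"Sub+Del+Ins={recomputed:.1f}, consistent={consistent}")
```

Both rows cover the same 2703 utterances; only the unit being scored (tokens or characters versus words) differs between the two reports.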
Files
| Name | Size |
|---|---|
| baseline_librispeech.large.espnet1.zip (md5:55297b6c7bb22778b2cd809d53613575) | 361.3 MB |
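After downloading, the archive can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the assumption that the archive sits in the current working directory are illustrative, not part of the record.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed in the Files section.
EXPECTED_MD5 = "55297b6c7bb22778b2cd809d53613575"
ARCHIVE = Path("baseline_librispeech.large.espnet1.zip")  # assumed local path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so the 361.3 MB archive is never fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if ARCHIVE.exists():
        ok = md5_of_file(ARCHIVE) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{ARCHIVE} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before unpacking.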
Additional details
References
- L. Ben Letaifa and J.-L. Rouas, 'Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices', in EUSIPCO 2022, 2022.