Published March 13, 2024 | Version v1
Model Open

English ESPnet speech recognition model trained on LibriSpeech

Description

This is the baseline transformer model for English speech recognition, trained with ESPnet v1 on the LibriSpeech database.

Performance on the dev_clean subset is as follows:

write a CER (or TER) result in exp/train_960_lc.rm_pytorch_train_transformer_large_unigram5000_specaug/decode/test/exp/train_960_pytorch_train_transformer_large_specaug/results/model.val5.avg.best/result.txt
| SPKR    | # Snt  # Wrd | Corr  Sub  Del  Ins  Err  S.Err |
| Sum/Avg | 2703   67572 | 96.2  2.9  0.9  0.6  4.4  37.5  |
write a WER result in exp/train_960_lc.rm_pytorch_train_transformer_large_unigram5000_specaug/decode/test/exp/train_960_pytorch_train_transformer_large_specaug/results/model.val5.avg.best/result.wrd.txt
| SPKR    | # Snt  # Wrd | Corr  Sub  Del  Ins  Err  S.Err |
| Sum/Avg | 2703   54402 | 96.8  2.9  0.3  0.5  3.6  37.4  |
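In sclite-style reports such as the one above, the Err column is the sum of the substitution, deletion, and insertion percentages. The following minimal sketch recomputes it from the rounded values in the tables; the function name `error_rate` is hypothetical and only used for illustration.

```python
def error_rate(sub: float, dele: float, ins: float) -> float:
    """Sum the Sub, Del and Ins percentages, rounded to one decimal place."""
    return round(sub + dele + ins, 1)


# Rounded percentages copied from the two result tables above.
token_level_err = error_rate(sub=2.9, dele=0.9, ins=0.6)  # first table (CER or TER)
word_level_err = error_rate(sub=2.9, dele=0.3, ins=0.5)   # second table (WER)

if __name__ == "__main__":
    print(f"token-level Err from rounded columns: {token_level_err}")
    print(f"word-level Err from rounded columns:  {word_level_err}")
```

Because the columns are themselves rounded to one decimal place, a recomputed value can differ from the reported Err by about 0.1, which is presumably why the word-level sum does not exactly reproduce the 3.6 shown in the table.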

Files

baseline_librispeech.large.espnet1.zip (361.3 MB)
md5:55297b6c7bb22778b2cd809d53613575
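After downloading the archive, its integrity can be checked against the MD5 checksum listed above. This is a minimal sketch using Python's standard `hashlib`; it assumes the archive was saved in the current directory under its original filename, and the helper names `md5_of_file` and `verify_archive` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum and filename as listed in the Files section of this record.
EXPECTED_MD5 = "55297b6c7bb22778b2cd809d53613575"
ARCHIVE_NAME = "baseline_librispeech.large.espnet1.zip"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path(ARCHIVE_NAME)  # assumed download location
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use low for a file of this size. A mismatch usually points to an incomplete or corrupted download.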

Additional details

Funding

European Commission
FVLLMONTI - Ferroelectric Vertical Low energy Low latency low volume Modules fOr Neural network Transformers In 3D (grant no. 101016776)

References

  • L. Ben Letaifa and J.-L. Rouas, 'Transformer Model Compression for End-to-End Speech Recognition on Mobile Devices', in EUSIPCO 2022, 2022.