A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect
Authors/Creators
- 1. EuroMov Digital Health in Motion, Univ Montpellier, IMT Mines Ales
- 2. DiappyMed
Description
This study presents a large-scale benchmark of four cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40158 clean and noisy speech files, totaling about 101 hours, are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios (SNR) from 40 dB to 0 dB. Results show that Microsoft Azure provided the lowest transcription error rate, 9.09% on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Although IBM Watson worked correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
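The description does not say how the transcription error rate was computed or how noise was mixed into the speech files. As a minimal sketch, assuming the conventional word error rate (word-level edit distance divided by the number of reference words) and a power-based noise scaling, the two quantities could be illustrated as follows. The function names `word_error_rate` and `mix_at_snr` are hypothetical, and no text normalization (case, punctuation, numerals) is applied here.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it to reach the target SNR in dB.

    Assumption: SNR is defined as 10*log10(mean speech power / mean noise
    power) over the whole signal; other definitions exist.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal has zero power")
    # Gain g such that speech_power / (g**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

For instance, `word_error_rate("le chat dort sur le lit", "le chien dort sur lit")` counts one substitution and one deletion against six reference words. Lower `snr_db` values passed to `mix_at_snr` produce a louder noise component, with 0 dB meaning that noise and speech have equal power.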
Files
article_stt_apia2021.pdf (954.5 kB)
md5:86f05718b3c33c1ae618a2c004ca0dc8
Additional details
Related works
- Is identical to: arXiv:2105.03409 (arXiv)