Convolutional Neural Networks for Speech Recognition

Abdel-Hamid, Ossama; Mohamed, Abdel-Rahman; Jiang, Hui; Deng, Li; Penn, Gerald; Yu, Dong

doi:10.1109/taslp.2014.2339736

Published October 1, 2014 | Version v1

Journal article Open

Recently, the hybrid deep neural network (DNN)-hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.
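
The shift invariance mentioned in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the filter values, the 40-band frame, the pooling size of 3, and the helper names `conv_pool_frequency` and `one_peak_frame` are all assumptions made for illustration, max pooling is assumed as the pooling function, and the sketch uses full weight sharing rather than the limited-weight-sharing scheme proposed in the paper.

```python
import numpy as np

def conv_pool_frequency(bands, filt, pool_size):
    """Convolve one frame of filter-bank energies along frequency, then max-pool.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    # Local connectivity + weight sharing: the same small filter slides over
    # neighbouring frequency bands ("valid" positions only).
    activations = np.correlate(bands, filt, mode="valid")
    # Drop trailing positions that do not fill a complete pooling window.
    usable = (len(activations) // pool_size) * pool_size
    windows = activations[:usable].reshape(-1, pool_size)
    # Max pooling: keep the strongest response in each non-overlapping window.
    return windows.max(axis=1)

def one_peak_frame(num_bands, peak_band):
    """Toy spectrum with a single unit-energy peak (assumed test input)."""
    frame = np.zeros(num_bands)
    frame[peak_band] = 1.0
    return frame

filt = np.array([0.25, 0.5, 0.25])   # assumed smoothing filter, width 3
frame_a = one_peak_frame(40, 10)     # peak in band 10
frame_b = one_peak_frame(40, 11)     # same peak shifted up by one band

pooled_a = conv_pool_frequency(frame_a, filt, pool_size=3)
pooled_b = conv_pool_frequency(frame_b, filt, pool_size=3)

# The raw peaks sit in different bands, but the strongest pooled unit is the same.
print(int(np.argmax(frame_a)), int(np.argmax(frame_b)))
print(int(np.argmax(pooled_a)), int(np.argmax(pooled_b)))
```

In this toy setting the invariance is only partial: the strongest pooled unit stays the same because the one-band shift keeps the peak response inside a single pooling window, while a larger shift would move it to a neighbouring pooled unit.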
