Published December 22, 2023 | Version v1
Journal article Open

Investigations of the Distributions of Phonemic Durations in Hindi and Dogri

Description

Speech generation is one of the most important areas of research in speech signal processing and is now receiving serious attention. Speech is a natural form of communication among human beings. Computers with the ability to understand speech and speak with a human-like voice are expected to contribute to the development of more natural man-machine interfaces. However, in order to provide functions that come even closer to those of human beings, we must learn more about the mechanisms by which speech is produced and perceived, and develop speech information processing technologies that can generate more natural-sounding systems. This field of study, also called speech synthesis and more prominently known as text-to-speech synthesis, originated in the mid-eighties as a result of the emergence of DSP and the rapid advancement of VLSI techniques. To understand this field, it is necessary to understand the basic theory of speech production. Every language has a different phonetic alphabet and a different set of possible phonemes and phoneme combinations. For the analysis of the speech signal, we carried out recordings of five speakers of Dogri (3 male and 5 female) and eight speakers of Hindi (4 male and 4 female). For estimating the durational distributions, the mean of the means of ten instances of each vowel was calculated for every speaker in both languages. The investigations show that the two durational distributions differ significantly with respect to mean and standard deviation. The duration of a phoneme is speaker dependent. The whole investigation can be concluded with the result that almost all Dogri phonemes have shorter durations than the corresponding Hindi phonemes. The durations in milliseconds of the same phonemes, when uttered in Hindi, were found to be longer than when they were spoken by a person with Dogri as his mother tongue. There are many applications that are directly or indirectly related to this research. The main application may be the transformation of Dogri speech into Hindi and vice versa; building on this, a speech aid could be developed to teach Dogri to children. The results may also be useful for synthesizing Dogri phonemes using the parameters of Hindi phonemes and for building large-vocabulary speech recognition systems.
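The durational comparison described above comes down to simple descriptive statistics: average the ten measured instances per vowel for each speaker, average those per-speaker means per language, and compare the resulting means and standard deviations. The following Python sketch only illustrates that computation; the duration values and the per-speaker data layout are assumptions for demonstration, not the authors' data or code.

    import statistics

    # Hypothetical durations (in milliseconds) of one vowel phoneme:
    # each speaker contributes ten measured instances.
    durations_ms = {
        "Hindi": {
            "speaker_1": [112, 118, 109, 121, 115, 117, 110, 119, 116, 113],
            "speaker_2": [120, 114, 118, 122, 111, 117, 119, 115, 121, 116],
        },
        "Dogri": {
            "speaker_1": [95, 99, 92, 101, 97, 94, 98, 96, 100, 93],
            "speaker_2": [90, 96, 93, 98, 91, 95, 97, 92, 99, 94],
        },
    }

    def mean_of_means(speakers):
        """Average the per-speaker mean durations (each a mean of ten instances)."""
        per_speaker_means = [statistics.mean(instances) for instances in speakers.values()]
        return statistics.mean(per_speaker_means)

    for language, speakers in durations_ms.items():
        # Pool all instances of the phoneme to estimate the spread for this language.
        all_instances = [d for instances in speakers.values() for d in instances]
        print(
            f"{language}: mean of means = {mean_of_means(speakers):.1f} ms, "
            f"std dev = {statistics.stdev(all_instances):.1f} ms"
        )

Run on such data, the script prints one line per language, making the kind of mean and standard deviation comparison reported in the study directly visible.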

Files

2113ijnlc03.pdf (207.6 kB)
md5:a90675ea9422f249b6c51caed3976df6

Additional details

Dates

Copyrighted
2013
