Slovenian Text-to-Speech Synthesis for Speech User Interfaces
Description
The paper presents the design concept of a unit-selection text-to-speech synthesis system for the Slovenian language. Owing to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from carrier-grade server-based voice portals and desktop user interfaces to specialized embedded devices. Since memory and processing power requirements are important factors for a possible implementation in embedded devices, lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact, high-coverage subset of text sentences. An experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with low redundancy. The adequacy of the spoken output was evaluated in several subjective tests recommended by the International Telecommunication Union (ITU).
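The greedy subset selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the unit inventory or the scoring function, so the sketch assumes character bigrams as a stand-in for phonetic units and scores each sentence by the number of not-yet-covered units it adds. The names `units_of`, `greedy_select`, and `target_coverage` are hypothetical.

```python
import math
from typing import List, Set


def units_of(sentence: str) -> Set[str]:
    """Hypothetical unit extractor: character bigrams stand in for
    the phonetic units a real TTS corpus design would use."""
    text = sentence.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def greedy_select(sentences: List[str], target_coverage: float = 1.0) -> List[int]:
    """Greedy set cover: repeatedly pick the sentence that adds the most
    uncovered units until the target share of all units is covered.
    Returns the indices of the chosen sentences in selection order."""
    unit_sets = [units_of(s) for s in sentences]
    all_units: Set[str] = set().union(*unit_sets)
    needed = math.ceil(target_coverage * len(all_units))

    covered: Set[str] = set()
    chosen: List[int] = []
    remaining = set(range(len(sentences)))

    while len(covered) < needed and remaining:
        # Ties are broken in favour of the lowest sentence index.
        best = max(sorted(remaining), key=lambda i: len(unit_sets[i] - covered))
        if not unit_sets[best] - covered:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= unit_sets[best]
        remaining.discard(best)

    return chosen


if __name__ == "__main__":
    corpus = ["abc", "bcd", "abcd", "xy"]
    print(greedy_select(corpus))
```

In this toy corpus the third sentence already contains every unit of the first two, so the greedy pass never selects them. A lower `target_coverage` trades coverage for a smaller subset, which is the kind of compromise that matters when the corpus has to fit an embedded device.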
Files
1755.pdf
Name | Size |
---|---|
1755.pdf (md5:343ce177902fb82ee1c92a78787f959a) | 9.0 MB |