Published November 25, 2007 | Version 1755
Journal article Open

Slovenian Text-to-Speech Synthesis for Speech User Interfaces

Description

The paper presents the design concept of a unit-selection text-to-speech synthesis system for the Slovenian language. Owing to its modular and upgradable architecture, the system can be used in a variety of speech user interface applications, ranging from carrier-grade server-based voice portals and desktop user interfaces to specialized embedded devices. Because memory and processing power requirements are important factors for a possible implementation on embedded devices, the lexica and speech corpora need to be reduced. We describe a simple and efficient implementation of a greedy subset selection algorithm that extracts a compact subset of high-coverage text sentences. An experiment on a reference text corpus showed that the subset selection algorithm produced a compact sentence subset with little redundancy. The adequacy of the spoken output was evaluated with several subjective tests, as recommended by the International Telecommunication Union (ITU).
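The greedy subset selection mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which units the paper covers or how sentences are scored, so the sketch assumes that coverage is measured over character bigrams within words (a stand-in for phonetic units such as diphones) and that each step picks the sentence adding the most uncovered units. The names `units` and `greedy_select` are hypothetical.

```python
from typing import List, Optional, Set


def units(sentence: str) -> Set[str]:
    """Hypothetical unit extractor: character bigrams inside each word.

    A real system would use phonetic units obtained from a
    grapheme-to-phoneme step; bigrams are only an assumption here.
    """
    result: Set[str] = set()
    for word in sentence.lower().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        result.update(letters[i:i + 2] for i in range(len(letters) - 1))
    return result


def greedy_select(sentences: List[str],
                  max_sentences: Optional[int] = None) -> List[str]:
    """Repeatedly pick the sentence that adds the most uncovered units."""
    remaining = {i: units(s) for i, s in enumerate(sentences)}
    covered: Set[str] = set()
    chosen: List[int] = []
    while remaining and (max_sentences is None or len(chosen) < max_sentences):
        # Largest gain wins; ties go to the earliest sentence.
        best = max(remaining, key=lambda i: (len(remaining[i] - covered), -i))
        gain = remaining[best] - covered
        if not gain:
            break  # no remaining sentence contributes a new unit
        covered |= gain
        chosen.append(best)
        del remaining[best]
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    corpus = ["dober dan", "dober večer", "lahko noč", "dan"]
    for sentence in greedy_select(corpus):
        print(sentence)
```

The loop stops as soon as no candidate adds a new unit, so sentences that only repeat covered material are never selected. This is one simple way to keep redundancy in the selected subset low.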

Files

1755.pdf (9.0 MB)
md5:343ce177902fb82ee1c92a78787f959a

Additional details

References

  • A.W. Black and K.A. Lenzo, "Flite: a small fast run-time speech synthesis engine," In Proceedings of the 4th ISCA Workshop on Speech Synthesis, 2001, pp. 204-207.
  • M.L. Tomokiyo, A.W. Black and K.A. Lenzo, "Arabic in my hand: small footprint synthesis of Egyptian Arabic," In Proceedings of the Eurospeech-03, Geneva, Switzerland, 2003, pp. 2049-2052.
  • T. Šef and M. Gams, "Speaker (GOVOREC): a complete Slovenian text-to-speech system," International Journal on Speech Technologies, vol. 6, 2003, pp. 277-287.
  • N. Pavešić, J. Gros, S. Dobrišek and F. Mihelič, "Homer II - man - machine interface to internet for blind and visually impaired people," Computer Communications, vol. 26, 2003, pp. 438-443.
  • B. Vesnicer and F. Mihelič, "Evaluation of the Slovenian HMM-based speech synthesis system," Proc. TSD'04, Lecture notes in computer science, vol. 1692, Berlin, Springer Verlag, 2004, pp. 513-520.
  • J. Gros, F. Mihelič, N. Pavešić, M. Žganec, A. Mihelič, M. Knez, A. Merčun and D. Škerl, "The phonetic SMS reader," Proc. TSD'01, Lecture notes in computer science, vol. 1692, Springer Verlag, Berlin, 2001, pp. 334-340.
  • N. Campbell, "CHATR: a high-definition speech resequencing system," In Proceedings of the 3rd ASA/ASJ Joint Meeting, 1996, pp. 1223-1228.
  • M. Beutnagel, A. Conkie, J. Schroeter and Y. Stylianou, "The AT&T Next-Gen TTS System," in Proceedings of the 137th Meeting of the Acoustical Society of America, 2000.
  • B. Möbius, "The Bell Labs German text-to-speech system," Computer Speech and Language, vol. 13, 1999, pp. 319-358.
  • J. Meron and P. Veprek, "Compression of exception lexicons for small footprint grapheme-to-phoneme conversion," In Proceedings of the ICASSP-05, Philadelphia, USA, March 18-23, 2005.
  • J. Gros, N. Pavešić and F. Mihelič, "Syllable and segment duration at different speaking rates for the Slovenian language," in Proceedings of the Eurospeech-97, Rhodes, Greece, 1997, pp. 1-4.
  • J. Gros, N. Pavešić and F. Mihelič, "Speech timing in Slovenian TTS," in Proceedings of the Eurospeech-97, Rhodes, Greece, 1997, pp. 323-326.
  • A. Conkie, "Robust unit selection system for speech synthesis," in Proceedings of the Eurospeech'99, Budapest, Hungary, 1999.
  • M. Beutnagel, R. Mohri and M. Riley, "Rapid unit selection from a large speech corpus for concatenative speech synthesis," in Proceedings of the Eurospeech '99, Budapest, Hungary, 1999.
  • J. Tian, J. Nurminen and I. Kiss, "Optimal subset selection from text databases," In Proceedings of the ICASSP-05, Philadelphia, USA, March 18-23, 2005.
  • J.P.H. Van Santen, "Methods for optimal text selection," In Proceedings of the Eurospeech-97, Rhodes, Greece, 1997, pp. 553-556.
  • H. Kawai, S. Yamamoto and T. Shimizu, "A design method of speech corpus for text-to-speech synthesis taking into account prosody," in Proceedings of the ICSLP-00, 2000, pp. 420-425.
  • C. Kuo and J. Huang, "Efficient and scalable methods for text script generation in corpus-based TTS design," in Proceedings of the ICSLP-02, 2002, pp. 121-124.
  • B. Bozkurt, O. Ozturk and T. Dutoit, "Text design for TTS speech corpus building using a modified greedy selection," in Proceedings of the Eurospeech-03, Geneva, Switzerland, 2003, pp. 277-180.
  • M. Isogai, M. Mizuno and K. Mano, "Recording script design for corpus-based TTS system based on coverage of various phonetic elements," In Proceedings of the ICASSP-05, Philadelphia, USA, March 18-23, 2005.
  • F. Malfrère and T. Dutoit, "High quality speech synthesis for phonetic speech segmentation," In Proceedings of the Eurospeech-97, Rhodes, Greece, 1997, pp. 2631-2634.
  • F. Mihelič, J. Gros, S. Dobrišek, J. Žibert and N. Pavešić, "Spoken language resources at LUKS of the University of Ljubljana," International Journal on Speech Technologies, vol. 6, no. 3, 2003, pp. 221-232.
  • G. Xydas and G. Kouroupetroglou, "An intonation model for embedded devices based on natural F0 samples," In Proceedings of the Interspeech-04, Korea, 2004, pp. 801-804.
  • ITU, "A method for subjective performance assessment of the quality of speech voice output devices," ITU-T Recommendation P.85, ITU, 1994.
  • ITU, "Telephone transmission quality subjective opinion tests - Modulated noise reference unit," ITU-T Recommendation P.81, ITU, Blue Book, (5), pp. 1-5, 1993.
  • J. Gros, F. Mihelič and N. Pavešić, "Slovene interactive text-to-speech evaluation site - SITES," Proc. TSD'99, Lecture notes in computer science, vol. 1692, Berlin, Springer Verlag, 1999, pp. 223-228.