Optimizing Dialogue Strategy Learning Using Learning Automata
Description
Modeling the behavior of dialogue management in the design of spoken dialogue systems with statistical methodologies is a growing research area. This paper presents an adaptive learning approach to optimizing dialogue strategy. At the core of our system is a method that formalizes dialogue management as sequential decision making under uncertainty whose underlying probabilistic structure is a Markov chain. Research on automating the design of dialogue management with machine learning has mostly focused on model-free algorithms such as reinforcement learning, but model-free algorithms face a dilemma in balancing exploration against exploitation. We therefore present a model-based online policy learning algorithm that uses interconnected learning automata to optimize the dialogue strategy. The proposed algorithm derives an optimal policy prescribing which action to take in each state of the conversation so as to maximize the expected total reward for attaining the goal, and it balances exploration and exploitation in its updates to improve the naturalness of human-computer interaction. We evaluate the proposed approach with the widely used PARADISE framework on access to a railway information system.
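The abstract gives the algorithm only in prose, so the following is a minimal Python sketch of the interconnected-automata idea: one learning automaton per dialogue state, each updated with the linear reward-inaction (L_R-I) rule. Everything environment-specific here (the state and action names, transition probabilities, and reward values) is invented for illustration and is not taken from the paper; only the L_R-I update and the one-automaton-per-state layout follow the learning-automata literature cited below (Narendra and Thathachar; Wheeler and Narendra).

```python
import random

# Toy slot-filling dialogue MDP, invented purely for illustration; the paper's
# actual state/action sets and rewards for the railway task are not given here.
STATES = ["greet", "ask_origin", "ask_destination", "confirm", "done"]
ACTIONS = ["open_question", "directed_question", "explicit_confirm"]

def advance_prob(state, action):
    # Invented dynamics: directed prompts advance reliably, open prompts less so
    # (a stand-in for recognition errors), and confirming too early stalls.
    name = ACTIONS[action]
    if name == "explicit_confirm":
        return 0.9 if state == "confirm" else 0.2
    return 0.8 if name == "directed_question" else 0.5

def step(state, action):
    """Stub environment: returns (next_state, reward)."""
    i = STATES.index(state)
    nxt = STATES[i + 1] if random.random() < advance_prob(state, action) else state
    return nxt, (20.0 if nxt == "done" else -1.0)  # success bonus, per-turn cost

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton; the network keeps one per state."""
    def __init__(self, n_actions, lr=0.05):
        self.p = [1.0 / n_actions] * n_actions  # action-probability vector
        self.lr = lr

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def reinforce(self, a):
        # On reward, shift probability mass toward the chosen action;
        # on penalty, L_R-I leaves the vector unchanged (the "inaction").
        self.p = [q + self.lr * (1 - q) if i == a else q * (1 - self.lr)
                  for i, q in enumerate(self.p)]

automata = {s: LearningAutomaton(len(ACTIONS)) for s in STATES[:-1]}

for _ in range(5000):  # episodic, Monte-Carlo-style feedback to every automaton
    state, trace, total = "greet", [], 0.0
    while state != "done" and len(trace) < 30:
        a = automata[state].choose()
        trace.append((state, a))
        state, r = step(state, a)
        total += r
    if total > 0:  # a short, successful dialogue counts as a reward signal
        for s, a in trace:
            automata[s].reinforce(a)

for s, la in automata.items():
    print(s, "->", ACTIONS[max(range(len(ACTIONS)), key=la.p.__getitem__)])
```

Note that exploration here is carried by the probability vectors themselves rather than by an external epsilon schedule: actions the automata are unsure about keep nonzero mass and are still tried, which is the built-in exploration-exploitation balance the abstract refers to.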
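For context on the evaluation, PARADISE (Walker et al., 1997, cited below) scores a dialogue system as a weighted difference between normalized task success and normalized dialogue costs: performance = alpha * N(kappa) - sum_i w_i * N(c_i). A minimal sketch of that performance function follows; in the full framework alpha and the weights w_i are fitted by regressing against user satisfaction, so here they are simply parameters.

```python
from statistics import mean, stdev

def zscore(x, sample):
    """Z-score normalization, PARADISE's N(.) operator."""
    return (x - mean(sample)) / stdev(sample)

def paradise_performance(kappa, costs, alpha, weights, kappa_sample, cost_samples):
    """Performance = alpha * N(kappa) - sum_i w_i * N(c_i).
    kappa: task-success measure; costs: e.g. number of turns, elapsed time.
    kappa_sample/cost_samples: observed values used for normalization."""
    return alpha * zscore(kappa, kappa_sample) - sum(
        w * zscore(c, s) for w, c, s in zip(weights, costs, cost_samples))
```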
Files
12363.pdf (113.4 kB)
Additional details
References
- O. Abul, F. Polat, and R. Alhajj, "Multiagent reinforcement learning using function approximation," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 30, no. 4, pp. 485-497, Nov 2000.
- J. Allen, D. Byron, M. Dzikovska, G. Ferguson, L. Galescu, and A. Stent, "Towards conversational human-computer interaction," AI Magazine, vol. 22, no. 4, pp. 27-38, 2001.
- L. Busoniu, B. De Schutter, and R. Babuska, "Decentralized reinforcement learning control of a robotic manipulator," in Proc. 9th Int. Conf. Control Autom. Robot. Vis. (ICARCV-06), Singapore, 2006, pp. 1347-1352.
- H. Cuayahuitl, S. Renals, O. Lemon, and H. Shimodaira, "Reinforcement learning of dialogue strategies using hierarchical abstract machines," in Proc. of IEEE/ACL SLT, 2006.
- L. V. de Wege, "Learning automata as a framework for multi-agent reinforcement learning," Master's thesis, Vrije Universiteit Brussel, Belgium, 2006.
- S. Dzeroski, L. De Raedt, and K. Driessens, "Relational reinforcement learning," Machine Learning, vol. 43, no. 1-2, pp. 7-52, 2001.
- E. Levin, R. Pieraccini, and W. Eckert, "A stochastic model of human-machine interaction for learning dialog strategies," IEEE Trans. Speech Audio Processing, vol. 8, no. 1, pp. 11-23, 2000.
- F. Fernandez and L. E. Parker, "Learning in large cooperative multirobot systems," International Journal of Autonomous Robots, vol. 16, no. 4, pp. 217-226, 2001.
- D. Goddeau, H. Meng, J. Polifroni, S. Seneff, and S. Busayapongchai, "A form-based dialogue manager for spoken language applications," in Proc. of ICSLP, Philadelphia, USA, 1996, pp. 701-704.
- J. Henderson, O. Lemon, and K. Georgila, "Hybrid reinforcement/supervised learning for dialogue policies from communicator data," in Workshop on Knowledge and Reasoning in Practical Dialogue Systems (IJCAI), 2005.
- Y. Ishiwaka, T. Sato, and Y. Kakazu, "An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning," Robotics and Autonomous Systems, vol. 43, no. 4, pp. 245-256, 2003.
- M. McTear, "Modelling spoken dialogues with state transition diagrams: Experiences with the CSLU toolkit," in Proc. of ICSLP, Sydney, Australia, 1998, pp. 1223-1226.
- K. S. Narendra and S. Lakshmivarahan, "Learning automata: A critique," Journal of Cybernetics and Information Sciences, pp. 53-66, 1977.
- K. S. Narendra and M. A. L. Thathachar, "Learning automata: A survey," IEEE Transactions on Systems, Man and Cybernetics, vol. SMC-4, pp. 323-334, 1974.
- K. S. Narendra and M. A. L. Thathachar, Learning Automata: An Introduction. Englewood Cliffs, NJ: Prentice-Hall, 1989.
- A. Nowé, K. Verbeeck, and M. Peeters, "Learning automata as a basis for multi agent reinforcement learning," Learning and Adaption in Multi-Agent Systems, pp. 71-85, 2006, ISSN 0302-9743.
- B. J. Oommen and M. Agache, "Continuous and discretized pursuit learning schemes: Various algorithms and their comparison," IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 32, pp. 277-287, 2002.
- T. Paek and D. Chickering, "The Markov assumption in spoken dialogue management," in 6th SIGdial Workshop on Discourse and Dialogue, 2005.
- T. Paek and R. Pieraccini, "Automating spoken dialogue management design using machine learning: An industry perspective," Speech Communication, vol. 50, pp. 716-729, 2008.
- O. Pietquin, A Framework for Unsupervised Learning of Dialogue Strategies. Presses Universitaires de Louvain, 2004.
- O. Pietquin and T. Dutoit, "A probabilistic framework for dialog simulation and optimal strategy learning," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 2, pp. 589-599, 2006.
- A. S. Poznyak and K. Najim, Learning Automata and Stochastic Optimization. Springer, 1997.
- S. Singh, D. Litman, M. Kearns, and M. Walker, "Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system," Journal of Artificial Intelligence Research, vol. 16, pp. 105-133, 2002.
- J. Schatzmann, K. Weilhammer, M. N. Stuttle, and S. Young, "A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies," The Knowledge Engineering Review, vol. 21, no. 2, pp. 97-126, 2006.
- K. Scheffler and S. Young, "Corpus-based dialogue simulation for automatic strategy learning and evaluation," in Proc. of the NAACL Workshop on Adaptation in Dialogue Systems, 2001.
- R. Sutton and A. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
- H. Tamakoshi and S. Ishii, "Multiagent reinforcement learning applied to a chase problem in a continuous world," Artificial Life and Robotics, vol. 5, no. 4, pp. 202-206, 2001.
- M. A. L. Thathachar and P. S. Sastry, Networks of Learning Automata: Techniques for Online Stochastic Optimization. Norwell, MA: Kluwer, 2004.
- K. Tuyls and A. Nowé, "Evolutionary game theory and multi-agent reinforcement learning," The Knowledge Engineering Review, vol. 20, pp. 63-90, 2005, ISSN 0269-8889.
- K. Verbeeck, A. Nowé, P. Vrancx, and M. Peeters, "Multi-automata learning," in Reinforcement Learning: Theory and Applications. I-Tech Education and Publishing, 2008.
- K. Verbeeck, P. Vrancx, and A. Nowé, "Networks of learning automata and limiting games," in Proc. of the 7th ALAMAS Symposium, 2007, pp. 171-182, ISSN 0922-8721.
- M. Walker, D. Litman, C. Kamm, and A. Abella, "PARADISE: A framework for evaluating spoken dialogue agents," in Proc. of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97), 1997, pp. 271-280.
- C. J. C. H. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
- R. M. Wheeler and K. S. Narendra, "Decentralized learning in finite Markov chains," IEEE Transactions on Automatic Control, vol. AC-31, pp. 519-526, 1986.
- M. Wiering, R. Salustowicz, and J. Schmidhuber, "Reinforcement learning soccer teams with incomplete world models," Autonomous Robots, vol. 7, no. 1, pp. 77-88, 1999.