A Learning Algorithm for Risk-Sensitive Cost
Arnab Basu,
Tirthankar Bhattacharyya,
Vivek S. Borkar
Quantitative Methods and Information Systems Area, Indian Institute of Management Bangalore, Bangalore 560076, India
Department of Mathematics, Indian Institute of Science, Bangalore 560012, India
School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai 400005, India
arnab.basu{at}iimb.ernet.in
tirtha{at}math.iisc.ernet.in, http://math.iisc.ernet.in/~tirtha
borkar{at}tifr.res.in, http://www.tcs.tifr.res.in/~borkar
A reinforcement learning algorithm based on linear function approximation is proposed for Markov decision processes with an infinite-horizon risk-sensitive cost. Its convergence is proved using the "o.d.e." (ordinary differential equation) method for stochastic approximation. The scheme is also extended to processes with a continuous state space.
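To make the cost criterion concrete, here is a minimal sketch under stated assumptions. It is not the learning algorithm proposed in the paper. Consider a finite Markov chain under a fixed policy, assumed irreducible and aperiodic, with transition matrix P and per-stage cost c. The names `P`, `c` and `risk_sensitive_cost` are illustrative and are not taken from the paper. The standard infinite-horizon risk-sensitive cost, lambda = lim (1/T) log E[exp(c(X_0) + ... + c(X_{T-1}))], then equals the logarithm of the Perron-Frobenius eigenvalue of Q = diag(exp(c)) P. This follows because E_i[exp(sum of costs over T steps)] = (Q^T 1)(i). The function below estimates that eigenvalue by power iteration.

```python
import math

def risk_sensitive_cost(P, c, iters=1000):
    """Hypothetical helper: log of the Perron-Frobenius eigenvalue of
    Q = diag(exp(c)) P, i.e. the risk-sensitive average cost of a fixed
    policy. Assumes P is irreducible and aperiodic, since power
    iteration can oscillate on periodic chains."""
    n = len(P)
    # Q[i][j] = exp(c[i]) * P[i][j]; then E_i[exp(sum_{t<T} c(X_t))] = (Q^T 1)(i)
    Q = [[math.exp(c[i]) * P[i][j] for j in range(n)] for i in range(n)]
    v = [1.0] * n  # current eigenvector estimate, normalized to max-norm 1
    rho = 1.0
    for _ in range(iters):
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)  # growth factor; tends to the Perron root as v converges
        v = [x / rho for x in w]
    return math.log(rho)

# Two-state chain that mixes in one step; state 1 is costly.
P = [[0.5, 0.5], [0.5, 0.5]]
c = [0.0, math.log(3)]
print(risk_sensitive_cost(P, c))  # log 2, about 0.693
```

In this example, the risk-neutral average cost under the stationary distribution (1/2, 1/2) is 0.5 log 3, about 0.549. This is below the risk-sensitive value log 2. By Jensen's inequality, the risk-sensitive cost can never be smaller than the average cost, because it also penalizes the variability of the accumulated cost.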
Key Words: learning algorithm; risk-sensitive cost; function approximation; stochastic approximation
History: Received December 29, 2006; revision received November 11, 2007.
Copyright © 2008 by INFORMS.