A Note on the Convergence of Policy Iteration in Markov Decision Processes with Compact Action Spaces
A. Y. Golubin
Department of Operations Research, Moscow Institute of Electronics and Mathematics, B. Trechsvjatitelsky per., 3/12, Moscow, 109028, Russia
io{at}miem.edu.ru
The undiscounted, unichain, finite-state Markov decision process with compact action space is studied. We provide a counterexample to a result in Hordijk and Puterman (1987) and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis relies essentially on a two-term matrix representation of the relative value vectors generated by the policy iteration procedure.
Key Words: Markov decision processes; optimality equation; average reward; policy iteration
History: Received February 11, 2000; revisions received March 10, 2001; January 25, 2002; June 24, 2002.
Copyright © 2003 by INFORMS.