Mathematics of Operations Research
Vol. 28, No. 1, February 2003, pp. 194-200
DOI: 10.1287/moor.28.1.194.14255

A Note on the Convergence of Policy Iteration in Markov Decision Processes with Compact Action Spaces

A. Y. Golubin

Department of Operations Research, Moscow Institute of Electronics and Mathematics, B. Trechsvjatitelsky per., 3/12, Moscow, 109028, Russia
io@miem.edu.ru

The undiscounted, unichain, finite-state Markov decision process with a compact action space is studied. We provide a counterexample to a result in Hordijk and Puterman (1987) and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis relies essentially on a two-term matrix representation of the relative value vectors generated by the policy iteration procedure.
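For readers unfamiliar with the procedure named in the abstract, the following is a minimal sketch of average-reward policy iteration, not the construction used in the article. It assumes a finite action set in each state (a simplification of the compact action space studied here) and that state `ref` is recurrent under every stationary policy, so the relative value vector can be normalized by h(ref) = 0. The function names, the data layout `P[s][a][j]` and `r[s][a]`, and the two-state instance are all hypothetical and chosen only for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(n):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(P, r, policy, ref=0):
    """Solve g + h(s) = r(s, d(s)) + sum_j P(j | s, d(s)) h(j) with h(ref) = 0."""
    n = len(policy)
    A, b = [], []
    for s in range(n):
        a = policy[s]
        row = [(1.0 if j == s else 0.0) - P[s][a][j] for j in range(n)]
        row[ref] = 1.0  # h(ref) is fixed at 0, so this column carries the gain g
        A.append(row)
        b.append(r[s][a])
    x = solve(A, b)
    g = x[ref]
    h = x[:]
    h[ref] = 0.0
    return g, h


def improve(P, r, h, policy, tol=1e-12):
    """Greedy improvement; keep the current action unless another is strictly better."""
    new_policy = []
    for s, current in enumerate(policy):
        def q(a):
            return r[s][a] + sum(p * hj for p, hj in zip(P[s][a], h))
        best = max(range(len(r[s])), key=q)
        new_policy.append(best if q(best) > q(current) + tol else current)
    return new_policy


def policy_iteration(P, r, policy, ref=0, max_iter=100):
    """Alternate evaluation and improvement until the policy stops changing."""
    for _ in range(max_iter):
        g, h = evaluate(P, r, policy, ref)
        new_policy = improve(P, r, h, policy)
        if new_policy == policy:
            return policy, g, h
        policy = new_policy
    raise RuntimeError("policy iteration did not stabilize within max_iter")


# Hypothetical two-state, two-action instance; state 0 is recurrent under every policy.
P = [
    [[0.5, 0.5], [1.0, 0.0]],  # transitions from state 0 under actions 0 and 1
    [[1.0, 0.0], [0.5, 0.5]],  # transitions from state 1 under actions 0 and 1
]
r = [[1.0, 2.0], [0.0, 4.0]]   # one-step rewards r[s][a]

policy, g, h = policy_iteration(P, r, [0, 0])
```

On this toy instance the iteration should pass through the policies [0, 0], [1, 1], and [0, 1] and stop at [0, 1] with gain 2.5 and relative values h = [0, 3]. Keeping the current action on ties is a common safeguard against cycling in the finite-action case; with a compact action space the maximization in `improve` would need to be replaced by an optimization over the action set, which is where the convergence question addressed in the article arises.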

Key Words: Markov decision processes; optimality equation; average reward; policy iteration
History: Received February 11, 2000; revisions received March 10, 2001; January 25, 2002; and June 24, 2002.





Copyright © 2003 by INFORMS.