Mathematics of Operations Research
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


MATHEMATICS OF OPERATIONS RESEARCH
Vol. 34, No. 2, May 2009, pp. 287-302
DOI: 10.1287/moor.1080.0371
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Krishnamurthy, V.
Right arrow Articles by Wahlberg, B.
Right arrow Search for Related Content

Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results

Vikram Krishnamurthy, Bo Wahlberg

Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Automatic Control and ACCESS, School of Electrical Engineering, KTH, SE-100 44 Stockholm, Sweden

vikramk{at}ece.ubc.ca, http://www.ece.ubc.ca/~vikramk
bo.wahlberg{at}ee.kth.se

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.

Key Words: multiarmed bandits; partially observed Markov decision process; monotone policies; likelihood ratio ordering; opportunistic scheduling; stochastic approximation algorithm
History: Received: July 24, 2008; revision received: December 3, 2008;





HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2009 by INFORMS.