Mathematics of Operations Research
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


MATHEMATICS OF OPERATIONS RESEARCH
Vol. 28, No. 2, May 2003, pp. 327-345
DOI: 10.1287/moor.28.2.327.14483
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Mannor, S.
Right arrow Articles by Shimkin, N.
Right arrow Search for Related Content

The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes

Shie Mannor, Nahum Shimkin

Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel

shie{at}mit.edu
shimkin{at}ee.technion.ac.il

This paper proposes an extension of the regret minimizing framework from repeated matrix games to stochastic game models, under appropriate recurrence conditions. A decision maker, P1, who wishes to maximize his long-term average reward is facing a Markovian environment, which may also be affected by arbitrary actions of other agents. The latter are collectively modeled as a second player, P2, whose strategy is arbitrary. Both states and actions are fully observed by both players. While P1 may obviously secure the min-max value of the game, he may wish to improve on that when the opponent is not playing a worst-case strategy. For repeated matrix games, an achievable goal is presented by the Bayes envelope, that traces P1's best-response payoff against the observable frequencies of P2's actions. We propose a generalization to the stochastic game framework, under recurrence conditions that amount to fixed-state reachability. The empirical Bayes envelope (EBE) is defined as P1's best-response payoff against the stationary strategies of P2 that agree with the observed state-action frequencies. Because the EBE may not be attainable in general, we consider its lower convex hull, the convex Bayes envelope (CBE), which is proved to be achievable by P1. The analysis relies on Blackwell's approachability theory. The CBE is lower bounded by the value of the game and for irreducible games turns out to be strictly above the value whenever P2's frequencies deviate from a worst-case strategy. In the special case of single-controller games where P2 alone affects the state transitions, the EBE itself is shown to be attainable.

Key Words: Bayes envelope; controlled Markov processes; stochastic games; regret minimization; approachability
History: Received: October 9, 2001; revision received: November 29, 2001;revision received: July 2, 2002;


This article has been cited by other articles:


Home page
Mathematics of Operations ResearchHome page
J. Y. Yu, S. Mannor, and N. Shimkin
Markov Decision Processes with Arbitrary Reward Processes
Mathematics of Operations Research, August 1, 2009; 34(3): 737 - 757.
[Abstract] [PDF]




HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2003 by INFORMS.