Mathematics of Operations Research
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


MATHEMATICS OF OPERATIONS RESEARCH
Vol. 33, No. 3, August 2008, pp. 513-528
DOI: 10.1287/moor.1080.0312
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Lugosi, G.
Right arrow Articles by Stoltz, G.
Right arrow Search for Related Content

Strategies for Prediction Under Imperfect Monitoring

Gábor Lugosi, Shie Mannor, Gilles Stoltz

ICREA and Department of Economics, Pompeu Fabra University, Barcelona, Spain
Department of Electrical and Computer Engineering, McGill University, Montreal, Québec, Canada
Département de Mathématiques et Applications, Ecole Normale Supérieure, CNRS, Paris, France, and HEC Paris School of Management, CNRS, Jouy-en-Josas, France

lugosi{at}upf.es, http://www.econ.upf.es/~lugosi
shie.mannor{at}mcgill.ca, http://www.ece.mcgill.ca/~smanno1/
gilles.stoltz{at}ens.fr, http://www.dma.ens.fr/~stoltz

We propose simple randomized strategies for sequential decision (or prediction) under imperfect monitoring, that is, when the decision maker (forecaster) does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best-possible average reward among all fixed actions. It was Rustichini [Rustichini, A. 1999. Minimizing regret: The general case. Games Econom. Behav. 29 224–243] who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback signals, these rates are optimal up to logarithmic terms.

Key Words: repeated games; regret; Hannan consistency; imperfect monitoring; online learning
History: Received: May 29, 2007; revision received: December 20, 2007;





HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2008 by INFORMS.