Mathematics of Operations Research
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


MATHEMATICS OF OPERATIONS RESEARCH
Vol. 34, No. 1, February 2009, pp. 26-44
DOI: 10.1287/moor.1080.0342
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Glazebrook, K. D.
Right arrow Articles by Minty, R.
Right arrow Search for Related Content

A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements

K. D. Glazebrook, R. Minty

Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, United Kingdom
Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, United Kingdom

k.glazebrook{at}lancaster.ac.uk
r.minty{at}lancaster.ac.uk

We generalise classical multiarmed bandits to allow for the distribution of a (fixed amount of a) divisible resource among the constituent bandits at each decision point. Bandit activation consumes amounts of the available resource, which may vary by bandit and state. Any collection of bandits may be activated at any decision epoch, provided they do not consume more resource than is available. We propose suitable bandit indices that reduce to those proposed by Gittins [Gittins, J. C. 1979. Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B41 148–177] for the classical models. The index that emerges is an elegant generalization of the Gittins index, which measures in a natural way the reward earnable from a bandit per unit of resource consumed. The paper discusses both how such indices may be computed and how they may be used to construct heuristics for resource distribution. We also describe how to develop bounds on the closeness to optimality of index heuristics and demonstrate a form of asymptotic optimality for a greedy index heuristic in a class of simple models. A numerical study testifies to the strong performance of a weighted index heuristic.

Key Words: asymptotic optimality; bandit problems; dynamic programming; Gittins index; resource allocation
History: Received: January 31, 2007; revision received: May 3, 2008;





HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2009 by INFORMS.