MathInstitutes Dev

Tradeoffs in Contextual Bandit Learning

Presenter

Alekh Agarwal

May 16, 2016

IMA

Keywords:

contextual bandits

MSC:

91B06

Tradeoffs in Contextual Bandit Learning Thumbnail

Play Video

Abstract

The contextual bandit problem is a classic example of the statistical tradeoff between exploration and exploitation. Upon taking an action, a learner only observes the reward of that action, while no feedback on the other possible actions is received. In additional to the natural information tradeoff between experimenting with unknown actions and relying on known good actions, the problem also has some intriguing tradeoffs between computational and statistical considerations. In this talk, we will discuss two old and one new algorithms which span different points along this tradeoff. Without computational considerations, the EXP4 algorithm of Auer et al. (2002) solves the problem in its most general adversarial formulation. With suboptimal statistical guarantees, the Epoch-Greedy approach of Langford and Zhang (2008) gives essentially a computationally optimal approach. More recently, Agarwal et al. (2014) develop a new scheme which is statistically optimal in a weaker i.i.d. setting, and is also computationally efficient although more expensive than the Epoch-Greedy approach. We will further discuss the various trade-offs that arise in implementing this approach, along with the resulting theoretical considerations.

Videos

Tradeoffs in Contextual Bandit Learning

Presenter

Abstract