Related publications

The Pascal-2 Exploration and Exploitation Challenge 2011 presents a problem similar to web advertising as described in [1], [2], and [3]. It can be seen as a bandit problem in which arms are (visitor, ad) pairs and rewards are binary (1 for a click, 0 otherwise). Recent theoretical results on (possibly generalized) linear bandit algorithms ([12], [13]) and on Gaussian Process optimization for trading off exploration and exploitation ([6]) have sparked interest in Bayesian approaches ([4], [6], [11]) and in KL-bandits ([5], [10]); for web-scale applications, however, computational cost is as critical as statistical performance.
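To make the formulation concrete, here is a minimal sketch in Python of a bandit loop where each (visitor, ad) pair is an arm with a Bernoulli click reward. The visitor/ad sets and the click simulator are invented for illustration, and the UCB1 index is just one representative policy from this literature, not the challenge's own protocol:

    import math
    import random

    visitors = ["v0", "v1"]           # hypothetical visitor profiles
    ads = ["a0", "a1", "a2"]          # hypothetical ad inventory
    # Hypothetical true click-through rates; a real system sees only traffic.
    true_ctr = {(v, a): random.uniform(0.01, 0.1) for v in visitors for a in ads}

    counts = {(v, a): 0 for v in visitors for a in ads}   # pulls per arm
    clicks = {(v, a): 0 for v in visitors for a in ads}   # clicks per arm
    t = 0

    def ucb_score(arm):
        """UCB1 index: empirical CTR plus an exploration bonus."""
        if counts[arm] == 0:
            return float("inf")       # force each arm to be tried once
        mean = clicks[arm] / counts[arm]
        return mean + math.sqrt(2.0 * math.log(t) / counts[arm])

    for _ in range(10000):
        t += 1
        v = random.choice(visitors)   # a visitor arrives
        a = max(ads, key=lambda ad: ucb_score((v, ad)))   # choose ad by index
        reward = 1 if random.random() < true_ctr[(v, a)] else 0  # simulated click
        counts[(v, a)] += 1
        clicks[(v, a)] += reward

Treating every (visitor, ad) pair as an independent arm ignores the structure shared across visitors and ads, which is precisely what the contextual and linear approaches cited above exploit.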

There is also renewed interest in Thompson sampling for such applications, brought to the attention of the community by [7], with a first theoretical analysis given in [8].
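Since [7] evaluates Thompson sampling with a Beta-Bernoulli model, a compact sketch of that scheme may help: sample a click rate from each arm's posterior, play the arm with the largest draw, then update that arm's posterior with the binary outcome. The click rates below are invented for illustration:

    import random

    true_ctr = [0.03, 0.05, 0.04]     # hypothetical per-ad click rates
    alpha = [1.0] * len(true_ctr)     # Beta posterior: 1 + observed clicks
    beta = [1.0] * len(true_ctr)      # Beta posterior: 1 + observed non-clicks

    for _ in range(10000):
        # One posterior sample per arm; play the arm with the largest draw.
        samples = [random.betavariate(alpha[i], beta[i])
                   for i in range(len(true_ctr))]
        arm = max(range(len(true_ctr)), key=lambda i: samples[i])
        click = random.random() < true_ctr[arm]   # simulated Bernoulli reward
        if click:
            alpha[arm] += 1           # posterior update on click
        else:
            beta[arm] += 1            # posterior update on no click

The randomized arm choice gives exploration "for free" from posterior uncertainty, with a constant-time update per impression, which is part of its appeal at web scale.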

[1] Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, Lihong Li, Wei Chu, John Langford & Xuanhui Wang, WSDM, 2011.

[2] Multi-armed Bandit Problems with Dependent Arms, Sandeep Pandey, Deepayan Chakrabarti & Deepak Agarwal, ICML, 2007.

[3] Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, Thore Graepel, Joaquin Quiñonero Candela, Thomas Borchert & Ralf Herbrich, ICML, 2010.

[4] Regret Bounds for Gaussian Process Bandit Problems, Steffen Grünewälder, Jean-Yves Audibert, Manfred Opper & John Shawe-Taylor, AISTATS, 2010.

[5] The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, Aurélien Garivier & Olivier Cappé, arXiv:1102.2490.

[6] Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Niranjan Srinivas, Andreas Krause, Sham M. Kakade & Matthias Seeger, ICML, 2010.

[7] An Empirical Evaluation of Thompson Sampling, Olivier Chapelle & Lihong Li, NIPS, 2011.

[8] Analysis of Thompson Sampling for the Multi-armed Bandit Problem, Shipra Agrawal & Navin Goyal, arXiv:1111.1797v2.

[9] ICML Exploration and Exploitation Challenge: Keep it Simple!, Olivier Nicol, Jérémie Mary & Philippe Preux, Journal of Machine Learning Research (JMLR), 2012 (accepted, to appear).

[10] Finite-time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences, Odalric-Ambrym Maillard, Rémi Munos & Gilles Stoltz, Conference On Learning Theory (COLT), 2011.

[11] On Bayesian Upper Confidence Bounds for Bandit Problems, Emilie Kaufmann, Olivier Cappé & Aurélien Garivier, AISTATS, 2012.

[12] Parametric Bandits: The Generalized Linear Case, Sarah Filippi, Olivier Cappé, Aurélien Garivier & Csaba Szepesvári, NIPS, 2010.

[13] Improved Algorithms for Linear Stochastic Bandits (extended version), Yasin Abbasi-Yadkori, David Pál & Csaba Szepesvári, NIPS, 2011.