Tentative schedule

Workshop took place on July 1st (second day of workshops) in FRN.

Video recording of the workshop


9:00-10:00 Causal reasoning and Learning Systems.
Leon Bottou, Jonas Peters, D. Max Chickering, Patrice Simard
Due to a sudden personal reason Leon cannot attend to ICML this
year. The talk will be given by Elon Portugaly.
10:00-10:30 On Bayesian Bandits Algorithms
Émilie Kaufmann
10:30-11:00 Coffee break
11:00-12:00 Two basic problems in finite stochastic optimization
Sebastien Bubeck
12:00-12:15 Dynamic Pricing with Limited Supply
Moshe Babaioff,Shaddin Dughmi, Robert Kleinberg, Aleksandrs Slivkins
download paper
12:15-12:30 Learning Hurdles for Sleeping Experts
Varun Kanade, Thomas Steinke
 download paper
12:30-12:45  Contextual Bandits: Approximated Linear Bayes for Large Contexts
Jose Antonio Martin H
download paper
12:45-14:00 Lunch
14:00-15:00 The knowledge gradient for optimal learning
Warren B. Powell
15:00-15:10 Introduction to the Exploration & Exploitation Challenge
Jérémie Mary
15:10-15:25 Second team of the Phase 1 presentation
Regulating Greed over Time for Yahoo! Front Page News Article

Cynthia Rudin,  Virot Ta Chiraphadhanakul, Edward Su (MIT)
15:25-15:45 Winner of phase 1 presentation
Upper Confidence Bound with Time Information on News Article

Ku-Chun Chou, Hsuan-Tien Lin (National Taiwan University)
15:45-16:00 Coffee break
16:00-16:10 Annoucement of winners of Phase 2
16:10-17:10 Open discussion on future of exploration & exploitation


You can find the last year proceedings (2011 edition) on  JMLR Workshop series


Confirmed invited speakers

(order is the same than in the schedule)

Leon Bottou ,Jonas Peters, D. Max Chickering, Patrice Simard.

TitleCausal reasoning and Learning Systems.

Abstract: Using the ad placement problem as an example, we describe an
intricate link between causation and the interpretation of
experimental data in learning systems. Following the lead gives
(a) a “non-regret” perspective on exploration/exploitation decisions,
(b) means to leverage the fine structure of the causal graph,
(c) interesting differential techniques,
(d) plenty of open questions.

Emilie Kaufmann

TitleOn Bayesian Upper Confidence Bounds for Bandits Problems and Optimality of Thomson sampling

Abstract: We present here two algorithms based on Bayesian modelling of the MAB, that are proved to be optimal in term of frequentist regret. The first, Bayes-UCB uses a Bayesian Upper Confidence Bound based on quantiles of the posterior distribution as an index. The second is Thompson Sampling, a randomized Bayesian algorithm dating back to 1933 whose first finite-time regret analysis was proposed at COLT (2012) and for which we now have an optimal finite-time analysis.

Sebastien Bubeck

Title: Two basic problems in finite stochastic optimization

Abstract: I will present a new theoretical perspective on two basic problems arising in stochastic optimization. The first one is arguably the most elementary problem in stochastic optimization: assume that one wants to find the maximum of function defined on a finite set, and that one is given a budget of n noisy evaluations. What is the best sequential allocation procedure for the evaluations?

The second problem that I will discuss is inspired from the issue of security analysis of a power system. We formalize this problem as follows: Let X be a set, and A a subset of X of “interesting” elements in X. One can access X only through requests to a finite set of probabilistic experts. More precisely, when one makes a request to the i^th expert, the latter draws independently at random a point from a fixed probability distribution P_i over X. One is interested in discovering rapidly as many elements of A as possible, by making sequential requests to the experts.


Warren B. Powell

Title : The knowledge gradient for optimal learning

Abstract : The knowledge gradient is a policy for efficiently learning the best of a set of choices by maximizing the marginal value of information, a form of steepest ascent for a belief model. The policy has no tunable parameters, and has been adapted to both online (bandit) and offline (ranking and selection) problems. Versions of the knowledge gradient have been derived for both discrete and continuous alternatives, and a variety of Bayesian belief models including lookup tables with correlated beliefs, parametric models and two forms of nonparametric models. We have also derived the knowledge gradient for a robust (min-max) objective. We report on comparisons against popular policies such as Boltzmann, epsilon-greedy, interval estimation and UCB policies. We will discuss applications in drug discovery (a problem with 90,000 alternatives) and the tuning of parameters for black-box simulations (vector-valued continuous parameters). Additional information may be found at http://optimallearning.princeton.edu.

Call for papers

We think that three new kinds of challenges are more and more important in Exploration and Exploitation :

  • Contextual bandit problems with large number of contexts and (unknown) correlations between arms with respect to the context,
  • Numerical difficulties due to large scale problems, with the need for very fast, robust algorithms able to learn and reply in a few milliseconds per query on a standard computer,
  • New theoretical breakthroughs in the analysis of bandit algorithms.

Balancing exploration and exploitation is of high practical importance in large-scale applications on the web and also in information retrieval, where it is desired to help users quickly access the information they are looking for. This requires learning what users are interested in (exploration) and, based on these learnings, to serve adequate targeted content (exploitation).

The workshop will be single-day, made  of invited talks and presentations of contributed work, with time for discussion. Depending on the quality and compatibility with the workshop objectives, slots for brief talks and posters will be allocated.

We invite contributions on the topic of exploration and exploitation in various domains including (but not limited to) bandit games, online optimisation, reinforcement learning, tree search, recommender systems, information retrieval, etc. Topics of interest include: applications and practical performance of techniques to trade exploration and exploitation, benchmark/comparison of different techniques, computational challenges in large scale applications, new techniques, theoretical analyses, etc.

Contributions should be communicated to the programme committee (the organisers) by way of an extended abstract (a pdf file, from 4 to 8 pages formatted in the ICML conference paper style), sent by email to jeremie.mary[at]inria.fr before May 7th 2012, midnight WST.

Dual Submissions Policy: we accept contributions already published if theses conditions are filled :

  • First publication of the work in the previous year
  • If the work is also an ICML’12 paper then the workshop paper must emphases the link between the paper and the challenge.

You should send us your best papers !!!

You can find the last year proceedings on  JMLR Workshop series