Video recording of the workshop
9:00-10:00 | Causal Reasoning and Learning Systems. Léon Bottou, Jonas Peters, D. Max Chickering, Patrice Simard. Due to a sudden personal reason, Léon cannot attend ICML this year; the talk will be given by Elon Portugaly. |
10:00-10:30 | On Bayesian Bandit Algorithms Émilie Kaufmann |
10:30-11:00 | Coffee break |
11:00-12:00 | Two basic problems in finite stochastic optimization Sébastien Bubeck |
12:00-12:15 | Dynamic Pricing with Limited Supply Moshe Babaioff, Shaddin Dughmi, Robert Kleinberg, Aleksandrs Slivkins download paper |
12:15-12:30 | Learning Hurdles for Sleeping Experts Varun Kanade, Thomas Steinke download paper |
12:30-12:45 | Contextual Bandits: Approximated Linear Bayes for Large Contexts Jose Antonio Martin H download paper |
12:45-14:00 | Lunch |
14:00-15:00 | The knowledge gradient for optimal learning Warren B. Powell |
15:00-15:10 | Introduction to the Exploration & Exploitation Challenge Jérémie Mary |
15:10-15:25 | Presentation by the second-place team of Phase 1: Regulating Greed over Time for Yahoo! Front Page News Article Recommendations Cynthia Rudin, Virot Ta Chiraphadhanakul, Edward Su (MIT) |
15:25-15:45 | Presentation by the winner of Phase 1: Upper Confidence Bound with Time Information on News Article Recommendations Ku-Chun Chou, Hsuan-Tien Lin (National Taiwan University) |
15:45-16:00 | Coffee break |
16:00-16:10 | Announcement of the winners of Phase 2 |
16:10-17:10 | Open discussion on the future of exploration & exploitation |
You can find last year's proceedings (2011 edition) in the JMLR Workshop series
Léon Bottou, Jonas Peters, D. Max Chickering, Patrice Simard.
Title: Causal reasoning and Learning Systems.
Abstract: Using the ad placement problem as an example, we describe an intricate link between causation and the interpretation of experimental data in learning systems. Following this lead gives:
(a) a “non-regret” perspective on exploration/exploitation decisions,
(b) means to leverage the fine structure of the causal graph,
(c) interesting differential techniques,
(d) plenty of open questions.
Title: On Bayesian Upper Confidence Bounds for Bandit Problems and Optimality of Thompson Sampling
Abstract: We present two algorithms based on Bayesian modelling of the multi-armed bandit problem that are proved to be optimal in terms of frequentist regret. The first, Bayes-UCB, uses as an index a Bayesian upper confidence bound based on quantiles of the posterior distribution. The second is Thompson Sampling, a randomized Bayesian algorithm dating back to 1933, whose first finite-time regret analysis was proposed at COLT 2012 and for which we now have an optimal finite-time analysis.
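For readers unfamiliar with Thompson Sampling: at each round the algorithm draws one sample from each arm's posterior distribution and plays the arm with the highest sample. A minimal sketch for Bernoulli rewards with Beta(1,1) priors follows; this is an illustrative toy simulation written for this page, not the speakers' code, and the function name and simulation setup are our own.

```python
import random

def thompson_sampling(arms, horizon, seed=0):
    """Thompson Sampling for Bernoulli bandits with Beta(1,1) priors.

    `arms` is a list of true success probabilities, unknown to the
    algorithm and used here only to simulate rewards.
    Returns the total reward collected over `horizon` rounds.
    """
    rng = random.Random(seed)
    alpha = [1] * len(arms)  # posterior Beta alpha (successes + 1)
    beta = [1] * len(arms)   # posterior Beta beta (failures + 1)
    total_reward = 0
    for _ in range(horizon):
        # Sample one value from each arm's posterior; play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i])
                   for i in range(len(arms))]
        i = max(range(len(arms)), key=samples.__getitem__)
        reward = 1 if rng.random() < arms[i] else 0
        alpha[i] += reward
        beta[i] += 1 - reward
        total_reward += reward
    return total_reward
```

On a two-armed instance such as `thompson_sampling([0.2, 0.8], 2000)`, the posterior of the better arm concentrates quickly and almost all later pulls go to it, which is the behaviour the finite-time regret bounds in the talk quantify.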
Title: Two basic problems in finite stochastic optimization
Abstract: I will present a new theoretical perspective on two basic problems arising in stochastic optimization. The first is arguably the most elementary problem in stochastic optimization: assume that one wants to find the maximum of a function defined on a finite set, and that one is given a budget of n noisy evaluations. What is the best sequential allocation procedure for the evaluations?
The second problem that I will discuss is inspired by the issue of security analysis of a power system. We formalize it as follows: let X be a set, and A a subset of “interesting” elements of X. One can access X only through requests to a finite set of probabilistic experts. More precisely, when one makes a request to the i-th expert, the latter independently draws a point at random from a fixed probability distribution P_i over X. One is interested in rapidly discovering as many elements of A as possible by making sequential requests to the experts.
Title: The knowledge gradient for optimal learning
Abstract: The knowledge gradient is a policy for efficiently learning the best of a set of choices by maximizing the marginal value of information, a form of steepest ascent for a belief model. The policy has no tunable parameters, and has been adapted to both online (bandit) and offline (ranking and selection) problems. Versions of the knowledge gradient have been derived for both discrete and continuous alternatives, and a variety of Bayesian belief models including lookup tables with correlated beliefs, parametric models and two forms of nonparametric models. We have also derived the knowledge gradient for a robust (min-max) objective. We report on comparisons against popular policies such as Boltzmann, epsilon-greedy, interval estimation and UCB policies. We will discuss applications in drug discovery (a problem with 90,000 alternatives) and the tuning of parameters for black-box simulations (vector-valued continuous parameters). Additional information may be found at http://optimallearning.princeton.edu.
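As background for the talk: in the simplest setting of independent normal beliefs with means μ_i, variances σ_i², and measurement-noise variance λ, the knowledge-gradient factor has the closed form ν_i = σ̃_i · f(−|μ_i − max_{j≠i} μ_j| / σ̃_i) with σ̃_i = σ_i² / √(σ_i² + λ) and f(z) = zΦ(z) + φ(z). A minimal sketch of this computation follows; it is our own illustration of the textbook formula, not the speaker's software, and the function name is hypothetical.

```python
import math

def kg_values(mu, sigma2, noise_var):
    """Knowledge-gradient factors for independent normal beliefs.

    mu[i], sigma2[i]: posterior mean and variance of alternative i;
    noise_var: variance of a single noisy measurement.
    Returns the expected one-step improvement from measuring each i.
    """
    def pdf(z):  # standard normal density phi(z)
        return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

    def cdf(z):  # standard normal distribution Phi(z)
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    kg = []
    for i in range(len(mu)):
        # Std dev of the change in posterior mean after one measurement:
        # sigma_tilde^2 = sigma^2 - posterior variance after the update.
        sigma_tilde = sigma2[i] / math.sqrt(sigma2[i] + noise_var)
        best_other = max(mu[j] for j in range(len(mu)) if j != i)
        z = -abs(mu[i] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (z * cdf(z) + pdf(z)))
    return kg
```

The policy simply measures the alternative with the largest factor. Note the behaviour the abstract describes: with equal means, the more uncertain alternative gets a larger factor, so information value, not just estimated performance, drives the choice.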
Balancing exploration and exploitation is of high practical importance in large-scale applications on the web and in information retrieval, where it is desirable to help users quickly access the information they are looking for. This requires learning what users are interested in (exploration) and, based on what has been learned, serving adequately targeted content (exploitation).
The workshop will be a single-day event consisting of invited talks and presentations of contributed work, with time for discussion. Depending on quality and fit with the workshop objectives, slots for brief talks and posters will be allocated.
We invite contributions on the topic of exploration and exploitation in various domains including (but not limited to) bandit games, online optimisation, reinforcement learning, tree search, recommender systems, information retrieval, etc. Topics of interest include: applications and practical performance of techniques to trade exploration and exploitation, benchmark/comparison of different techniques, computational challenges in large scale applications, new techniques, theoretical analyses, etc.
Contributions should be communicated to the programme committee (the organisers) as an extended abstract (a PDF file of 4 to 8 pages, formatted in the ICML conference paper style), sent by email to jeremie.mary[at]inria.fr before May 7th 2012, midnight WST.
Dual Submissions Policy: we accept contributions that have already been published if these conditions are met:
You should send us your best papers!
You can find last year's proceedings in the JMLR Workshop series