Python version

Thanks to the work of José Antonio Martín we are now able to propose you to enter the competition using  Python. Thanks again to him :-)


All you have to do is to download the archive exploChallenge-Python, extract it and follow the readme.txt. You have to implement the MyPolicy class. To submit your algorithm, use the same form than for java files (you will have to provide a .zip file embedding your Python code).


Version of Python on the cluster is 2.6.5 with numpy 1.3.0 and SciPy 0.7.0. The Python code needs slightly more time than the java one. For a very basic policy on the first part of the data, Python needs 45 min and Java needs 26 min.


Edit May 03th: A newer of python is now available (python 2.7, numpy, scipy ). To use it you have to replace python exploChallenge/ by python2.7 exploChallenge/ in before building your submission zip. This version of python have not been extensively tested (and some modules could be missing) so do not hesitate to report bugs.

The evaluation process

The goal of this challenge is to build an algorithm that learns efficiently a policy to serve news articles on a web site. At each iteration of the evaluation process, you will be asked to pick an article from a list given a visitor (136 binary features + a timestamp). To build a smart algorithm, you might want to balance carefully exploration and exploitation and pay close attention to the “age” of the news articles (among other things of course). A quick look on the leaderboard is enough to figure out why that last point matters. It is the overall CTR (click through rate) of your algorithm that will be taken into account to rank it on the leaderboard.

Since the evaluation process is based on data, all your choices cannot be evaluated (simply because they don’t necessarily appear in the dataset). Please have a look at the following details to fully understand how this fact should impact the design of your algorithm.

The details

The evaluation of your algorithm will be performed on the 920.000 first lines of the dataset. The constructor of your policy will be called before that.

public MyPolicy() ;

Choosing an article
At each iteration, the evaluator will ask you to choose an article to display to a visitor by the way of the following method.

public YahooArticle getActionToPerform(YahooVisitor visitor,
List possibleActions) ;

A visitor is characterized by a 136-long binary vector and a time. The method visitor.getFeatures() returns an array containing them. You also have access to a timestamp via the method visitor.getTimestamp().
NB: The method must always return an article from the list. If it doesn’t return one or returns another one, an exception will be raised and the evaluation stopped.

Updating the policy
If and only if you have chosen the same article as in the data, the evaluator allows you to update your policy using the current line of the dataset via this method:

public void updatePolicy(YahooVisitor c, YahooArticle a, Boolean reward);

NB: In case you are wondering why you cannot learn from every line: It’s because if you could, exploration would be useless!

An iteration is evaluated only if you have chosen the same article as the one in the data (same as above). As you choose among around 30 articles and that a random policy was used to generate the data, this should happen roughly 9M/30 = 300k times. Your final score is the number of times you got a click divided by the number of times you chose the same ad as in the data. This evaluation process is very close from being unbiased (see this paper for theoretical and empirical proofs).

The evaluation of your algorithm must run in less than 36 hours. Passed that limit, the evaluation process will be killed. Note that a random policy runs in 30min. It lets you plenty of computationnal time. If an error (java exception) occurs during the evaluation, the process will also be stopped.

Programs that terminates before the end of the data (not answering “correctly” at least at random) receives a value but are ranked at the end of the leaderboard with a mention “Timeout”. You can use this behavior to early stop your algorithm and test quickly a new idea.

About the 100 given lines

The rewards, the possible choices and the attributes have been modified so your should not rely on this data except for testing purposes. The resulting click rate is roughly 0.5 (in the whole dataset the average click rate is around 0.03) and you have less articles available in order to make the testing process more relevant (as far as code testing is concerned of course).

Some remarks

Some remarks on the task:

  • The context is a set of 136 binary features
  • You usually have to choose you favorite arm in a set of 30 arms
  • The clickthrough rate (CTR) is not constant over time, you can try to take advantage of it.
  • There is a total of 30 millions of records. 10 millions are used for the first part of the challenge.
  • The record will be used for evaluation of your algorithm only 1/30 of the time (data are collected with a random policy). To fulfill the time constraint this means you should be able to choose an arm within 5 ms and to update your policy within 50 ms when receiving feedback.
  • The total number of documents (ie articles displayed) in the first part of the challenge is 246 (timestamp 1317513291 to 1317945293).  The total number of documents in the full dataset is 652 (timestamp 1317513291 to 1318809293).
  • Data are presented in chronological order. So between two  consecutive users possible choices tends to be same but evolves over time.
  • Evaluation is done on our servers to simulate online learning and avoid the use of some offline methods on the whole set of data. If for test purpose you want more datas consider the R6 data set from Yahoo!
  • In phase 1, winners will be known at the beginning of June, these winners are strongly encouraged to present their work at the workshop.
  • Phase 2 results will be known only at the workshop, it will be the same procedure of evaluation but with more (and new) data. Participants cannot submit any new algorithm, we will use their best submission of phase 1.
  • If you dislike Java, you can contact us and rewrite the evaluator under GPL license (specs will be provided). After reviewing your code, we may accept it and allow a new language of coding for all the participants. Time limit will remain the same whatever the language.
  • After the challenge the data will be made available through the Yahoo! Webscope program.

Getting started

First you have to suscribe

How to install everything ?

First you need java. Version of the cluster is :

java version “1.6.0_24″
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)

If you are using java 1.5 it will lead to crashes.

Download this archive and extract it. It contains everything you need to take part in the challenge :
- The source code in java in which you have to write two functions.
- The 100 first lines of the data so that you can test your implementation.

We recommend to use Eclipse to work on this project because it is a very powerful tool to write code in java. However if you are used to coding with something else, don’t worry, everything will be as simple.

With Eclipse :
If you don’t already have it, you can download eclipse here.
To open the project click on File > New > Java Project. Then untick the Use default location box and click on the Browse button. Select the ExploChallenge directory that was in the archive.

If you cannot see the Package Explorer tab, open it by clicking on Window > Show view > Package Explorer. In this explorer, you can browse the source code of the project. A double click on the class MyPolicy of the package myPolicy displays the class you have to fill in in order to participate.

Without Eclipse :
Nothing special to do here. The source code is in the src/ directory. The class MyPolicy of the package myPolicy is the one you have to fill in in order to participate. You will need ant (kind of an equivalent of make for java) especially if you plan on using external libraries later.

How to implement your solution ?

These are the methods of the class you have to fill in :

public MyPolicy() {
//Any initialization of your algorithm should be done here.
public YahooArticle getActionToPerform(YahooVisitor visitor,
List possibleActions) {
//Given a visitor, you have to choose the "best" article in the list.
return possibleActions.get(0); //here we choose the first article!
public void updatePolicy(YahooVisitor c, YahooArticle a, Boolean reward) {
//update your policy given the visitor, the displayed article and
//the associated reward (click or not click)

You can of course create new classes/packages and use them in this one. However do not write any code in the exploChallenge package (no new classes, no modifications) because this code won’t be taken into account on our cluster. You are given a modified version of the 100 first lines of the dataset in order to test your code. Please make sure that your code is at least working on this very small subset of the data before submitting it.

With Eclipse :


Compilation is automatic. You will be notified by way of small icons as soon as you write incorrect java code. Clicking on the icon will suggest corrections. To run the test, just click on the run button (see picture) or use Ctrl F11. The result will be displayed in the Console tab. If you cannot see it : Window > Show View > Console

Without Eclipse :
We supply a build.xml file which is to ant what the Makefile is to Make. The command ant compile compiles the project. ant run runs the test.

How to submit an algorithm

You have to submit here a jar file built from the project.

With Eclipse :
Display the ant tab with Window > Show View > Ant . Then drag the build.xml file from the package explorer to the ant tab. This procedure only has to be done once. 4 targets are available. The target jar builds a jar called submission.jar at the root of the project. clean and compile are useless under Eclipse and run can be used to run the test.

An alternative way is to right click on the project in the package explorer and then Export… > Java > Runnable JAR File. In the window that appears, select the following options :
Launch configuration: Main – ExploChallenge
Export destination: where you want to generate the jar.
Library handling: Extract required libraries into generated JAR

Without Eclipse :
The command ant jar builds a jar called submission.jar at the root of the project.

How to use an external library

In order to allow Eclipse to find the library you are using, you shoud right click on your project, then Build Path > Configure Build Path…. In the window that appears, click on Add External JARs and select the jar of your library. If you are using ant (even with Eclipse) you also have to make a copy of your library in the lib directory of the project.

A few rules concerning submissions

Your submissions are NOT allowed to log any part of the data or to access the hidden parts of the data (even for debugging purposes). All the files and outputs are kept on our servers and will be automatically and manually inspected. Any attempt of cheating will lead to immediate and definitive exclusion.

Related publications

The Pascal-2 Exploration and Exploitation Challenge 2011 presents a problem similar to web advertising as presented by [1], [2], and [3]. It can be seen as a bandit problem in which arms are (visitor, ad) pairs and rewards are binary (1 when click, 0 otherwise). Recent theoretical results on (possibly generalized) linear bandit algorithms ([12] ,[13]) Gaussian Process Optimization for trading exploration and exploitation have sparked interest in Bayesian approaches ([6], [4] and [11]) or KL-bandits ([5] and [10]), but for web-scale applications the computational cost is critical (as well as performance!).

There is also some new interest in Thompson sampling for such applications brought to the attention of the community by [7] and with a beginning of theoretical analysis by [8].

[1] Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, Lihong. Li, Wei Chu, John Langford &  Xuanhui Wang. WSDM, 2011.

[2] Multi-armed Bandit Problems with Dependent Arms, Sandeep Pandey, Deepayan Chakrabarti & Deepak Agarwal , ICML 2007

[3] Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine, Thore Graepel, Joaquin Quinonero Candela, Thomas Borchert, Ralf Herbrich, ICML 2010

[4] Regret Bounds for Gaussian Process Bandit Problems, Steffen Grunewalder, Jean-Yves Audibert, Manfred Opper & John Shawe-Taylor, AISTATS 2010

[5] The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, by Aurelien Garivier & Olivier Cappe, arXiv:1102.2490

[6] Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design, Niranjan Srinivas, Andreas Krause, Sham M. Kakade, Matthias Seeger, ICML 2010

[7] An Empirical Evaluation of Thompson Sampling, Olivier Chapelle & Lihong Li, NIPS11

[8] Analysis of Thompson Sampling for the multi-armed bandit problem, Shipra Agrawal & Navin Goyal, arXiv:1111.1797v2

[9] ICML Exploration and Exploitation challenge: Keep it Simple !, Olivier Nicol, Jeremie Mary & Philippe Preux, Journal of Machine Learning Research (JMLR), 2012 (accepted, to appear)

[10] Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. Odalric-Ambrym Maillard, Remi Munos & Gilles Stoltz. In Conference On Learning Theory, 2011

[11] On Bayesian Upper Confidence Bounds for Bandit Problems E. Kaufmann, Olivier Cappe, & Aurélien Garivier. AISTATS 2012.

[12] Parametric bandits : The generalized linear case. Sarah Filippi, Olivier Cappé, Aurélien Garivier, & Csaba Szepesvari, NIPS, 2010.

[13] Improved Algorithms for Linear Stochastic Bandits (extended version), Yasin Abbasi-Yadkori, David Pál, & Csaba Szepesvári NIPS, 2011

Raw data description

Each line in the files corresponds to a user visit. An example line is as follows:

1317513291 id-560620 0 |user 1 9 11 13 23 16 18 17 19 15 43 14 39 30 66 50 27 104 20 |id-552077 |id-555224 |id-555528 |id-559744 |id-559855 |id-560290 |id-560518 |id-560620 |id-563115 |id-563582 |id-563643 |id-563787 |id-563846 |id-563938 |id-564335 |id-564418 |id-564604 |id-565364 |id-565479 |id-565515 |id-565533 |id-565561 |id-565589 |id-565648 |id-565747 |id-565822

which contains the following fields delimited with spaces:

* timestamp: e.g., 1317513291
* displayed_article_id: e.g., id-560620
* user_click (0 for no-click and 1 for click): e.g., 0
* string “|user” indicates the start of user features
* user features are 136-dimensional binary vectors; the IDs of nonzero features are listed after the string “|user”
* The pool of available articles for recommendation for each user visit is the set of articles that appear in that line of data. All user IDs (bcookies in our data) are replaced by a common string “user”.

Note that each user is associated with a 136-dimensional binary feature vector. Features IDs take integer values in {1,2,…,136}. Feature #1 is the constant (always 1) feature, and features #2-136 correspond to other user information such as age, gender, and behavior-targeting features, etc. Some user features are not present, since not all users logged in to Yahoo! when they visited the front page.

A unique property of this data set is that the displayed article is chosen uniformly at random from the candidate article pool. Therefore, one can use an unbiased offline evaluation method [1] to compare bandit algorithms in a reliable way.

Related publications for further information:
* Evaluation methodology:
* Reference performance:
* First version of the data, collected for a different period of time: (“R6″)