Feel free to ask and discuss anything about the challenge !

105 thoughts on “Challenge discussion”

  1. Hi all, I am a little confused about the problem itself and the datasets.

    What is the context-free problem to solve?
    What is the context-dependent problem to solve?

    I mean, should I try to learn a good policy for recommending articles? Or are clicks random, so that I only need to learn the right balance between exploration and exploitation?


    • Imagine you have a visitor on your website and you can choose to highlight one of your articles. The user is described by 136 features. You are rewarded if the article you choose is clicked.

      Newer articles tend to perform better (that’s why “always last” performs better than random).

      Some articles tend to perform better simply because they are better (that’s why UCB-like algorithms, which make no use of the user description, also perform better than random).

      I guess one could combine the first two points in a clever way to get quite good performance without using the user description.

      Some articles perform better on some kinds of users (context dependency). That’s the hardest part.
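      Combining those first two points (a per-article UCB score plus a recency bonus) could be sketched as below. This is only an illustration: `ArticleStats`, `recencyWeight` and the id-based recency heuristic are my assumptions, not part of the challenge code.

```java
import java.util.List;

// Illustrative UCB1 scoring with a small recency bonus.
// ArticleStats, totalPlays and recencyWeight are assumptions, not challenge API.
public class UcbWithRecency {
    public static class ArticleStats {
        public long id;     // larger id ~ more recent article
        public int plays;   // times this article was shown and matched the log
        public int clicks;  // clicks observed on those plays
        public ArticleStats(long id) { this.id = id; }
    }

    // Pick the article maximizing: mean CTR + UCB1 exploration term + recency bonus.
    public static ArticleStats choose(List<ArticleStats> candidates, long totalPlays,
                                      double recencyWeight) {
        long maxId = Long.MIN_VALUE, minId = Long.MAX_VALUE;
        for (ArticleStats a : candidates) {
            maxId = Math.max(maxId, a.id);
            minId = Math.min(minId, a.id);
        }
        ArticleStats best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (ArticleStats a : candidates) {
            double mean = a.plays == 0 ? 0.0 : (double) a.clicks / a.plays;
            // Unplayed arms get an infinite bonus so every arm is tried once.
            double explore = a.plays == 0 ? Double.POSITIVE_INFINITY
                    : Math.sqrt(2.0 * Math.log(Math.max(1, totalPlays)) / a.plays);
            double recency = maxId == minId ? 0.0
                    : recencyWeight * (a.id - minId) / (double) (maxId - minId);
            double score = mean + explore + recency;
            if (score > bestScore) { bestScore = score; best = a; }
        }
        return best;
    }
}
```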

  2. Hi Jeremie, thanks for your answer. Good hints.
    But I have more questions.

    As I understand it, I am not rewarded for selecting an article that the user has clicked, but rather for selecting the same article that appears in the dataset. In that sense, what am I learning?

    And only if I select the same article as in the data (who selected this article?) can I learn whether the user clicked on that article or not.

    What is the connection between the selection of the articles at presentation time and the user’s click?

    Thanks a lot,

  3. Ok let me try:

    * I should always try to select the right article, because if just by chance it is the logged one (with 1/30 uniform probability?) then the probability of being rewarded is maximized?

    However, I think this only explains the beginning. Is that right?


    • I’m not sure I understand the part about probability maximization, but I could agree :)

      If your policy is to always choose the last element of the list, this basically means you are always choosing the most recent article.

      If by chance it was also chosen by the sample collection strategy (which is uniformly random), then you will score the corresponding 0 or 1. If not, that round of evaluation is discarded.
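      The evaluation described here (only rounds where the policy’s choice matches the logged choice count; all others are discarded) can be sketched as a replay evaluator. `LoggedEvent` and its fields are illustrative names, not the challenge API.

```java
import java.util.List;

// Sketch of replay evaluation: a round counts only when the policy's choice
// matches the (uniformly random) logged choice. All names are illustrative.
public class ReplayEvaluator {
    public static class LoggedEvent {
        public long loggedArticle; // article shown by the uniform logging policy
        public boolean clicked;    // whether the user clicked it
        public LoggedEvent(long a, boolean c) { loggedArticle = a; clicked = c; }
    }

    // Returns clicks / matches (the CTR used for scoring), or 0 if nothing matched.
    public static double evaluate(List<LoggedEvent> log, List<Long> policyChoices) {
        int clicks = 0, matches = 0;
        for (int i = 0; i < log.size(); i++) {
            LoggedEvent e = log.get(i);
            if (policyChoices.get(i) == e.loggedArticle) { // match: round counts
                matches++;
                if (e.clicked) clicks++;
            } // otherwise the round is simply discarded
        }
        return matches == 0 ? 0.0 : (double) clicks / matches;
    }
}
```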

  4. How can I differentiate individual users? If, for instance, I don’t want to recommend an article that the user has already clicked, can I assume that the 136-feature context description is unique per user? Or, if I choose an article again for the same user, will the evaluation count it as a click because at some point in the history this same user did click on the article?

  5. Ok. As I see it now, there are two points of view:

    1) I can forget about clicks entirely! I only need to learn the pseudorandom distribution of the selected articles, and that will give me optimal returns.

    2) I can forget about the selection entirely! I only need to learn the right articles (the ones users clicked on), so that when by random chance (1/30) my article matches the dataset, I maximize the probability of getting a reward of 1 and so maximize my return.

    What do you think?

    • 1) No. Even if you guess the pseudorandom sequence (good luck) and always choose the logged article, your reward will be the average click value (i.e. 0.0366, so a score of 366).

      2) Yes and no. If all users have the same behavior (i.e. the context/description of users is not correlated with click probability), then it’s true. There is strong evidence that this assumption is false (read http://dl.acm.org/citation.cfm?doid=1772690.1772758 ). Selecting the right article is the object of this challenge.

      • Since we don’t know anything about an article other than its ID, the user context seems useless to me. So I didn’t use context information in my approach; actually, my approach works well.
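      For the context-dependent direction in the cited paper, a minimal “disjoint” linear-UCB arm (one linear model per article) can be sketched as below. This is only a sketch inspired by that line of work, not the challenge code; it uses a Sherman-Morrison rank-one update so no matrix-inversion library is needed.

```java
// Minimal disjoint linear-UCB arm: score = theta^T x + alpha * sqrt(x^T A^-1 x),
// with A = I + sum of x x^T and theta = A^-1 b. All names are illustrative.
public class LinUcbArm {
    private final double[][] aInv; // inverse of A, maintained incrementally
    private final double[] b;      // sum of reward * x
    private final double alpha;    // exploration width

    public LinUcbArm(int d, double alpha) {
        this.alpha = alpha;
        aInv = new double[d][d];
        b = new double[d];
        for (int i = 0; i < d; i++) aInv[i][i] = 1.0; // A starts as the identity
    }

    // Upper-confidence score of this arm for context x.
    public double score(double[] x) {
        double[] aInvX = mul(aInv, x);
        double[] theta = mul(aInv, b);
        double mean = 0.0, var = 0.0;
        for (int i = 0; i < x.length; i++) {
            mean += theta[i] * x[i];
            var += x[i] * aInvX[i];
        }
        return mean + alpha * Math.sqrt(Math.max(0.0, var));
    }

    // Observe reward r in context x: A += x x^T (Sherman-Morrison), b += r x.
    public void update(double[] x, double r) {
        double[] aInvX = mul(aInv, x);
        double denom = 1.0;
        for (int i = 0; i < x.length; i++) denom += x[i] * aInvX[i];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                aInv[i][j] -= aInvX[i] * aInvX[j] / denom;
        for (int i = 0; i < x.length; i++) b[i] += r * x[i];
    }

    private static double[] mul(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < v.length; j++) out[i] += m[i][j] * v[j];
        return out;
    }
}
```

      Whether the 136 features actually carry signal is exactly what this kind of model tests: if they don’t, the learned theta stays near zero and the arm degenerates to a context-free UCB.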

  6. Dear Jeremie, sorry for all my emails; I think I am getting close to understanding the point, but please be patient with me ;-)

    Reading the evaluation text of the challenge, it says that:

    evaluation = cr / hr,
    cr: the number of times you got a click (click rate)
    hr: the number of times you chose the same ad as in the data (hit rate)

    So, let’s consider the four possibilities (2×2 combinations of events):

    event A: you chose the same ad as in the data: match 1, no match 0
    event B: after A, you got a click (1) or not (0)

    A0 and B*: nothing happens to your return, since the evaluation does not change

    A1B0: penalty, since hr increases but cr remains constant

    A1B1: reward? cr increases by 1, but so does hr, so?

    So, in principle you should try to avoid A1B0 event combinations, that is, avoid making bad recommendations.

    However, isn’t this perhaps a system biased in favor of avoiding bad recommendations rather than rewarding good ones?

    • Right. Avoiding bad recommendations would be cool. But how do you do it?
      In the dataset the choice of the article is independent of the user: it is a uniformly random policy.

      You can do it by trying to find the best possible association between a user and the probability of a click: that’s fine, that’s our goal.

      You can also try to do it by not choosing the same article each time you see a “bad” user (in fact you would need to be able to recognize a bad user, or to get a first initial click and then always play something other than what is logged). In that case you need to be able to find the seeds of the random number generators. You don’t know the generation method (probably close to a Mersenne Twister), and you don’t know what the chosen action is (only whether it matches your choice). You are not allowed to log information in order to identify it over several evaluations…

      Seed identification doesn’t sound reasonable… and this is not the goal. Moreover, the final evaluation will be on a different dataset, so it is useless to guess the seeds on the first data.

      To score the maximum it would be simpler to scan the memory to find the dataset and read the right answers (and that’s explicitly forbidden)…
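      The effect of the A1B0 and A1B1 cases on the cr/hr score can be checked with a little arithmetic. The 0.0366 baseline is the average click value mentioned in an earlier reply; the counts are purely illustrative.

```java
// A matched round always moves the cr/hr ratio: a match without a click
// (A1B0) pulls it down, a match with a click (A1B1) pulls it up,
// because the current ratio is far below 1.
public class ScoreEffect {
    public static double score(int clicks, int matches) {
        return (double) clicks / matches;
    }

    public static void main(String[] args) {
        double base = score(366, 10000);      // 0.0366, the baseline CTR
        double afterMiss = score(366, 10001); // A1B0: slightly lower
        double afterHit = score(367, 10001);  // A1B1: slightly higher
        System.out.println(base + " " + afterMiss + " " + afterHit);
    }
}
```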

  7. Hi Jeremie, thank you again for your great clarifications!

    I think I now understand ( hope so ;-) )

    Unless I have an Oracle that tells me whether an article will match the logging policy, I should select the article with the highest probability of being clicked. So I have to learn such probabilities using the available information, i.e., click feedback and visitor features.

    However, I am an Oracle certified professional, so, expect the unexpected! @_@

  8. Any guidelines for the acceptable number of simultaneous submissions? So far I have been assuming that 3-5 is OK but that more than 10 is bad.

    Unfortunately most of the algorithms I can think of for this problem have at least one free parameter, and (by design) there is no real offline data.


    • 3-5 seems fair. In fact the “real rule” is that I dedicated 16 cores on the cluster to evaluations, and I’d like most jobs not to stay “pending” for more than 30 minutes. If that happens, I’ll have to limit the number of submissions.

  9. What does it mean when a submission is marked incomplete and there is nothing more in the error/log files?

    I would like to know if the problem was due to
    a) timing problem
    b) memory usage
    c) program ends before input is complete

    • It can be a): the process is stopped with no error message.
      It cannot be b): that would lead to an error.
      It can be c): if your program exits, no error message is reported.

      Option d) is that you modified the log process and added something at the end of the log file.

  10. I have the following error from time to time.

    Exception in thread “main” java.lang.NoClassDefFoundError: myPolicy/MyPolicy
    at exploChallenge.MainCluster.main(Unknown Source)
    Caused by: java.lang.ClassNotFoundException: myPolicy.MyPolicy
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    … 1 more

    I am wondering: is this an issue on the evaluation clusters, or is it my code?

    • In that case the first thing to check is your Java version: it must be 1.6.

      Edit: In this case it was linked to a disk management problem on the cluster. Contact me if this happens again; I’ll do my best to fix it.

  11. Is it possible to add an option to attach a private comment to a submission, so we can tag the submission with some metadata about the algorithm and the parameter values we used, etc.? Just some text that can be associated with a particular submission and is visible only to the contestant who submitted it.

    • Hmm, I’ll think about it, but with the current version it is not straightforward to implement this feature (though not impossible). Some participants already do that kind of thing with the name of their submission (e.g. submission_algoname_parameters.jar). Of course, if you have 20 parameters, it’s not a very suitable solution.

      Edit: When/where do you want to be able to add these comments? At submission time?

    • Hi –
      Below is how I did this. Since it’s easy to get back the *.jar files from various runs, you can download the jar you want and run a main() method you bury in MyPolicy: java -cp good.jar myPolicy.MyPolicy

      package myPolicy;

      import java.util.List;
      import exploChallenge.logs.yahoo.YahooArticle;
      import exploChallenge.logs.yahoo.YahooVisitor;
      import exploChallenge.policies.ContextualBanditPolicy;

      public class MyPolicy implements ContextualBanditPolicy {
        private ContextualBanditPolicy implementation;

        public MyPolicy() {
          // implementation = new SimplePolicy();
          // implementation = new YoungestPolicy();
          // . . .
          implementation = new BestPolicyEver();
        }

        public YahooArticle getActionToPerform(YahooVisitor visitor,
            List possibleArticles) {
          return implementation.getActionToPerform(visitor, possibleArticles);
        }

        public void updatePolicy(YahooVisitor visitor, YahooArticle article, Boolean reward) {
          implementation.updatePolicy(visitor, article, reward);
        }

        public static void main(String[] args) {
          // Running this main from the jar prints which policy the jar was built with.
          MyPolicy policy = new MyPolicy();
          System.out.println(policy.implementation.getClass().getName());
        }
      }

    • I guess this is a little slower because each process wants to access the data (disk access) and there is quite a large number of processes running at the same time. On the submission webpage I added a cluster load indicator to provide feedback about the load.

      The run for victory (on new data, after the 1st of June) will be done alone on the cluster if the submitted algorithms have time-limit problems.

      Clarification: sometimes when the cluster load is orange (9 to 14), your jobs can be sent to the cluster but do not start computing immediately. When the color is red, your process will have to wait. Of course, this extra wait time is not counted against the time limit.

  12. Hi Jeremie,

    Another feature request. I don’t know if others are also facing this silly problem: I seem to notice a bug in my code almost immediately after I submit my solution :(

    Could we have a way of killing our own currently running tasks somehow? I feel guilty about submitting again almost immediately with a small change while the buggy submission is still running, using up valuable server resources and blocking others as well.


    • Well… I was expecting such a request (in last year’s challenge we were always facing that kind of problem). This is not a straightforward implementation (because of the separation between the web servers and the cluster).
      I’ll think about doing something, but as it is really a lot of mess, you can also ask me to kill a job (providing the full reference of the job as stated in the submission mail: xx%yyyyy-zzzz.{zip,jar}). (If I get fed up with manual killing, I’ll write an automatic solution ;-) )

  13. If your bug leads to a Java exception (which is not caught), the execution will stop and your algorithm won’t waste any resources. If it doesn’t, you can add a few tests in your code and throw an Error when you notice unexpected behavior, for example:

    if (n == 0)
      throw new Error("division by 0 !");
    double a = t / n;

  14. Hi, Jeremie,

    Could you please confirm that the feature #1 of user feature vectors is always 1? I found that not the case for the 100 test cases given. Thanks.

    • Do not try to gather information from this sample data: click or no click has been modified, the choice possibilities have been modified, and some attribute values too. They are only here to show you the shape of the data, no more.

      Edit: I looked more carefully; you are right, feature 1 is always active, and on some lines very few other features are activated at the same time. So, without any guarantee, I would say that when a feature is not present, this may be because it is a “missing” value.

  15. Hi, would it be possible to get, within the information for a submission, some kind of indication of the CPU time or elapsed time used by the submission? That way we could see how much time the algorithm consumed and how much remains before the time limit.

    It would help a lot.


  16. Quick question: do all contest participants have to submit a write-up to the workshop before the May 7 workshop submission deadline? Or is the contest winners’ presentation a separate item on the agenda? Also, how many of the top contesting teams are invited to give a presentation?

    • No, workshop papers are separate from the challenge.
      The opportunity to give a presentation and to write a challenge paper will (probably) be given to the 3 best submissions. It also depends on the originality of the contributions.

      Regarding prizes and invitations with paid registration to ICML, I’m waiting for more definitive information about sponsorship and the amounts needed.

  17. Some of my recent submissions have been returning errors unexpectedly. They are the same as other submissions that completed successfully with different parameter values, so I am pretty sure the error is on the server end. Thought I should let you all know.


    • They are errors of this type:

      /bin/bash: line 0: cd: 59%KF_2_0001-fnf11j8cjko: No such file or directory

      Unfortunately I also submitted a file with an actual syntax error, so don’t get thrown off by that.

    • Yes, you’re right… I saw them. I’m working on it, but I do not understand what is happening. This is something quite rare which seems to happen with higher probability when there are more than 7-8 jobs on the cluster node. I suspect some WebDAV synchronization issues.

  18. Hi, what about the order in which the features appear in the dataset? In the log reader the order is lost when converting to a binary vector, but in the dataset the features are not trivially ordered.

    • No, I do not have any information about this. But you are right: in the dataset they do not seem to be ordered (though I don’t know if the order relates to a trust value or to the order of appearance for that user).

  19. Some clarification questions, answer what you can:

    a) Can you tell us anything about whether the number of times an arm is selectable is approximately uniform? In other words can you say anything about the possibility that one arm will be one of the ~30 selectable arms 3 million times while another is selectable only 300,000 times?

    b) The articles have a 6-digit id, e.g. 560620. The visitors have a 10-digit timestamp, e.g. 1317513291. Are the article ids just timestamps with ‘1317’ removed? This seems to be suggested in the ‘some remarks’ post. That would mean that, for the test data, the visitor timestamp is always /smaller/ than the article timestamp. Can you tell us if this is true of the actual data?

    c) You also say “So between two consecutive users possible choices tends to be same but evolves over time.” I think this means that articles with smaller ids will be available towards the beginning of the evaluation, and articles with larger ids will be available towards the end, but that it /isn’t/ necessarily true that at the very beginning I will have the 30 smallest ids and at the very end the 30 largest. Is this right?


  20. Don’t get me wrong, but I do not think this information is really important. As one of the organizers, I do not have any confidential information about the data: all the information we have is available on this website; I simply have the whole dataset, so I can compute some statistics on it. Anyway, here are a few hints:
    a) No, it’s not. It is more or less Gaussian (strictly speaking, it’s not).
    The range is 1630 – 107400 displays, the mean is 42600, and the sd is around 20000.
    The distribution is skewed towards 0.
    b) This is really not important (see below). Anyway, I checked, and it is not true.
    c) To me, the ids themselves are really not meaningful; they are just a way to identify, track, and differentiate entities. In my code, I simply follow the flow of data and record new ids as they appear, along with the timestamp at which they appear for the first time. I guess that the ids at the beginning appeared earlier than the first timestamp in the log; so, the thing has to warm up before observing truly new ids. But we do not have any information about all that.

  21. I’m having trouble exporting the jar file. When I tried using build.xml, I got this error message:

    ../Documents/ExploChallenge/build.xml:21: restrict doesn’t support the nested “name” element.

    and when I tried using the export function in the package explorer, I got a bunch of error messages like these:

    Could not find source file attribute for: ‘../Documents/ExploChallenge/bin/exploChallenge/Main.class’
    Source name not found in a class file – exported all class files in ExploChallenge/bin/exploChallenge
    Resource is out of sync with the file system: ‘/ExploChallenge/bin/exploChallenge/Main.class’.

    Can anyone help me with this?


  22. In case someone runs into the same problem, I found a solution. My Eclipse built-in Ant (version 1.7) is outdated; all you need to do is install Ant 1.8.


  23. Hi, is there any chance numpy can be upgraded to a reasonably current version? I believe at some point I had some weird issues where the old version of numpy behaved differently.

    Apologies for the inconvenience,

  24. Hi, maybe some Good Samaritan wants to discuss or help me understand why (or why not) I should pay attention to the “age” of an article. I’ve read in the task description that it is very important, but I can’t see why… maybe I am confused by the selection procedure?

    • Forget the selection procedure. Think of it this way:

      1. Given a context, each article has a different propensity to be clicked. Your job is to select the article with the highest chance of being clicked.

      2. For the same article, the chance that it will be clicked in a context c is higher earlier in its lifetime.

      So now the task is to somehow model and trade off between the “appropriateness” and the “novelty” of an article, given a context.

      • But I have another interpretation:
        At time t, a visitor A comes and reads a new article B.
        At time t+n, a new visitor C comes, and he may treat B as a new article too, because visitor C has never read B before.
        So the novelty of an article may not decrease across different visitors within a short time.
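      One way to model the appropriateness/novelty trade-off described above is an estimated CTR plus a bonus that decays with the article’s age. The decay form, half-life, and bonus size below are purely illustrative assumptions, not values from the challenge.

```java
// Score = estimated appropriateness (CTR) + a novelty bonus that decays
// exponentially with the article's age. halfLife and the 0.01 bonus
// are illustrative assumptions, not tuned values.
public class NoveltyScore {
    public static double score(double estimatedCtr, long articleAgeSeconds) {
        double halfLife = 3600.0; // assume the novelty bonus halves every hour
        double bonus = 0.01 * Math.pow(0.5, articleAgeSeconds / halfLife);
        return estimatedCtr + bonus;
    }
}
```

      With this form, a fresh article gets a temporary edge, but a sufficiently better old article still wins, which is the trade-off the reply describes.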

  25. Hello,

    I am getting this error:

    bash: ./go.sh: No such file or directory

    I changed the argument to YahooLogLineReader, but it doesn’t help. I am using the Python version.

    Please let me know if I am making a mistake somewhere… Thanks!


    • In your case (I had a look at your zip files) this is because you are not using the build_submission.sh script.
      Your zip file extracts everything into a submission directory, but the ./go.sh file must be extracted into the current directory. More precisely, the command unzip yourfile.zip -d mydirectory must extract everything into mydirectory, with the go.sh file located at mydirectory/go.sh.
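      The layout requirement above (go.sh at the archive root, not inside a subdirectory) can be checked before submitting; a minimal sketch using java.util.zip, where the class name is of course my own:

```java
import java.io.File;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Checks that go.sh sits at the root of the archive, i.e. that
// "unzip file.zip -d dir" produces dir/go.sh directly.
public class CheckSubmissionZip {
    public static boolean goShAtRoot(File zip) throws Exception {
        try (ZipFile zf = new ZipFile(zip)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                // "submission/go.sh" would NOT match: it is in a subdirectory.
                if (entries.nextElement().getName().equals("go.sh")) return true;
            }
        }
        return false;
    }
}
```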

  26. I think that, according to the current plan, the best model of each team gets evaluated in the second pass of the competition.

    I would like to ask: can we pick the algorithm ourselves? My concern is that the best one on the leaderboard might not be the best algorithm, because of overfitting.


  27. Hi, I have a question about the end of the challenge. In the information you say:

    “In phase 1, winners will be known at the beginning of June, these winners are strongly encouraged to present their work at the workshop.”
    So “who” will be the winners of phase 1? The first X participants? All submissions above a threshold?

    “Phase 2 results will be known only at the workshop, it will be the same procedure of evaluation but with more (and new) data. Participants cannot submit any new algorithm, we will use their best submission of phase 1.”
    Will this new data have the same shape, or can, for example, the number of user features vary?

    And many thanks for the great challenge, it’s real fun :)

    • About the workshop, we think it’s more about having interesting discussions; to stick to your description I would say X=3 :)
      But we are open-minded, and if somebody has something new and fun to present, then that’s ok.

      About the second phase, it will be the same kind of data, but maybe a few features will be removed (I may have a bigger dataset in the next few days)

      • If the feature dimension of the phase 2 data is different, then some submissions from phase 1 might not work.

        Well… at least mine won’t work. I hard-coded the number of dimensions :(

        • Do not worry (in fact 2 dims could be missing), so your hard-coded value should not be a problem.
          In any case, for all algorithms above “always last” in phase 1, I’ll pay attention to getting them working on the phase 2 data.

  28. I am noticing that random seed variations between runs can cause a swing of up to +/-10 points. Since there are nearly 10 people at the top within a difference of 20, this could be a significant effect. Do you have any suggestions for fixing this issue? Will the best submission of each participant be run a number of times and the best/average score taken?

    • The final dataset is 4 times bigger, so I expect lower variance. Anyway, if the scores appear to be close, I’ll check with some t-tests.
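      The t-test mentioned in the reply above, for comparing two sets of run scores, can be sketched as a Welch t-statistic in plain Java; the class is illustrative and computes only the statistic, not the p-value.

```java
// Welch's t statistic for two independent samples of scores
// (e.g. repeated evaluation runs of two submissions).
public class WelchT {
    public static double tStatistic(double[] a, double[] b) {
        double ma = mean(a), mb = mean(b);
        double va = variance(a, ma), vb = variance(b, mb);
        return (ma - mb) / Math.sqrt(va / a.length + vb / b.length);
    }

    private static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    private static double variance(double[] x, double m) {
        double s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1); // unbiased sample variance
    }
}
```

      A large |t| means the gap between the two score sets is big relative to the run-to-run noise; a t near 0 means the leaderboard difference could just be seed variance.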

    • Hi Dr. exploreit.

      As I understand it, for the first part of the challenge there is no final round. The winner will be the one with the highest score when the deadline passes.

      For the second round I am not yet clear how it will work. As I understand it, there will be a final round with more data, as Jeremie explains.

  29. Hi Jeremie,

    Would we get a chance after June 2 to submit code for the Phase 2 evaluation (to take care of feature removal, tune constants, etc.)? Or would you just use our best submission file on the phase 2 data? I did not understand what you meant by “I’ll pay attention to get them working on phase 2 data”.

    Also, what time exactly will the last submission be accepted tomorrow?

    • The precise time limit is Samoa Standard Time (which means that as long as it is June 2nd somewhere on Earth, you can submit).

      After the deadline you cannot submit, but you can tell me your favorite old submission; that submission will then replace your “best” one. About “paying attention”: this is about not excluding a submission because of a trivial problem with the Phase 2 data.

    • Yeah, thank you very much for the great competition!!! :)

      Will you later publish the initial dataset, or the dataset from phase 2, so we can continue offline? That would be great ;)

      thanks again
