Cycling Prediction Review


Perhaps long overdue, but here we review all predictions we have published from the beginning of the year up to the Amstel Gold Race. We distinguish two types of races: one day races (ODR), or classics, and stage races, or multiday races (MDR). The predictions for these two race types are generated with different models, but within each race type the model specification has been the same for all predictions so far.

Zweeler predictions vs other predictions

All published predictions targeted the Zweeler poule games. For the multiday races this is quite different from, for example, predicting the general classification. In the Zweeler games you score points each time a rider from your team ends up in the top 20 of a stage. First place is good for 35 points, the 20th spot yields just 1 point. Points scored as overall winner (a separate prediction) are not included. A rider who ends up far down the overall GC can still do very well in a Zweeler poule game by placing high in a number of stages while losing a lot of time in another.

The Zweeler games for single day races have a slightly different point system. Having the race winner in your team gives 120 points and the numbers 2 up to 5 get 100, 90, 80 and 70 points respectively, whereas a 25th spot in the classification still gives 1 point. For the single day races, too, the Zweeler predictions do not fully correspond to forecasting the winner of a race. This is easily illustrated with an example of two riders:

  • Rider 1: A true all or nothing rider, with a 20% probability of winning and an 80% probability of ending up at rank 6 or lower.
  • Rider 2: A more careful rider, with a 10% probability for each of the spots 1 up to 5 and a 50% probability of a result outside the top 5.

For simplicity we assume that a rank of 6 or lower does not yield any points. We can then calculate the expected number of Zweeler points per rider as follows:

  • Rider 1: 0.2 x 120 + 0.8 x 0 = 24 points
  • Rider 2: 0.1 x 120 + 0.1 x 100 + 0.1 x 90 + 0.1 x 80 + 0.1 x 70 + 0.5 x 0 = 46 points

So if you are asked to just forecast the winner, it’s better to select rider 1. However, if you need to create a Zweeler team, it is better to select rider 2, who is expected to earn far more points. Although predicting Zweeler points and predicting the probability of winning are different things, riders with high probabilities of doing well will end up high in both types of predictions, though perhaps not in the order you expect. One of the reasons is that the points drop quickly with rank (e.g. rank number 12 scores 20 points compared to 120 points for the winner).
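The comparison of the two riders can be reproduced in a few lines of Python. This is only a sketch: the point values for ranks 1 to 5 are the ones quoted above for single day races, lower ranks are assumed to score nothing (as in the simplified example), and the name `expected_points` is made up for illustration.

```python
# Zweeler points for ranks 1-5 in a single day race, as quoted in the text.
# ASSUMPTION (as in the simplified example): rank 6 or lower scores 0 points.
POINTS = {1: 120, 2: 100, 3: 90, 4: 80, 5: 70}


def expected_points(rank_probs):
    """Expected Zweeler points for a {rank: probability} mapping.

    Probability mass that is not listed counts as 'rank 6 or lower'.
    """
    return sum(prob * POINTS.get(rank, 0) for rank, prob in rank_probs.items())


rider_1 = {1: 0.2}                              # wins 20% of the time, else nothing
rider_2 = {rank: 0.1 for rank in range(1, 6)}   # 10% for each of the ranks 1-5

print("rider 1:", round(expected_points(rider_1), 1))
print("rider 2:", round(expected_points(rider_2), 1))
```

Rider 1 has the higher probability of winning, but rider 2 has the higher expected score, which is what matters when building a Zweeler team.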


There are different types of Zweeler games. In the ‘default’ game you can select up to X riders without restrictions. The other game types are ‘letter’ and ‘budget’. In the ‘letter’ games riders are divided into groups A, B, C, etc., and within each group you are allowed to select a certain number of riders. The final game type is ‘budget’. In these games you get a monetary budget to spend on a team of a certain size, where each rider has his own price.

For each of the game types we also need a procedure to actually pick a team. For the ‘default’ game this is simple: we select the top X riders from the prediction. For the ‘letter’ game the procedure is almost the same, only now we pick the top X riders within each letter group. The team building procedure for the ‘budget’ games is the most complex. In the end this is an optimization problem: we solve for the team with the maximum number of expected Zweeler points that fits within the budget and the team size.
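To make the three procedures concrete, here is a minimal Python sketch. It is not the code behind the published predictions: the riders, prices and function names are hypothetical, and the budget game is solved here with a small dynamic program that assumes whole-number prices, which is just one of several ways to tackle such an optimization problem.

```python
from collections import namedtuple

# Hypothetical riders: expected Zweeler points, letter group, price (in millions).
Rider = namedtuple("Rider", "name points group price")
RIDERS = [
    Rider("rider_a", 50, "A", 30),
    Rider("rider_b", 45, "A", 25),
    Rider("rider_c", 30, "B", 10),
    Rider("rider_d", 28, "B", 10),
    Rider("rider_e", 10, "C", 5),
]


def pick_default(riders, team_size):
    """'Default' game: take the riders with the most expected points."""
    ranked = sorted(riders, key=lambda r: r.points, reverse=True)
    return [r.name for r in ranked[:team_size]]


def pick_letter(riders, per_group):
    """'Letter' game: take the best riders within each letter group.

    per_group maps a group letter to the number of riders allowed from it.
    """
    team = []
    for group, allowed in sorted(per_group.items()):
        in_group = [r for r in riders if r.group == group]
        team += pick_default(in_group, allowed)
    return team


def pick_budget(riders, budget, team_size):
    """'Budget' game: maximise expected points for exactly team_size riders
    whose total price stays within budget (prices must be whole numbers)."""
    unreachable = float("-inf")
    # best[k][b] = (points, team) of the best k-rider team costing at most b
    best = [[(unreachable, ())] * (budget + 1) for _ in range(team_size + 1)]
    best[0] = [(0, ())] * (budget + 1)
    for r in riders:
        # Walk team sizes downwards so each rider is used at most once.
        for k in range(team_size, 0, -1):
            for b in range(r.price, budget + 1):
                prev_points, prev_team = best[k - 1][b - r.price]
                if prev_points + r.points > best[k][b][0]:
                    best[k][b] = (prev_points + r.points, prev_team + (r.name,))
    return best[team_size][budget]


print(pick_default(RIDERS, 2))
print(pick_letter(RIDERS, {"A": 1, "B": 1, "C": 1}))
print(pick_budget(RIDERS, budget=40, team_size=2))
```

In this toy example the two riders with the most expected points together cost more than the budget, so the budget team differs from the default team. That is exactly why the budget game needs an optimization step rather than a simple sort.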

Evaluation metrics

With all that said we are almost ready for the prediction results in Table 1, Table 2 and Table 3. Table 3 in particular is quite big and can be overwhelming without a proper introduction of its columns, so let’s first go through the variables we will use later.

  • length: The race length, either MDR (multiday race) or ODR (one day race).
  • game: The type of Zweeler game. This is either a ‘default’, ‘budget’ or ‘letter’ game. These game types are explained above.
  • mp ok: This means most points OK. The value is yes / no and indicates whether the rider that was expected to earn most Zweeler points actually did.
  • mp top3: Indicator (yes/no) whether the rider that had most predicted Zweeler points ended up in the top 3 riders with most Zweeler points.
  • top5, top10 and top25: The number of riders in the predicted top 5, 10 or 25 of Zweeler points that actually ended up in the top 5, 10 or 25.
  • noFC: The number of riders in the actual top 25 that were not included in the prediction. The main reason for this is the usage of non-final start lists.
  • score: A combined rank score for all aforementioned columns. The maximum score is 100 and is calculated as follows:
    • 20 points if the rider with most points is predicted correctly (max 20 points)
    • 10 points for each rider correctly within rank 2 to 5 (max 40 points).
    • 5 points for each rider correctly within rank 6 up to 10 (max 25 points).
    • 1 point for each rider correctly within rank 11 up to 25 (max 15 points).
  • zwpts: The number of points scored by the predicted team.
  • ZWscore: The Zweeler points as a percentage of the number of points scored by the 15th team in the payout ranking. We choose the 15th team because from this point you kind of start to get ‘in the money’.

It’s important to note that the general score is arbitrary, and in general very high scores will be hard to achieve. Missing out on the prediction of the rider with most points already leads to a loss of 20 points. However, the score is useful to compare the different predictions cross-sectionally and through time.
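The scoring rules listed above can be written down as a small Python sketch. Note the assumption: the text does not spell out what "correctly within rank 2 to 5" means exactly, so here a rider counts as correct when his predicted rank and his actual rank fall in the same band. All function names are illustrative, not taken from the actual evaluation code.

```python
# Points per correctly placed rider, by rank band (taken from the list above).
BANDS = [((1, 1), 20), ((2, 5), 10), ((6, 10), 5), ((11, 25), 1)]


def band_index(rank):
    """Index of the band a rank falls in, or None if outside the top 25."""
    for i, ((low, high), _) in enumerate(BANDS):
        if low <= rank <= high:
            return i
    return None


def general_score(predicted, actual):
    """Combined rank score between 0 and 100.

    predicted / actual: rider names ordered by predicted / actual Zweeler
    points, best first. ASSUMPTION: a rider is 'correct' when his predicted
    and actual rank fall in the same band.
    """
    actual_rank = {rider: i + 1 for i, rider in enumerate(actual)}
    score = 0
    for i, rider in enumerate(predicted[:25]):
        band = band_index(i + 1)
        if rider in actual_rank and band == band_index(actual_rank[rider]):
            score += BANDS[band][1]
    return score


def top_n_overlap(predicted, actual, n):
    """top5 / top10 / top25: predicted top-n riders that are in the actual top n."""
    return len(set(predicted[:n]) & set(actual[:n]))


def zw_score(team_points, points_15th_team):
    """ZWscore: team points as a percentage of the 15th team in the payout ranking."""
    return 100 * team_points / points_15th_team
```

With a perfect prediction of the top 25 the score is 20 + 4 x 10 + 5 x 5 + 15 x 1 = 100. Swapping only the numbers 1 and 2 already costs 30 points, which illustrates why very high scores are hard to reach.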

Prediction evaluation

An overview of average prediction metrics for all races and for five subsets of races is presented in Table 1. The subsets for which we calculate average metrics are the two race lengths, MDR and ODR (multiday and one day races), and the three game types: budget, default and letter. The results in Table 1 help a lot in quickly assessing prediction quality. All details can be found in Table 3, but we will not discuss these individual predictions separately.

Over all 29 races the Zweeler score is 63% and the general score is 35%. Differences between the Zweeler predictions can be considerable. The lowest Zweeler score is 8.2, whereas the highest score is 96.8. From the ‘all’ column we also observe that:

  • In almost 18% of the predictions the rider with most points was predicted correctly.
  • In about 40% of the predicted games the rider with most points was in the predicted top 3.
  • Of the top 25 riders with most points the predictions contain about 12 riders.

When we compare the 10 MDR predictions with the 19 ODR predictions we see the following:

  • MDR predictions are better than ODR predictions, with Zweeler scores of 76 versus 58.
  • For MDR races 15 of the top 25 riders are predicted correctly, for ODR this is 10.

It is not that surprising that the MDR predictions are better. Due to the multiple race days the result is less likely to be affected by luck/incidents/crashes.

The last three columns of Table 1 contain the average results per game type. Since the default game only has 4 observations we will leave it out of the discussion. The budget games, with a Zweeler score of 52, do very poorly compared to the letter games, with a Zweeler score of 75. The letter games are also more successful on all other metrics. In order to figure out how this is possible we will single out one of the worst predictions so far: Gent-Wevelgem, with a Zweeler score of 8.2. Yeah, that’s indeed a pretty disappointing score… Let’s see how it went wrong.

We start by looking at the original prediction in Table 2. The column ‘points’ provides the number of predicted Zweeler points, whereas the ‘% chosen’ column contains the percentage of Zweeler participants that selected the specific rider. For this we only used the top 15 most popular riders, hence some guys don’t have a value in this column. The ‘actpts’ column is the number of Zweeler points actually scored, ‘value’ is the budget value of the rider and ‘chosen’ has the value 1 if the optimization algorithm selected the rider. The last column, ‘scpts’, is the number of points scored by the selected riders.

The Gent-Wevelgem game budget was 125 million for 13 riders. So the chosen column was created by solving for the maximum number of Zweeler points under these restrictions. By looking at this table there are clearly two reasons why this prediction was so poor:

  1. The riders with most expected points actually barely scored. For example, Sagan, Viviani and Van Avermaet together scored only 13 points.
  2. Differences in the expected number of points for e.g. Kristoff and Gaviria were small. Gaviria got selected instead of Kristoff and that was quite a pity.

Point 2 is hard to resolve as long as we only publish point forecasts for the Zweeler points, without any measure of variance. With a variance measure it would be possible to perform some form of mean variance optimization that puts more emphasis on riders that are more certain to score.
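To give an idea of what such a mean variance trade-off could look like, here is a deliberately simple Python sketch. It is not something the current models do: the riders and numbers are hypothetical, and both the scoring rule (expected points minus a penalty on the variance) and the `risk_aversion` parameter are assumptions chosen for illustration.

```python
def risk_adjusted(mean_points, variance, risk_aversion):
    """Expected points penalised by variance (a basic mean variance utility)."""
    return mean_points - risk_aversion * variance


def rank_riders(forecasts, risk_aversion):
    """Order riders by risk-adjusted points, best first.

    forecasts maps a rider name to (expected points, variance of points).
    """
    return sorted(
        forecasts,
        key=lambda name: risk_adjusted(*forecasts[name], risk_aversion),
        reverse=True,
    )


# Hypothetical forecasts: nearly equal expected points, very different variance.
forecasts = {
    "steady_rider": (40.0, 100.0),
    "volatile_rider": (42.0, 900.0),
}

print(rank_riders(forecasts, risk_aversion=0.0))   # pure point forecast
print(rank_riders(forecasts, risk_aversion=0.01))  # penalise uncertainty
```

With a risk aversion of zero the ranking follows the point forecast alone. With a positive value, a small difference in expected points no longer decides the pick when one of the riders is much less certain to score.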

Conclusion and further plans

Most of the 2019 Zweeler predictions so far can, I guess, be called ‘reasonable’. There are some pretty good ones and some very bad ones too. However, it is clear there is still a lot of work to do, especially for the one day races. For now, the biggest gains can be obtained by enriching the feature space. Examples of this are (1) explicitly accounting for cobbled classics by e.g. the number of cobbled kilometers or (2) including features from cyclocross races. With the cross results, Wout van Aert and Mathieu van der Poel could have been picked up earlier by the underlying algorithms.

Although the predictions may be ‘reasonable’ keep in mind that there are some factors that we do not account for. These are:

  • Substitutes: we only account for points scored by the initially selected team, we don’t count points for replacements of possibly injured riders.
  • We do not account for the points of extra questions.
  • Sometimes the predictions are based on a non-final start list. Riders that are in the prediction but in the end don’t race don’t score any points!

Besides the model improvements there are plenty of other plans. First, we would like to predict not only the Zweeler poules, but also simply the outright winners. Another thing on the agenda is predicting the individual stages of stage races. Hopefully we manage to implement this before the start of the Giro in a few weeks.

If you have other suggestions, don’t hesitate to send us a message on Twitter or by email.