In this post we continue with the clustering problem that we started in February. The idea remains the same: we try to group riders automatically, but we have changed our approach considerably. In the first step we identify four rider clusters: time trialists, sprinters, GC guys/climbers and classics specialists. After that we zoom in on the sprint cluster, where the clustering algorithm comes up with three distinct sprinter types.
Today we address a complex, but fun and interesting problem: clustering riders. This is by no means a new idea. Everybody knows that Elia Viviani and Dylan Groenewegen are classified as ‘sprinters’, whereas Chris Froome is a ‘GC guy’. The label a rider receives is simply based on past results. We are going to investigate whether we can take this a step further by (1) using a uniform point system for race results and (2) including dimensions other than just outcomes.