Corsi vs xG, Part 2

Yesterday I wrote that Corsi is better at predicting future goals than expected goals are. This post is going to be a brief update on that and I’m going to get right to the point, so if you want more details about theory, methods, etc. I recommend reading the first post.

In that first post I tested how well you could predict a team’s goal ratio in the second half of the season based on the first half of the season, and I found that Corsi is more predictive than expected goals are, although the gap has been narrowing over time. Since the goal is to predict future games using past results, using a chronological ordering of games is the only method that makes sense.

However, you could argue that splitting the season in half chronologically might tend to understate true talent differences between teams because rosters change over the course of a season as players get traded or injured. On top of that, coaching changes can sometimes have a dramatic effect on how a roster performs. One thing we can do to try to account for that is to split the games in a way that ensures the two halves have comparable rosters.

So I’ve re-run the numbers I published yesterday using a different approach, which is to split each team’s games into even and odd halves. That way if a player is on the roster for part of the season, but not for another part, the two halves will each include that player for the same number of games (or at most one game more/less).
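As a rough illustration of the even/odd idea, here is a minimal Python sketch. The function name `split_even_odd`, the 82-game season of bare game numbers, and the player who joins at game 30 are all hypothetical and only there to show the roster-balance property; this is not the code used to produce the numbers in this post.

```python
def split_even_odd(games):
    """Split a team's chronologically ordered games into even and odd halves.

    Games are numbered from 1, so the first game of the season is odd.
    """
    odd = games[0::2]   # games 1, 3, 5, ...
    even = games[1::2]  # games 2, 4, 6, ...
    return even, odd


# Hypothetical 82-game season, represented by game numbers only.
season = list(range(1, 83))
even_games, odd_games = split_even_odd(season)

# Hypothetical player acquired before game 30 who plays every game afterwards.
games_played = set(range(30, 83))
in_even = sum(1 for g in even_games if g in games_played)
in_odd = sum(1 for g in odd_games if g in games_played)

print(f"even half: {in_even} games, odd half: {in_odd} games")
```

Because the player's stretch is contiguous, the two counts can differ by at most one game, which is the property the split relies on.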

Let’s take a look at what happens if we do even/odd splits, using results from even-numbered games to predict results in odd-numbered games. These results are at 5v5 for every 82-game season since 2007, and I’m also including Natural Stat Trick’s “scoring chances” metric, which is why the SCF column only has a value for NST.

Site  SCF>GF  xGF>GF  CF>GF
NST   0.19    0.17    0.21
EH    -       0.15    0.21
MP    -       0.17    0.22
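For readers who want to try this kind of comparison themselves, here is a hedged Python sketch of how a single cell such as CF>GF could be computed. It assumes the cell is a Pearson correlation between each team's shot-attempt share in even-numbered games and its goal share in odd-numbered games; that choice of statistic is an assumption for illustration (the first post describes the actual method), and the team rows below are made-up placeholder numbers, not real NHL data.

```python
from math import sqrt


def share(for_count, against_count):
    """Fraction of events that went in the team's favour, e.g. CF / (CF + CA)."""
    return for_count / (for_count + against_count)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up team-seasons: (CF even, CA even, GF odd, GA odd). Not real data.
team_seasons = [
    (1900, 1700, 80, 70),
    (1750, 1850, 68, 75),
    (1820, 1800, 74, 72),
    (1650, 1950, 60, 82),
]

cf_share_even = [share(cf, ca) for cf, ca, _, _ in team_seasons]
gf_share_odd = [share(gf, ga) for _, _, gf, ga in team_seasons]

r = pearson_r(cf_share_even, gf_share_odd)
print(f"CF>GF correlation: {r:.2f}, squared: {r * r:.2f}")
```

The same two helper functions would work for the xGF>GF and SCF>GF columns by swapping in expected goals or scoring chances for the shot-attempt counts.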

As mentioned in the previous post, there are some questions about the quality of shot location data prior to 2009, so let’s also take a look at the numbers from 2009 onwards.

Site  SCF>GF  xGF>GF  CF>GF
NST   0.21    0.19    0.21
EH    -       0.17    0.22
MP    -       0.18    0.22

Another thing I mentioned in the previous post is that Corsi’s edge has been declining over time. So what if we look only at games since the most recent CBA was signed in 2013? This covers the 6 most recent 82-game seasons.

Site  SCF>GF  xGF>GF  CF>GF
NST   0.25    0.24    0.23
EH    -       0.22    0.23
MP    -       0.24    0.24

Based on all these numbers, I would say the conclusion of the original post remains unchanged. Over the long term, Corsi has proven to be a more predictive metric than expected goals, although the gap has narrowed over time to the point that the metrics perform very similarly over the past 6 seasons. Scoring chances remain the most predictive stat for the most recent seasons, although not by a huge margin.