This is going to be a long post, so I’m going to front-load the top line results (with a little bit of history) and then get into a longer discussion of some of the details after that for anyone who is interested.
One of the earliest and arguably still most important discoveries in hockey statistics is that, at the team level, past goal scoring is not a very good predictor of future scoring. Early analysts found that you could do a better job of predicting how well teams would score at even strength by looking at shot attempts (a count now commonly known as Corsi) instead of goals. Over time, many people began to argue that because shots differ in quality, improvements could be made by weighting each shot by its likelihood of becoming a goal, based on factors like how close to the net the shot is and what type of shot was taken. We call this adjusted measure Expected Goals, or xG.
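To make the difference between counting shots and weighting them concrete, here is a minimal sketch in Python. It is not any of the public models discussed in this post: the logistic form, the coefficient values, and the function names are all assumptions chosen purely for illustration, and a real model would fit its weights to historical shot data.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# A real xG model would estimate these from historical shot data.
INTERCEPT = -1.0
DISTANCE_COEF = -0.05  # effect per foot of distance from the net
SHOT_TYPE_COEF = {"wrist": 0.0, "slap": -0.2, "deflection": 0.5}


def shot_goal_probability(distance_ft, shot_type):
    """Estimate the chance that a single shot becomes a goal (logistic form)."""
    z = INTERCEPT + DISTANCE_COEF * distance_ft + SHOT_TYPE_COEF[shot_type]
    return 1.0 / (1.0 + math.exp(-z))


def expected_goals(shots):
    """xG: sum the per-shot goal probabilities."""
    return sum(shot_goal_probability(d, t) for d, t in shots)


def shot_attempts(shots):
    """Shot-attempt count: every attempt counts as one, regardless of quality."""
    return len(shots)


# Two made-up teams with the same number of attempts but different shot quality.
team_a = [(10, "wrist"), (15, "deflection")]
team_b = [(50, "slap"), (55, "wrist")]

print(shot_attempts(team_a), shot_attempts(team_b))
print(expected_goals(team_a), expected_goals(team_b))
```

In this toy example both teams have the same number of shot attempts, but the team shooting from closer to the net ends up with the higher xG total, which is exactly the distinction the adjusted measure is meant to capture.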
The first example of an xG model that I’m aware of was created by former Florida Panthers analyst Brian MacDonald back in 2012. Unfortunately, his research no longer appears to be available. [UPDATE: Since publishing this article, it’s come to my attention that statistics like expected goals go back to at least 2006, prior to the NHL first publishing shot location data!] [UPDATE 2: And an even older xG model from Alan Ryder in 2004.] It wasn’t until 2015 that an xG model gained wider public attention, when Dawson Sprigings (who now works for the Colorado Avalanche) and Asmae Toumi collaborated on a model for Hockey Graphs (for lack of a better name, I will refer to this as the DA model for the rest of this post). According to their article, expected goals are better at predicting future results than Corsi is. This was the breakthrough that many people had been waiting for: a metric that tried to account for the quality of shots rather than just their quantity.
In the years since then, a number of people have created their own xG models. While the raw data for the DA model is not publicly available, the makers of more recent models have put their data online so that anyone can use it. It is impossible to evaluate every model that’s out there, so I collected data from three of the most commonly used public models to do some new testing of these metrics. The data I’m using comes from Moneypuck, Evolving Hockey, and Natural Stat Trick.
Continue reading “Corsi Is Better At Predicting Future Goals Than Expected Goals Is”