On the NBA: Statistical Responsibility

Q: When can we expect Moneyball 2: Moreyball?

A: this would be a terrible movie. that said, we did have a 22 game winning streak and apparently a long winning streak is all you need for a movie. 

– Daryl Morey, during a Reddit AMA chat.

Coverage of the NBA in the modern age is dominated by statistics. Where the monolith of the box score once stood, there are now myriad useful and interesting methods for analyzing basketball, ranging in scope from an individual player’s performance in a specific play type to the strategic choices of the league as a whole across time. This is a tremendous boon to anyone who wants to talk about the game: come up with an argument, and you can go away and find the numbers that prove or disprove it. But it’s not all good news. 47% of statistics are made up. Lies, damn lies and statistics! You know the clichés. The more information there is, the more it can be twisted to accentuate your argument or to be downright misleading. There is an excellent example doing the rounds at the moment, and it has been driving me nuts to the extent that I felt like writing a blog post about it: The Heat are 45-3 in their last 48. How can we expect them to lose four in a row to anybody?

There are some classes of statistics that create brilliant, fragile curios: great for soundbites, but not particularly useful for providing an accurate picture of what’s actually going on on the court. One factor that’s particularly relevant in such cases is when the sample from which the numbers are taken has been selected in such a way that it twists the result. What we have here is a textbook example. On what basis have we chosen 48 games as the period over which past form is to be considered? Pretty sure the answer is “Because that’s when the Heat started winning”. But this is cherry-picking of the worst sort. A win streak is emphatic, but by its very nature it implies that the games on either side of it are dotted with losses. Ironically, the last game before this selection begins was a loss against the team the Heat happen to be taking the court against tonight, the Indiana Pacers. In using this stat as a tool to prop up an argument about how the Heat are going to win, the implication is that the game on Friday 1st February that the Pacers won 102-89 is somehow much less important than the games that followed it. Otherwise why wouldn’t it be in the sample?
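To make the cherry-picking concrete, here is a minimal sketch of how a "last N games" figure gets flattered when N is chosen after the fact. The results string is made up purely for illustration (it is not the Heat's actual game log), and the function names and the `min_games` parameter are my own inventions. The idea is simply to try every trailing window and keep the one with the best winning percentage.

```python
def record_over_last(results, n):
    """Return (wins, losses) over the last n games of a 'W'/'L' sequence."""
    window = results[-n:]
    wins = window.count("W")
    return wins, len(window) - wins


def most_flattering_window(results, min_games):
    """Try every trailing window of at least min_games games and return
    (n, win_pct) for the best-looking one. Ties go to the longer window."""
    best_n, best_pct = None, -1.0
    for n in range(min_games, len(results) + 1):
        wins, _ = record_over_last(results, n)
        pct = wins / n
        if pct >= best_pct:
            best_n, best_pct = n, pct
    return best_n, best_pct


# Hypothetical results, oldest game first. NOT real data.
results = "LWLWL" + "WWWWW" + "L" + "WWWWWW"

n, pct = most_flattering_window(results, min_games=8)
wins, losses = record_over_last(results, n)
print(f"Best-looking window: last {n} games, {wins}-{losses}")

# The game immediately before the chosen window:
print("Game just before the window:", results[-(n + 1)])
```

In this toy sequence the most flattering window is the last 12 games, and the game sitting just outside it is a loss. That is no accident: extending a window backwards by one more win can only improve the percentage, so a window picked to look as good as possible will stop right before a defeat.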

Of course, there is an easy-to-spot counter-argument for why this sort of factoid isn’t all that useful: last year’s playoffs! Conveniently, 2012 featured the exact same storyline, as the Spurs had amassed a 20-game win streak two games into their Western Conference Finals match-up with Oklahoma City. As we saw then, the past record of the Spurs did not matter in the slightest, as the Thunder proceeded to reel off four straight wins and progress to the NBA Finals proper. The only things the Heat’s current achievement has over last year’s Spurs are that it’s a bigger sample and that nothing has terminated it yet. But when the samples have been selected to accentuate the statistics, I’m not sure they can be treated with the same respect they normally would be.

There’s one other thing worth bearing in mind when talking about numbers like these, and that’s strength of opponent. If I’ve done my maths right, the Heat’s last 48 games have featured opponents with a combined winning percentage of 42.7%. To put that in perspective, the closest comparison to that record this season would be the Philadelphia 76ers or Toronto Raptors, neither of which was particularly strong. There were certainly some quality wins for LeBron and co. in the section of the schedule in question, but there were also a large number of games against the lottery-bound, who were unlikely to put up much resistance. The remaining playoff teams excluding Miami have a 66.2% regular season winning percentage and should provide much stiffer opposition.
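For anyone who wants to check this kind of figure themselves, here is a rough sketch of one way a combined opponent winning percentage could be computed: add up the opponents' wins and divide by their total games, with one entry per game played so that a team faced twice counts twice. This is my assumption about a reasonable method, not a record of the exact calculation behind the 42.7% above, and the records in the example are placeholders rather than real standings.

```python
def combined_win_pct(opponent_records):
    """Pooled winning percentage of a list of opponents.

    opponent_records holds one (wins, losses) season record per game
    played, so an opponent faced three times appears three times.
    """
    total_wins = sum(wins for wins, _ in opponent_records)
    total_games = sum(wins + losses for wins, losses in opponent_records)
    return total_wins / total_games


# Placeholder records, one per game faced. NOT real standings.
schedule = [(30, 52), (52, 30), (41, 41), (20, 62)]

print(f"Combined opponent win pct: {combined_win_pct(schedule):.1%}")
```

Run the same function over the opponents faced in the window and over the remaining playoff field, and the gap between the two numbers tells you how much tougher the road ahead is than the road behind.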

At the end of the day, the Miami Heat are still an excellent basketball team and are rightly considered favourites to win it all. But there are plenty of arguments and numbers to support that view that don’t rely on horribly mangled statistics. Just as we quested for better ways of looking at the world than points per game, so should we choose firmer ground to base our positions on than selectively defined samples. Isn’t the Heat’s full regular season record of 66-16 impressive enough? Or perhaps their 30-12 record against playoff teams? Neither has quite the headline appeal of a 45-3 stretch, but you don’t have to distort the truth to make Miami look like a good team.

With all the information available in today’s NBA, it’s more important than ever that we exercise judgement in the tools we use to interpret what we see. There is a time and a place for statistics that produce ‘interesting’ results, but they almost always come with caveats and biases that make them unsuitable for any rigorous statistical analysis. And if you’re using numbers to inform your basketball opinions, then rigorous analysis is what you’re doing. So just as you should be getting rid of points per game when talking about offensive performance, so should you be getting rid of streaks when talking about how good a team is.
