
A Treatise on Data and Steals

I think I’ve read Benjamin Morris’s FiveThirtyEight piece on steals about a dozen times by now. I love it for the simple reason that it does exactly what analyzing data is supposed to do: allow us to see what we would otherwise be unable or unwilling to see. The author also had the guts to break out some bigger statistical guns and use multiple regression, a technique I’ve used before too.

As the nerdy stat-head data junkie of this blog, I want to take this opportunity to emphasize the most fundamental premise of data-based analysis. Data is non-partisan. People are not. When you hear the terms data-driven or data-based decision-making, that means the data comes before the decision, not vice versa. The worst, and unfortunately most common, mistake when working with data is to use the analysis to reinforce an already formed conclusion, or at least a strongly desired one. In that scenario, a person who feels his/her opinion has been slighted simply dismisses the results of the analysis instead of rethinking his/her original conclusion.

And that makes sense. If you poured $20 million into a program to reduce urban poverty, but the data say your program hasn’t improved anything, it’s hard to simply accept that. It’s much easier to call the analysis baloney and keep believing that your program is doing well. But we can all see why such a decision would be dangerous and potentially financially calamitous. The same goes for basketball. If you refuse to buy into data simply because you disagree with the results, are you doing yourself (or others, if you’re in such a position) any favors? Do you have the balls to set aside your pre-formed opinions and let the data inform new ones? It’s not easy, but if we can’t, or won’t, then there is absolutely no point in doing any more data analysis on anything.

On to steals.

First, because we are good consumers of data and don’t just dismiss analysis because we don’t like the results, we’re going to assume that Benjamin Morris’s results are statistically accurate. Where can we go from here? The two most common “fair game” criticisms of data analysis are the data itself (is it accurate?) and the analysis methods (are they appropriate?). I’m going to assume that the data is accurate. Since he didn’t publish his methods, I can’t comment on them. What I can do is take his brilliant ideas and go in a separate direction with them.

Benjamin wanted to know the impact of individual players’ box score stats on team production. I don’t have the patience to grab that much data, so I’m going to look at the impact of team box score stats on team production instead. Since I’ve already established that I’m impatient, I purposefully sampled box score stats from just five teams: the Houston Rockets, Indiana Pacers, Miami Heat, OKC Thunder, and Chicago Bulls. I selected the Rockets because I write for this blog, and the other four because they’re successful but play very different styles of basketball.

Then I did exactly what Benjamin did, but at the team level. I ran a regression using a set of box score stats to see how each stat predicted team performance when all the other stats were held constant. I also controlled for strength of opponent. Here are the results.

Graph 1: How many team or opponent points a one-unit increase in each box score stat is worth

I broke team productivity into opponent points (team defense) and team points (team offense). To read this, take rebounds. Each additional team rebound is worth 0.249 additional points for that team and takes away 0.358 points from the opponent (the chart shows this as -0.358 opponent points). I didn’t include assists because they automatically result in points, which kind of defeats the purpose of prediction.

As you can see, the results support Benjamin’s findings (love it when this happens!). Steals are monstrously valuable and are worth more team points than any other box score stat. Surprisingly, steals don’t predict opponent points at all (neither do turnovers). So despite being a defensive statistic, steals are much more valuable offensively. Also surprising, at least to me, is that fouls actually predict more team points. My interpretation is that the aggressive play that produces steals probably also produces fouls. So both more fouls and more steals might predict easy fast-break points, though fouling is a wash because an extra foul also predicts 0.583 extra points for the opponent.
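The post doesn’t publish code or its exact variable list, so what follows is only a minimal sketch of this kind of model: two ordinary least squares regressions (team points and opponent points) on per-game team stats plus an opponent-strength control, using numpy. The data, the stat list, the opponent-strength variable, the invented "true" effects, and the helper name fit_ols are all illustrative assumptions. The only idea taken from the text is reading a stat’s net value as its team-points coefficient minus its opponent-points coefficient (for rebounds, 0.249 - (-0.358) = 0.607 points).

```python
import numpy as np

def fit_ols(predictors, y):
    """Return OLS coefficients [intercept, b1, b2, ...] using least squares."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical per-game team stats; the post's real data set is not published.
rng = np.random.default_rng(0)
n_games = 2000
stat_names = ["rebounds", "steals", "fouls", "turnovers"]
box = np.column_stack([
    rng.normal(44, 5, n_games),    # rebounds
    rng.normal(8, 2.5, n_games),   # steals
    rng.normal(20, 3.5, n_games),  # fouls
    rng.normal(14, 3.5, n_games),  # turnovers
])
opp_strength = rng.normal(0, 3, n_games)  # hypothetical opponent-quality control
X = np.column_stack([box, opp_strength])

# Invented effects, used only to generate fake outcomes for the demo.
team_effects = np.array([0.25, 1.2, 0.2, -0.9])
opp_effects = np.array([-0.35, 0.0, 0.6, 0.0])
team_pts = 85 + box @ team_effects - opp_strength + rng.normal(0, 8, n_games)
opp_pts = 100 + box @ opp_effects + opp_strength + rng.normal(0, 8, n_games)

# Drop the intercept (index 0) and the opponent-strength control (last column).
b_team = fit_ols(X, team_pts)[1:5]
b_opp = fit_ols(X, opp_pts)[1:5]

for name, bt, bo in zip(stat_names, b_team, b_opp):
    # Net margin value: team points gained minus opponent points added.
    print(f"{name:>9}: team {bt:+.3f}  opp {bo:+.3f}  net {bt - bo:+.3f}")
```

Each printed number reads the same way as the chart: the change per one extra unit of that stat, with the other stats and opponent strength held fixed.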

One methodological criticism of the steals analysis that I’ve noticed, both on the forums here and on FiveThirtyEight, is that there are fewer steals in a game than, say, rebounds, so it’s not really a fair comparison. While the phrasing isn’t technically correct, the general sentiment of this criticism is. Think of it this way. You want to predict your happiness, and your predictors are beers and cars. It won’t surprise you that an extra car makes you a lot happier than an extra beer, but that’s not really a fair comparison because you can get another beer a lot more easily than you can get another car. What we need to do is standardize the stats.
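A minimal sketch of one common way to standardize, assuming synthetic data: convert every stat and the outcome to z-scores (subtract the mean, divide by the standard deviation) and rerun the regression. The post doesn’t say exactly how its conversion was done, so the numpy code, the made-up rebound and steal numbers, and the helper names zscore and fit_ols are illustrative assumptions. The invented effects make a steal worth 4.8 times a rebound per unit but only 2.4 times per standard deviation, so the two printed ratios should land near those values.

```python
import numpy as np

def zscore(a):
    """Rescale to standard-deviation units: mean 0, SD 1 (column-wise for 2-D input)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def fit_ols(predictors, y):
    """Return OLS coefficients [intercept, b1, b2, ...] using least squares."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical games: rebounds are common and vary a lot, steals are rare and
# vary little (the "beers vs. cars" setup).
rng = np.random.default_rng(1)
n = 2000
rebounds = rng.normal(44, 5, n)
steals = rng.normal(8, 2.5, n)
X = np.column_stack([rebounds, steals])
points = 50 + 0.25 * rebounds + 1.2 * steals + rng.normal(0, 5, n)

raw = fit_ols(X, points)[1:]                  # points per extra rebound / steal
std = fit_ols(zscore(X), zscore(points))[1:]  # SDs of points per SD of each stat

# Shortcut: a standardized coefficient is the raw one times sd(x) / sd(y).
converted = raw * X.std(axis=0) / points.std()

print("raw steal/rebound ratio:", raw[1] / raw[0])
print("standardized steal/rebound ratio:", std[1] / std[0])
```

The `converted` line shows why refitting isn’t strictly necessary: with an intercept in the model, rescaling the variables only rescales the slopes, so the standardized coefficients can be recovered from the raw ones and the standard deviations.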

That’s what I did next. Instead of units (e.g., one extra rebound or steal) and points, I converted everything into standard deviations. Here are the results.

Graph 2: The same chart, but with standardized values

Here’s how to read this, again using rebounds as the example. An increase of one standard deviation in rebounds predicts a decrease of 0.217 standard deviations in opponent points and an increase of 0.149 standard deviations in team points. That sounds like gibberish, and it is. What’s valuable about this is that it lets us compare stats without worrying about the beer/car problem, because every stat is now measured in units of its own typical variation rather than in raw counts. From these results, we see that steals are still the strongest single predictor of team or opponent points. However, the gap between steals and the other categories is smaller: a steal is now only about twice as valuable as a rebound, as opposed to roughly five times as valuable in the previous chart. Rebounds also have the added benefit of taking away opponent points, so from the perspective of net points a rebound is actually more valuable than a steal.
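Adding the team and opponent values (0.149 + 0.217 for rebounds) treats one standard deviation of team points and one of opponent points as equal-sized. A minimal sketch of a more careful net comparison converts each standardized coefficient back to points first. The post doesn’t report the outcome standard deviations, so the SDs below and the helper name net_points_per_sd are hypothetical; when the two SDs are equal, the result is just the simple sum scaled by that SD.

```python
def net_points_per_sd(beta_team, beta_opp, sd_team_pts, sd_opp_pts):
    """Net scoring-margin change, in points, for a one-SD increase in a stat.

    beta_team and beta_opp are standardized coefficients (SDs of team or
    opponent points per SD of the stat). Multiplying each by its outcome's SD
    converts it back to points, so the two can be subtracted meaningfully.
    """
    return beta_team * sd_team_pts - beta_opp * sd_opp_pts

# Rebound coefficients from the standardized chart.
REB_TEAM, REB_OPP = 0.149, -0.217

# Hypothetical outcome SDs (not reported in the post).
equal = net_points_per_sd(REB_TEAM, REB_OPP, sd_team_pts=10.0, sd_opp_pts=10.0)
unequal = net_points_per_sd(REB_TEAM, REB_OPP, sd_team_pts=12.0, sd_opp_pts=8.0)
print(f"equal SDs:   {equal:.2f} points per SD of rebounds")
print(f"unequal SDs: {unequal:.2f} points per SD of rebounds")
```

If team and opponent points vary by about the same amount from game to game, ranking stats by the simple standardized sum gives the same order as ranking them by net points.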

Total comments: 11
  • shirtless says 6 months ago

    Replying to redfaithful:

    It's not correlating with offensive efficiency; it's predicting points. That said, efficiency measures are part of the model, to control for strength of opponent.

    If steals were costly on the defensive end due to gambling, that would be reflected in the opponent points predicted. Since there is no relationship between steals and opponent points, my conclusion is that the points allowed due to gambling are offset by the points taken away by successful steals.

  • redfaithful says 6 months ago

    Giving it another thought, I would argue that data analyzed should be steal attempts and not steals made. Going for a steal can result in a successful steal and an offensive advantage, but many times the defender fails to steal leaves the defense in disadvantage. Focusing only on the made steals is somewhat similar to using only made field goals, which I think everyone agrees is a bad statistic to measure either team or player efficiency.

  • Sir Thursday says 6 months ago

    Replying to NorEastern:

    Well, if they're correlated with offensive rating (as the article seems to imply), then there is some benefit to them. There certainly is a class of steal that leads to two easy points (although in a box score it's difficult to separate those from the run-of-the-mill steals that don't lead to fast-break opportunities), and that's presumably where the idea for the article came from. I think the original article overestimated their value, but Richard seems to have done a better job of putting them in context.

    ST

  • NorEastern says 6 months ago

    The question I had about the original article is it is unproven that steals actually correlate to anything meaningful. Steals are not positively correlated in any meaningful way with either wins or defensive rating. If so is it not just a meaningless box score stat?

  • redfaithful says 6 months ago

    Ouch, big mistake on my side, indeed you can't sum them...

  • Sir Thursday says 6 months ago

    Now that I've read through the whole post, I like what you've done here in eliminating some of the sources of error in the original FiveThirtyEight post. It's a good read :). Sorry I jumped on it so quickly before.

    A couple of possible explanations for some of the results:

    • With the rebounding regressions, the more shots an opponent misses (and therefore the worse their night is going offensively) the more rebounds are available. So there's likely to be quite a bit of correlation between a team doing well defensively and grabbing a lot of rebounds.
    • My hypothesis for the positive correlation between fouls and points scored is that more fouls are a sign of a tightly reffed game. You may have committed more fouls, but assuming fair reffing, the other team probably committed more too. In that sort of contest there are likely to be a lot of free throws, and those boost offensive efficiency. That's how you end up with a positive correlation between fouls and points scored.

    Replying to redfaithful:

    Unfortunately, because the printed values are standard deviations rather than averages, technically the values you get from totalling the offensive and defensive impact with a straight sum are meaningless. But I agree with the general sentiment that both rebounds and steals seem more valuable than the other stats (although you have to bear in mind the explanation above for rebounding).

    ST

  • TeamBall says 6 months ago

    Does this mean that the perception that attempted steals (which can result in defensive breakdowns) tend to give up more points is incorrect?

  • redfaithful says 6 months ago

    In contrast to the original work, this article contains less enthusiasm and more data. And it works much better for me.

    I think the emphasis should be on the last chart with the standardized values. If you look at the total (own + opponent) you get:

    +.366 for rebound,

    +.306 for steal,

    +.182 for a block,

    -.002 for foul and

    -.146 for turnover.

    These are the most relevant numbers in the two articles.

    Excellent work as usual Richard!

  • Sir Thursday says 6 months ago

    Ha, sorry, I just saw that bit and the red stats-mist descended :P. I'll read the rest of your post now :).

    ST

  • shirtless says 6 months ago

    I used the wrong word. When I said true, I didn't mean his results are the only possible reflection of what is happening in real life. I mean that his results are statistically accurate given what he did. And if we want to question what he did, we shouldn't question his results, but should question his data or his methods (mentioned in my post). I've changed the language to make it more representative of my intent. Thanks for pointing it out.

    I actually referenced one of your criticisms in the post. To be fair to Benjamin, he never listed his complete methods, so we don't know exactly what he did. That's why I don't criticize his methods in my post.

    I also addressed your comment about the problem of the marginal unit in my post.

  • Sir Thursday says 6 months ago

    I'm sorry, Richard, but your statement that "because we are good consumers of data and don't just dismiss analysis because we don't like the results, we're going to assume that Benjamin Morris's results are true" is abhorrent to me. Accepting statistics blindly is just as bad as, if not worse than, dismissing them outright because you don't like them. Before accepting any statistical argument as convincing, you should be checking for flaws and making sure there is no structural objection to the methodology behind the analysis.

    I've listed some of the potential flaws in this study here. Since then, it has occurred to me that using marginal stats instead of ratios is probably a bad idea, especially when calculating statistics like 'replace-ability'. If your team averages 100 ppg and 8 spg, it is trivially easier to replace 1% of your points than 12.5% of your steals.

    You're going to have to convince me that the methodology is sufficiently sound to base any analysis on before I read the rest of your post, I'm afraid ;).

    ST
