October 18, 2016
Getting to the Bottom of the Barrel
The fruit is beginning to ripen on the Statcast vine. Today’s secret word is “barrel.” Not the wooden vessel suitable for storing wine, but a well-struck batted ball. Think, “he barreled that one up.” The nice thing about Statcast is that we now have data on how hard a player actually hit a ball, which addresses one of the great laments of batting stats throughout history. Sometimes a batter hits the ball dead on the nose and the shortstop makes an amazing catch. The batter did everything right and, for his efforts, he's now 0-for-1. (Worse, the guy after him dinks a little dying quail that just happens to go over the second baseman’s head and gets a hit.)
Until Statcast, we didn’t have systematic (public) data on how hard a batter hit the ball. Now, we know how fast the ball was going when it left the bat and at what angle the ball was “launched.” That gives us the unbiased look we had hoped for and maybe a way to give some credit back to the guys who hit it hard, but right at someone. “Barrels” are a first swipe at trying to quantify that credit. We can give credit to a player for hitting the ball in an advantageous way, without considering whether he was just unlucky to hit it right at someone. Sounds like a good deal so far.
But we need to look deeply into this barrel to see what’s at the bottom of it. Is it actually doing what it says on the label?
Warning! Gory Mathematical Details Ahead!
A barrel is supposed to represent the subset of batted balls hit into fair territory whose combination of exit velocity and launch angle typically produces a batting average of at least .500 and a slugging percentage of at least 1.500. (So balls hit at X mph off the bat and launched at a Y-degree angle have an expected outcome of ...) That makes sense as a starting point. We’re looking for the types of balls that normally produce extra-base hits. (A double, for example, has a “slugging percentage” of 2.000.) Through some testing, MLBAM data scientist Tom Tango has derived a mathematical formula that picks these “barrels” out.
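To make the definition concrete, here is a minimal sketch of the idea behind it, not the actual MLBAM formula (which isn't reproduced here). It groups batted balls into exit velocity/launch angle buckets and keeps the buckets that clear both thresholds. The function name, the one-unit bucket size, the outcome labels, and the sample rows are all assumptions made up for illustration.

```python
from collections import defaultdict

# Total bases credited for each outcome label (labels are illustrative).
TOTAL_BASES = {"out": 0, "single": 1, "double": 2, "triple": 3, "home_run": 4}


def qualifying_buckets(batted_balls, ev_step=1, la_step=1,
                       min_avg=0.500, min_slg=1.500):
    """Return {(ev_bucket, la_bucket): (avg, slg)} for buckets meeting both cutoffs.

    batted_balls: iterable of (exit_velocity_mph, launch_angle_deg, outcome).
    """
    tallies = defaultdict(lambda: [0, 0, 0])  # [balls, hits, total bases]
    for exit_velo, launch_angle, outcome in batted_balls:
        # Floor each measurement to the lower edge of its bucket.
        key = (int(exit_velo // ev_step) * ev_step,
               int(launch_angle // la_step) * la_step)
        bases = TOTAL_BASES[outcome]
        tally = tallies[key]
        tally[0] += 1
        tally[1] += 1 if bases > 0 else 0
        tally[2] += bases

    result = {}
    for key, (balls, hits, bases) in tallies.items():
        avg, slg = hits / balls, bases / balls
        if avg >= min_avg and slg >= min_slg:
            result[key] = (avg, slg)
    return result


# Made-up sample: three hard-hit fly balls and two softer, lower balls.
sample = [
    (104.2, 27.5, "home_run"),
    (104.8, 27.1, "double"),
    (104.5, 27.9, "out"),
    (85.3, 5.0, "single"),
    (85.9, 5.4, "out"),
]
print(qualifying_buckets(sample))
```

In the made-up sample, only the 104 mph/27 degree bucket clears both bars. The softer bucket hits .500 but slugs only .500, so it misses. With real data, you'd want far more than a handful of balls per bucket before trusting the averages.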
At first glance, barrels are doing exactly what they say on the label. Using data from 2015 and 2016, we can see that 78.9 percent of barrels went for some sort of extra-base hit (either a double, triple, or home run) compared to only 7.4 percent of non-barrels. And 80.8 percent of barrels were hits, compared to 29.9 percent of non-barrels. So, we’ve certainly identified a group of balls that are producing (in general) high-value outcomes.
Before we get carried away, though, it’s instructive to look at those numbers the other way around. For example, if we know that a batted ball turned into some sort of on-base event, the chances that it started its life as a barrel are 12.9 percent. Of extra-base hits, only 36.9 percent started out as barrels. In the same way that all squares are rectangles, but not all rectangles are squares, most barrels do end up producing base hits (and extra-base hits, at that), but not all extra-base hits come from barrels.
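Running the numbers “the other way around” is just Bayes' rule. As a quick check on the arithmetic, the sketch below backs out what share of batted balls must have been barrels for the extra-base hit figures to hang together, then confirms that the same share reproduces the 12.9 percent figure for hits. The function names are mine, the inputs are the rounded percentages quoted above, and I'm treating “on-base event” as “hit.”

```python
def implied_barrel_share(p_event_given_barrel, p_event_given_other,
                         p_barrel_given_event):
    """Solve Bayes' rule for the overall share of batted balls that are barrels."""
    a, b, q = p_event_given_barrel, p_event_given_other, p_barrel_given_event
    # q = a*p / (a*p + b*(1 - p)), rearranged to isolate p.
    return q * b / (a - q * (a - b))


def share_from_barrels(p_event_given_barrel, p_event_given_other, barrel_share):
    """Bayes' rule: of all balls producing the event, what share were barrels?"""
    from_barrels = p_event_given_barrel * barrel_share
    from_others = p_event_given_other * (1 - barrel_share)
    return from_barrels / (from_barrels + from_others)


# Extra-base hits: 78.9% of barrels, 7.4% of non-barrels, 36.9% came from barrels.
barrel_share = implied_barrel_share(0.789, 0.074, 0.369)

# Hits: 80.8% of barrels, 29.9% of non-barrels.
hit_share = share_from_barrels(0.808, 0.299, barrel_share)

print(f"implied barrel share: {barrel_share:.3f}")    # 0.052
print(f"share of hits from barrels: {hit_share:.3f}")  # 0.129
```

So the quoted percentages imply that only about 5 percent of batted balls were barrels. That's why barrels can be so productive and still account for a minority of hits.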
I also found some interesting things when I looked deeper into those extra-base hits. About one-fifth of barrels in the data set (20.1 percent) ended up as doubles/triples, but more than half ended up as home runs (58.8 percent). Barrels tend to disproportionately favor swings that leave the ballpark. Indeed, again running those analyses backwards, only 14.8 percent of doubles and triples started their lives as barrels while 75.3 percent(!) of home runs began as barrels.
The correlation between the raw number of barrels that a batter had in a season (minimum 100 balls hit fair) and the number of home runs that he hit in that season--remember, we only have data for the past two seasons--is .93. The correlation with the number of doubles/triples is a somewhat less impressive (though still moderately strong) .58.
The correlation between the percentage of barrels that a batter had among his batted-ball events (again, 100 such events minimum) and the percentage of his batted-ball events that left the park was .90. The correlation between barrel percentage and double/triple percentage was only .27.
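For anyone who wants to replicate this sort of check, here is a minimal sketch of a rate-based correlation with a 100-batted-ball minimum. It uses a plain Pearson correlation written out by hand. The function names and the tuple layout are assumptions for illustration, and the four players are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rate_correlation(players, min_balls=100):
    """Correlate barrel rate with HR rate across qualifying hitters.

    players: iterable of (batted_balls, barrels, home_runs) per player-season.
    """
    qualified = [p for p in players if p[0] >= min_balls]
    barrel_rates = [barrels / balls for balls, barrels, _ in qualified]
    hr_rates = [hrs / balls for balls, _, hrs in qualified]
    return pearson_r(barrel_rates, hr_rates)


# Made-up player-seasons; the third falls short of the 100-ball minimum.
players = [(400, 40, 30), (300, 15, 12), (50, 25, 20), (200, 4, 2)]
print(round(rate_correlation(players), 3))
```

Swapping the home run column for doubles plus triples gives the other comparison.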
The Statcast feed also includes data on how far the ball traveled, and according to what data we have available, the average barrel traveled 390 feet, with a standard deviation of only 44.5 feet. That means that a barrel is almost always a deep fly, and if you hit the ball a really long way, good things usually happen. Usually, the good thing is a home run. Presumably, some of those barrel shots are held in the park by the wind, by hitting the wall, or by being hit to dead center, and they end up as “only” doubles, which isn’t a bad consolation prize. In fact, we can indirectly test this hypothesis.
I created a very simple park effect, looking at all the barrels hit in each park by the visiting team and how often those resulted in a home run. For example, in 2016, 72 percent of barrels hit by visitors in Yankee Stadium were home runs, so a barrel hit in Yankee Stadium that year should have an expected HR potential of 0.72. (A much smaller, but non-zero percentage of non-barrels also left the yard.)
For all players with at least 100 balls in play in the season in question, I tallied up the raw number of barrels that they hit, pro-rated them based on park expectation, and got an “expected” home run total. That expected total correlated with the actual number of home runs that the player hit with a coefficient of .986. When I did the calculations on a per-batted-ball rate, the correlation was .990. Barrels seem to have such a high value because they are almost entirely “home run swings.” Stats derived from barrels are mostly just park-adjusted/neutralized home run rates.
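Here is a rough sketch of that kind of calculation, under one reading of “pro-rated”: each barrel a player hits is worth the visitor home-run-per-barrel rate of the park it was hit in. The function names and tuple layouts are assumptions for illustration. The sketch ignores the small number of non-barrel home runs, and the data rows are made up to reproduce the Yankee Stadium figure of 0.72.

```python
from collections import defaultdict


def park_hr_rates(barrels):
    """HR-per-barrel rate in each park, using visiting batters only.

    barrels: iterable of (park, batter_is_visitor, is_home_run).
    """
    counts = defaultdict(lambda: [0, 0])  # park -> [visitor barrels, visitor HR]
    for park, is_visitor, is_home_run in barrels:
        if is_visitor:
            counts[park][0] += 1
            counts[park][1] += int(is_home_run)
    return {park: hrs / n for park, (n, hrs) in counts.items()}


def expected_home_runs(player_barrels, rates):
    """Sum each player's barrels, weighted by the rate of the park they were hit in.

    player_barrels: iterable of (player, park), one row per barrel.
    """
    expected = defaultdict(float)
    for player, park in player_barrels:
        expected[player] += rates[park]
    return dict(expected)


# Made-up rows: 25 visitor barrels in one park, 18 of them home runs (0.72).
league = ([("Yankee Stadium", True, True)] * 18
          + [("Yankee Stadium", True, False)] * 7
          + [("Yankee Stadium", False, True)] * 5)  # home-team barrels are skipped
rates = park_hr_rates(league)
print(rates)
print(expected_home_runs([("Hitter A", "Yankee Stadium")] * 10, rates))
```

Using only visiting hitters keeps the home team's lineup from coloring the park rate, which is the point of the “very simple park effect” described above.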
That’s not a bad thing to measure, but it's only part of the equation of being a hitter. With barrels, we’re not really capturing hard line drives, seeing-eye singles that bleed through the infield, or the ability to hit the gaps. Still, if barrels can show that they have advantages over other, more traditional measures of power, we can probably find something for them to do. And here, barrels have some redeeming qualities.
For one, barrels (for hitters) tend to stabilize more quickly than do outcome-based stats. We’ve seen before that exit velocity (one of the ingredients in the barrel recipe) tends to stabilize fairly quickly for hitters. “Barrel rate” (barrels/balls hit fair) is similarly quick to stabilize. Using the Kuder-Richardson 21 formula, “barrel rate” actually hit a reliability of .70 within a sample of 50 batted balls. As a point of comparison, HR/balls hit fair did so around 100 batted balls.
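For reference, the Kuder-Richardson 21 formula treats each batted ball as a yes/no “item” (barrel or not) and asks how consistently players separate from one another across k items: KR-21 = (k / (k - 1)) * (1 - M(k - M) / (k * s²)), where M is the mean number of barrels per player and s² is the variance of those totals. Below is a minimal sketch. The function name and the sample counts are made up, and using the population variance is my assumption.

```python
from statistics import mean, pvariance


def kr21(scores, k):
    """Kuder-Richardson 21 reliability.

    scores: each player's number of "successes" (barrels) in k batted balls.
    k: number of batted balls per player (the sample size being tested).
    """
    m = mean(scores)
    variance = pvariance(scores)  # population variance of the totals (assumption)
    if variance == 0:
        raise ValueError("KR-21 is undefined when every player has the same total")
    return (k / (k - 1)) * (1 - m * (k - m) / (k * variance))


# Made-up barrel counts over each hitter's first 50 batted balls.
barrel_counts = [1, 2, 6, 0, 3, 9, 2, 4]
print(round(kr21(barrel_counts, k=50), 3))
```

Repeating the calculation at different values of k and noting where the coefficient crosses .70 is one way to find a “stabilization point.”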
In addition, I looked at whether barrels had any predictive power. For each hitter with enough qualifying batted balls, I took his barrel rate (barrels/batted balls) and his HR rate (HR/batted balls) over the previous 100 times he'd hit the ball into play. I looked at each as a predictor of whether his next batted ball would be a home run. Barrels were the better predictor. This held when I reduced the window to 50 or 30 balls in play.
However, when I looked to see whether barrels would predict doubles and triples similarly, doubles-and-triples rate over the last 100 balls in play was a better predictor than barrels during that same time period. One way to get doubles and triples is to hit deep fly balls that don’t quite make it out of the park, but this doesn’t seem to be the only way.
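The bookkeeping for a test like this can be sketched as follows: walk through one hitter's batted balls in order and, once a full window of prior balls exists, record the trailing barrel rate and trailing HR rate next to what the next ball actually did. The function name and the tuple layout are assumptions for illustration. The sketch only builds the rows, and how the two predictors are then compared (a logistic regression, say) is left open.

```python
from collections import deque


def trailing_rows(events, window=100):
    """Build (trailing_barrel_rate, trailing_hr_rate, next_is_hr) rows.

    events: one hitter's batted balls in chronological order,
            each as (is_barrel, is_home_run).
    """
    recent = deque(maxlen=window)  # the last `window` batted balls
    barrels = hrs = 0              # running totals over `recent`
    rows = []
    for is_barrel, is_hr in events:
        if len(recent) == window:
            # Rates come from the previous `window` balls only, so the
            # outcome being predicted never leaks into its own predictors.
            rows.append((barrels / window, hrs / window, int(is_hr)))
            oldest_barrel, oldest_hr = recent[0]  # about to drop out of the window
            barrels -= oldest_barrel
            hrs -= oldest_hr
        recent.append((int(is_barrel), int(is_hr)))
        barrels += int(is_barrel)
        hrs += int(is_hr)
    return rows


# Tiny made-up sequence with a 3-ball window so the output is easy to follow.
demo = [(1, 1), (0, 0), (1, 0), (0, 0), (1, 1)]
for row in trailing_rows(demo, window=3):
    print(row)
```

Changing `window` to 50 or 30 reproduces the shorter look-back periods mentioned above.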
For pitchers, the story is a little different. Of course, if a batter barrels one up, there was, by definition, a pitcher who surrendered that barrel. What does that say about the pitcher? Much as with batters, the correlation between barrel rate and home run rate (r = .73) is stronger than the correlation between barrels and doubles/triples (r = .20).
However, for pitchers, in predicting whether the next ball would be a home run, using home run rate over the last 100 balls in play was a better predictor than were barrels over the last 100 balls in play. Same with doubles and triples. One reason might be that barrels aren’t very quick to stabilize for pitchers. Again, we’ve seen previously that for pitchers, exit velocity doesn’t really stabilize. In fact, at a certain point, adding more observations to the sample actually makes the reliability go down, suggesting that there comes a point where older data stops being predictive and just makes the model worse.
In this case, we see a similar pattern, although not as drastic as with pure exit velocity. Reliability peaked for barrel rate for pitchers around 400 balls in play with a Kuder-Richardson coefficient of .58 (I usually use .70 as my cutoff for “stable”). After that, the KR coefficient begins going down. That means that while we can be fairly confident that a batter’s barrel rate over even a small sample is representative of his true talent over the time that sample was collected, for pitchers it’s harder to tell. While it’s not a good idea for a pitcher to give up a big fly, sometimes it just happens.
We know from previous research that “the pitcher gives up the fly ball, but the batter hits the home run” and that HR/FB rates tend to regress back toward the mean for pitchers, and this suggests that even giving up a “barrel” is not as in the pitcher’s control as we might have initially thought. It’s not to say that reliability is so low that it’s all random. It’s just that the reliability of the barrel stat for pitchers isn’t quite as high as we might hope it would be to feel really confident in saying that “the number” represents “the man.”
That’s One Giant Hit for a Man, One Small Step for Understanding Baseball
This is progress, even if a bit halting. Barrels are an incremental improvement in our understanding of baseball because they measure inputs rather than outcomes. In this particular case, they are only measuring one type of struck ball, which is basically a big fly ball. The fact that barrels correlate so well with park-neutralized home runs suggests that they are measuring the same thing. However, barrels give a more accurate reading on a player more quickly than something like home run rate does. They're a slightly better measure for evaluating a player in a smaller sample, and they correlate nicely with an outcome that everyone agrees is important.
I don’t know that we’ve learned anything more than “hitting the ball a long way is good and some guys are really good at that particular skill,” but the measure that we’ve created for it is demonstrably better than what came before it. So, I say to reach into the barrel and have some of that wine in celebration.
Barrels do have their downside: relying on barrels alone to judge a hitter risks losing sight of players who make their hay in other ways. Doubles merchants/gap hitters in particular, long overlooked, are likely to continue to be hidden from these barrel lists, but I doubt that this is the last stat to be developed from the Statcast data set. There will be more new toys to play with soon. But please do me a favor and resist the temptation to believe that barrels are an end unto themselves. Don’t sort hitters by barrels or barrel rate and make a list of the “best hitters in baseball” based on that alone, and don’t judge an at-bat as good based entirely on whether it resulted in a “barrel.” Please? There’s a lot more to come.
We also see, though, that barrels are only moderately good at giving us insight into a particular pitcher, so we want to be careful there. For pitchers, barrel rate was also less predictive of home runs than was previous home run rate over the same time period. The conclusion that we should draw is that a barrel tells us a lot more about the underlying talent of a batter than about that of a pitcher.
So, all told, that crazy stalking system that’s watching every move that you make on a baseball field might actually be useful for something. This is gonna be fun.