April 1, 2014
Cracking the Location Code
The challenges of hitting a baseball are many and difficult. Depending on the speed of the pitch, a batter may have something like half a second to 1) locate the ball as it leaves the pitcher’s hand, 2) predict its movement based on the kind of pitch it is (fastball, slider, curve, etc.), 3) decide whether to swing, and potentially 4) adjust mid-swing to the path of the ball or check his swing. All of which is to say, hitting a baseball in MLB may actually be the hardest thing in the galaxy (I’ve never done it, myself).
Arguably the most demanding part of this battle is purely mental (as Hank Aaron noted). Because of how little time there is for a hitter to perform all of the above-mentioned tasks, it is helpful to have some notion ahead of time of what, where, and how the pitcher is going to throw. Conversely, the more uncertainty and confusion a pitcher can create in the hitter, the more chance he has of catching him off guard.
I’ve written about this topic before in the context of pitch type. In that study, I found that pitchers who threw more pitch types and mixed them more evenly were better able to get strikeouts. A lurking caveat in that initial analysis was that it ignored location, and location is important. I aim to fix that blind spot, at least partially, in this article. As before, I’ll quantify uncertainty using a measurement called entropy, and see how the entropy of location affects each pitcher’s outcomes. The greater the entropy of location, the harder it is for a batter to predict where the next pitch will be.
Location, Location, Location
Secondly, it’s not clear whether entropy in location should necessarily be a good thing. On the one hand, varying location gives the batter more to think about and could potentially cause confusion. On the other hand, there’s an inherent penalty to locating one’s pitches too unpredictably: walks. Simply pitching far outside of the strike zone for the sake of befuddling the batter is a losing strategy. To put it another way, unlike an eephus, a pitch four feet out of the zone isn’t fooling anybody.
With these caveats noted, let’s compute the entropy of location and see how it varies between pitchers. I’ll use the same dataset as before, limiting myself to starting pitchers. Interestingly, whereas before I noted a correlation between entropy and velocity, no such relationship exists between location entropy and velocity.
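For readers who want to see what "entropy of location" means mechanically, here is a minimal sketch rather than the exact procedure used for this article. It assumes each pitch is described by its horizontal and vertical position (in feet) as it crosses the plate, and that those positions are binned into square cells. The function name location_entropy and the cell_size value are hypothetical choices for illustration. The entropy is the usual Shannon formula, the sum over cells of p * log2(1/p), where p is the share of a pitcher's pitches landing in that cell.

```python
import math
from collections import Counter


def location_entropy(pitches, cell_size=0.25):
    """Shannon entropy (in bits) of pitch locations binned into square cells.

    pitches: iterable of (px, pz) plate-crossing coordinates in feet.
    cell_size: bin width in feet. This is an assumed value for illustration;
        the article does not state the grid resolution.
    """
    # Map every pitch to the integer index of the grid cell it falls in.
    counts = Counter(
        (math.floor(px / cell_size), math.floor(pz / cell_size))
        for px, pz in pitches
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one pitch")
    # Sum p * log2(1/p) over the occupied cells.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

A pitcher who puts every pitch in the same cell scores zero bits, and one who spreads pitches evenly over four cells scores two bits. Because the result depends on the cell size and on how many pitches are in the sample, values are only comparable between pitchers when both are held roughly constant.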
The league leaders in location entropy are an odd mix of pitchers good, bad, and mediocre: Clayton Kershaw (good) is there, but also Barry Zito (bad), and Felix Doubront (mediocre). At the opposite end of the spectrum, the group of pitchers with the least location entropy includes guys like Hisashi Iwakuma and Bartolo Colon, but also Aaron Harang. Both extremes are a little confusing, and if there’s a pattern, I can’t figure it out.
Location entropy is modestly correlated with higher walk rates (R² = .16), consistent with the idea that it’s at least partly a product of wildness on the part of the pitcher. Consider this graph of location entropy vs. BB/9.
There’s a clear trend toward more unpredictable pitchers producing a higher rate of free passes. Interestingly, though, location entropy is also associated with a higher strikeout rate.
The end result of these countervailing trends on FIP is… nothing. They effectively seem to cancel out, so that overall location entropy has no significant effect on FIP.
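As a rough illustration of where a figure like the R² quoted for walks comes from: for a straight-line fit of one variable against another, R² is simply the squared Pearson correlation. The sketch below assumes two equal-length lists, one value per pitcher (say, location entropy and BB/9). The function name r_squared is made up for this example, and nothing here is claimed to be the exact code behind the article's numbers.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear fit of ys on xs (squared Pearson correlation)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

An R² of .16 corresponds to a correlation of about .4: a real relationship, but one that leaves most of the variation in walk rate unexplained by location entropy.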
Splitting the Zones
To define the strike zone, I considered the rate of strikes per pitch in each cell of the grid and divided the zone based on whether the probability of a strike was greater than .9 (strike) or less than .1 (ball). (As a side note, as much as PITCHf/x has made the umpire’s occasional mistakes stand out, it is staggering in some respects just how consistent umpires are: About 90 percent of the bins are either almost always strikes or almost always balls.) I then computed the entropy for each set of bins independently. For each pitcher, I limited myself to at-bats against righties, so as to avoid the additional complexity of the lefty strike (I will return to handedness-specific splits in the future).
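The zone split described above can be sketched in a few lines. This is an illustration under stated assumptions, not the article's actual code: it assumes the strike rate in each cell is estimated from called pitches (no swing), that each pitch has already been assigned to a grid cell, and that cells with a strike probability between .1 and .9 are simply left out of both groups, which is a guess, since the treatment of those in-between cells isn't spelled out. The names classify_cells and entropy_over are hypothetical.

```python
import math
from collections import Counter


def classify_cells(called_pitches, hi=0.9, lo=0.1):
    """Split grid cells into 'strike' and 'ball' sets by called-strike rate.

    called_pitches: iterable of (cell, is_strike) pairs, one per called pitch.
    Cells whose strike rate falls between lo and hi end up in neither set
    (an assumption made for this sketch).
    """
    totals = Counter()
    strikes = Counter()
    for cell, is_strike in called_pitches:
        totals[cell] += 1
        if is_strike:
            strikes[cell] += 1
    strike_cells = {c for c in totals if strikes[c] / totals[c] > hi}
    ball_cells = {c for c in totals if strikes[c] / totals[c] < lo}
    return strike_cells, ball_cells


def entropy_over(pitch_cells, allowed_cells):
    """Shannon entropy (bits) of one pitcher's pitches within allowed_cells."""
    counts = Counter(c for c in pitch_cells if c in allowed_cells)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no pitches in this region, so nothing to be uncertain about
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

With the two sets in hand, within-zone entropy is entropy_over(pitcher_cells, strike_cells) and outside-zone entropy is entropy_over(pitcher_cells, ball_cells), each computed only from the pitches that landed in that region.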
Here’s a plot of within-zone entropy vs. outside-zone entropy.
They are less correlated than I would have guessed (R² = .36). It appears that some pitchers are able to locate pitches unpredictably outside the zone while still hitting their spots inside it. Intuitively, I predicted above that the best combination would be high in-zone entropy and low outside-zone entropy, but as is usually the case, the situation is more complex than that.
In fact, as we run through our three diagnostic characters (strikeouts, walks, and FIP), we see an odd and counterintuitive pattern emerge (stars denote correlations with significant p-values). Outside-zone entropy is associated with more walks, but it’s also associated with more strikeouts. In-zone entropy isn’t associated with more strikeouts, but it is strongly associated with fewer walks. Neither category of entropy has a significant relationship with FIP. And fastball velocity, taunting me with its consistency, goes along with many more strikeouts, slightly more walks, and much better FIP.
The Complexity of Location
Peering into location more finely by examining within- and outside-zone entropy, however, showed counterintuitive patterns. Location entropy outside the zone was associated with both more walks and more strikeouts, while inside the zone, location entropy was associated with a dramatic decrease in walks. A look at the list of pitchers with high in-zone entropy supports the idea that it is measuring something like control: Guys like Cliff Lee, Madison Bumgarner, and Cole Hamels show up in the list of the five highest in-zone entropies, and down the list a bit, Clayton Kershaw makes an appearance.
Uncertainty in location outside the zone correlated with more walks and more strikeouts. More walks are understandable, because I can imagine outside-zone entropy as measuring wildness. How outside-zone entropy might lead to more strikeouts is a little trickier. A clue might be that outside-zone entropy is significantly associated with more swinging strikes, suggesting that these pitchers are somehow better able to get hitters to chase pitches.
Any entropy-based analysis of location grapples with the question of how a pitcher should optimally distribute their pitches. Is it best to spread them all about the zone, knowing that to do so requires the hitter to think more carefully about whether and where to swing? Or alternatively, is it best to pick a handful of spots, say, each of the four corners, and pitch predictably but competently to those locations? There have been successes with both strategies, and many others besides.
There’s no question that location is a trickier beast than pitch type, perhaps most demonstrably because there is so much more dimension and richness to location data than pitch type data. Location doesn’t come in eight flavors; it comes in thousands of points, barely distinguishable to the normal human’s eye but important to an MLB hitter. That hitters take not only location but also pitch type into account—all within the context of the count, the pitcher’s tendencies, the base-out situation, and more—speaks to the task that has been called the hardest thing in sports (and maybe the galaxy).