May 16, 2013 | The Stats Go Marching In

Catcher Framing Before PITCHf/x

Analysis of framing has intensified over the past couple of years, with Joe Maddon talking about it on the radio and (via Ben Lindbergh) Clubhouse Confidential and MLB Network’s Diamond Demo series featuring discussions of the issue with guests like Jonathan Lucroy. Ben has been running a weekly column on the subject since the start of the season: in the first installment (as well as this piece for Grantland) he provided some background on the research so far, so you’re invited to have a look at that article before you read the rest of this one.

Framing evaluation is one of those research subjects that has been made possible by PITCHf/x data, which means that we’re now into the sixth full season for which catcher framing can be measured. However, for quite some time I’ve been thinking about this: if one could get a good approximation of the framing numbers using only Retrosheet pitch sequences, 20 years of catcher framing could be added to the discussion. When Ben jogged my memory recently, I decided it was time to stop thinking about it and start doing some number-crunching.

The Method

For each plate appearance, I counted the number of pitches not featuring a swing by the batter (basically balls and called strikes), with the useful Chadwick Tools saving me a lot of time and work. In the original model I created with PITCHf/x data, in addition to using the location coordinates as measured by the camera system and the pitch type as classified by the MLBAM algorithm, I controlled for the effects of the ball/strike count, the home-plate umpire, the pitcher, and the batter—plus, obviously, the catcher. Since that model requires a lot of computing time, in order to update my numbers once in a while, I switched to a simpler but quicker model in which the pitcher and the batter are not accounted for.
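To illustrate the counting step, here is a minimal sketch of tallying taken pitches from a Retrosheet pitch-sequence string, assuming the standard event-file pitch codes (B = ball, C = called strike, I = intentional ball, P = pitchout, V = automatic ball); the function name is mine, not part of the Chadwick Tools:

```python
# Hypothetical helper: count pitches with no swing attempt in one
# plate appearance's Retrosheet pitch sequence. Annotation characters
# (*, +, >, ., digits for runner events) are not pitches and are skipped.
TAKEN_BALLS = set("BIPV")   # ball, intentional ball, pitchout, automatic ball
CALLED_STRIKE = "C"

def taken_pitches(sequence: str):
    """Return (called_strikes, taken_balls) for a pitch-sequence string."""
    called = sum(1 for ch in sequence if ch == CALLED_STRIKE)
    balls = sum(1 for ch in sequence if ch in TAKEN_BALLS)
    return called, balls

# Example: ball, called strike, foul, swinging strike
print(taken_pitches("BCFS"))  # (1, 1)
```

The percentage of strikes among these taken pitches, aggregated per plate appearance, becomes the outcome variable described below.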
In fact, once the location and pitch type are factored in, the batter has very little effect on the call by the umpire (mostly due to his stance and proximity to the plate, I suppose). The effect of the pitcher is also reduced, and I decided the tradeoff between accuracy and computing time was worth the exclusion. With Retrosheet data, however, we have no information on pitch location and type, so throwing the pitcher and the batter back into the model was necessary. In short, for every plate appearance I have the percentage of strikes on pitches not swung at as the outcome variable and the four actors involved (pitcher, catcher, umpire, batter) as the predictors. As I have done many other times in my baseball analysis, I used a Cross-Classified Multilevel Mixed Model, which for saber-oriented people I’ll call WOWY-on-steroids.

Note that when using PITCHf/x data, an extra strike is more or less attributable to something framing-related, be it a good reception by the catcher, the pitcher hitting the target, or the umpire being deceived (or, more likely, a combination of the three). When no information is available about location, however, several other factors come into play: among the called strikes are, for example, pitches thrown right down Broadway that may not have been swung at because of the batter’s tendencies (partly accounted for, since the batter is in the model) or because great sequencing fooled the batter. Thus, this version of framing might include at least some pitch-sequencing effect as well.

Comparing Retrosheet and PITCHf/x numbers

Let’s start with a scatterplot of framing runs saved (prorated to 5,000 pitches caught*) by catchers in the seasons from 2008 to 2012. The darker dots denote a higher number of pitches caught, signifying more reliable estimates.

* Keep in mind that from here on, when I write “pitches caught” I really mean “pitches caught with no swing attempt by the batter.”

Not a bad start.
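The cross-classified model itself is computationally heavy; as a rough stand-in for the idea, here is a sketch using a ridge-penalized logistic regression on one-hot-encoded actor IDs, where the L2 penalty loosely mimics the partial pooling of random effects. The data are synthetic and every name and number is illustrative, not the actual model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, levels = 4000, 10
# Hypothetical taken pitches: ID columns for pitcher, catcher, umpire, batter
ids = rng.integers(0, levels, size=(n, 4))
# Synthetic truth: catcher 0 steals extra strikes (+0.8 on the logit scale)
logit = -0.5 + 0.8 * (ids[:, 1] == 0)
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))  # True = called strike

# One-hot encode the four categorical actors (4 * 10 = 40 columns)
X = np.zeros((n, 4 * levels))
for j in range(4):
    X[np.arange(n), j * levels + ids[:, j]] = 1.0

# The L2 penalty shrinks each actor's effect toward zero, a crude
# analogue of the shrinkage a true multilevel model applies
model = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
p = model.predict_proba(X)[:, 1]
print(p[ids[:, 1] == 0].mean(), p[ids[:, 1] != 0].mean())
```

A catcher's framing effect would then be read off his fitted coefficient, converted from strike probability to runs via the run value of turning a ball into a strike.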
The chart displays good agreement between the two models: the Pearson correlation coefficient, weighted by the number of pitches caught, is a healthy 0.72. One important difference between the two methods is the distribution of ratings. The PITCHf/x-based numbers are more dispersed: among catcher-seasons with at least 1,500 pitches caught, the standard deviation is close to 13 runs for the PITCHf/x numbers and about 7.5 for the Retrosheet ones. That means the Retrosheet-based values (I’ll call them “RetroFraming”) will yield more conservative results. Given the good agreement of RetroFraming with the PITCHf/x-based numbers, we can move on to some numbers going back to 1988, keeping in mind that extreme values will be less likely with this metric.

Single-season achievements

Here a note is due. In the previous section, I warned that RetroFraming numbers are more conservative: in fact, there is no trace of a 50-run season. A recent revision of my algorithm has changed Jose Molina’s PITCHf/x framing value for 2012 to 41 runs, which would still make it higher than Ausmus’ 2000. RetroFraming has Molina’s 2012 at 25 runs saved, which is quite a difference. I know such discrepancies can be enough for some people to turn away altogether from this article and others on framing, as they often do when two play-by-play-based fielding metrics disagree on the evaluation of a position player. However, what I make of these numbers is this:
Enough talk—here are the 20 best RetroFraming seasons since 1988.
At age 40, Carlton Fisk was still capable of a top-20 season. In a subsequent section, I’ll take a look at aging curves for the framing skill. In case you’re wondering, the worst season belongs to framing whipping-boy Ryan Doumit (2008) by a mile, with Jason Kendall (2000) and Jorge Posada (2005) just a bit better.

Career framers
Jose Molina is a solid second, despite much more limited playing time. In fact, over the same amount of playing time, we’d estimate Molina to be close to twice as valuable as Ausmus. Below is the Top 10 list for prorated (to 5,000 pitches caught) values, minimum 25,000 pitches.
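The prorated figures are simple linear scaling; as a quick sketch (the example values here are made up, not any catcher's actual numbers):

```python
def prorate(runs_saved: float, pitches_caught: float, per: int = 5000) -> float:
    """Scale framing runs to a fixed number of taken pitches caught."""
    return runs_saved * per / pitches_caught

# e.g. 25 runs saved over 7,000 taken pitches -> about 17.9 per 5,000
print(round(prorate(25, 7000), 1))  # 17.9
```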
At the bottom of the list, depending on whether you prefer the counting stat or the prorated version, are either Charles Johnson (costing more than a win per year for 12 seasons) or, once more, Ryan Doumit.

Year-to-year correlation

A look at aging

Below are charts for a few interesting careers. In each one, the dots indicate the seasonal ratings, the thinner line is a smooth curve through the data points based on the displayed catcher’s data only, and the thicker line also makes use of data from the other catchers (in effect regressing the curve). Here’s Jose Molina, who just keeps getting better:

Ausmus also improved throughout his career:

Posada, on the other hand, displayed a declining trend:
Finally, Piazza’s numbers were consistent throughout his career:

What’s next?

In this article I’ve used pitch-by-pitch data without PITCHf/x information to generate historical leaderboards. However, this kind of data is also available for Minor League Baseball going back a handful of years, so numbers like those shown above can be calculated for lower levels of baseball as well. In that way, good framing catchers might be identified before they reach The Show. And while it might be a long time before we see ubiquitous pitch-tracking technology in the college game, recording pitch outcomes is much more feasible, meaning that teams might even use this information for drafting purposes. Incidentally, while refining this article, I mentioned its contents to a baseball insider (who obviously will go unnamed here), and he stated, “It's an idea potentially worth millions of dollars.” So, clubs with college pitch-by-pitch data: feel free to knock at my door.
Amazing Max, this is fantastic. You talk about 'game calling', which is a combination of calling a pitch, calling for a pitch location, and then framing that pitch. Is all this attention on pitch-framing worth little if a good framing catcher is a terrible pitch caller / locator? Shouldn't more attention be on the other two pieces of this puzzle before we start lobbying Jose Molina / Brad Ausmus for the Hall of Fame? I recall from a previous article of yours that there are good framers that ultimately lose that value as a result of poor game calling. Or have we dismissed 'calling' as less of a skill and more of a manager function. Thanks again.
I plan on looking more into the 'game calling' issue.
From the explorations I have done on the data, the composite value is largely driven by framing, but there are exceptions.
And the exceptions seem to be consistent year to year: for example, I have A.J. Pierzynski nowhere close to the top in framing, but he seems to be one who really improves his pitchers.
So, yes, you'd want everything correctly rated.
I prefer having separate numbers because maybe one skill can be trained more than the other, or someone else might take charge of it (you may be OK with a good framer / bad caller by having all the calls coming from the bench, for example).