January 11, 2000

Catching Up With The General: A Postscript

A second look at catcher defense

Since the original appearance of "Field General or Backstop?" in last year's Baseball Prospectus, we've received a great deal of praise and compliments on the article, for which we're grateful. We've also received some thoughtful criticism that is worth responding to (there's also been some not so thoughtful criticism, but that's what the Delete key is for).

The primary question raised by attentive readers is that when comparing catchers against their counterparts in consecutive years, the counterparts can change. That is, while Scott Hatteberg caught over 100 games for Boston in both 1997 and 1998, the backup catchers changed. Bill Haselman got most of the remaining playing time behind the plate in 1997, while Jim Leyritz, Jason Varitek, and Mandy Romero split time backing up Hatteberg in 1998.

The Z-score method introduced in "Field General or Backstop?" rates catcher performance in terms of standard deviations from the collective performance of the other catchers on the team. If that collective baseline changes from year to year, it could produce variation in the catcher's Z-score even if his ability hasn't changed. For example, if Hatteberg is actually an average defensive catcher in both 1997 and 1998 (in an absolute sense), but Haselman is fantastic defensively, Hatteberg will be negatively rated in 1997 (since he underperformed Haselman). If Varitek and company are poor defensively, the same performance by Hatteberg would be rated highly in 1998 (since he outperformed the others).

Given this genuine concern, the question then becomes whether this effect is substantial enough to skew the results and alter the conclusions of the original study. As with many things in research, the design of the study involved tradeoffs.
In this case, the alternatives were to select the entire universe of pitchers and catchers, maximizing sample size, or to select just those pitchers and catchers who qualified for selection (minimum of 100 PA/season) in two consecutive years, ensuring a consistent baseline for comparison in both years. Restricting the data set to pairs of catchers who worked with the same pitcher over the course of two years would eliminate more than 85% of the potential data set that I considered. Reducing the sample size increases the variability in the measurement. As I wanted to test as robust and diverse a set of catchers as possible, I opted for the larger data set.

The effect of this choice is that the baseline for comparison varies somewhat from year to year, depending on the turnover rate (and mix of playing time, even if there is no turnover) among backup catchers. However, if the average variation across the data set is small, it will not overly influence the results obtained.

A comparable analysis would be to compare the batting average of, say, Tony Gwynn to that of a randomly selected Padres teammate each year. If he has a higher average he "wins"; if lower, he "loses". Chances are that Gwynn will outhit that teammate, regardless of who is selected, and despite variations not only in Tony's performance but also in the makeup of the team roster from year to year as players arrive and depart. Similarly, in the same comparison, Mario Mendoza would be expected to substantially under-hit a typical teammate. You see more or less consistent results (win-win or loss-loss) in consecutive years from players at the extremes. Players towards the middle would be most subject to "flipping" from year to year, as the vagaries of whom they are compared to, and the expected small differences between them, take over.
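The win/lose analogy above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the method used in the study: the function name `flip_rate`, the uniform range for the randomly selected teammate, and every batting average shown are hypothetical, and the player's own average is held fixed from year to year for simplicity.

```python
import random


def flip_rate(player_avg, teammate_low, teammate_high, trials=1000, seed=0):
    """Fraction of simulated two-year spans in which the result flips.

    Each year the player is compared to one randomly drawn teammate.
    A "flip" is a win one year and a loss the other.
    All inputs are hypothetical; none are real batting averages.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    flips = 0
    for _ in range(trials):
        year1_win = player_avg > rng.uniform(teammate_low, teammate_high)
        year2_win = player_avg > rng.uniform(teammate_low, teammate_high)
        if year1_win != year2_win:
            flips += 1
    return flips / trials


# Hypothetical teammates hit somewhere between .220 and .300.
for label, avg in [("extreme high", 0.350), ("middle", 0.260), ("extreme low", 0.200)]:
    print(label, flip_rate(avg, 0.220, 0.300))
```

Under these assumptions, a player outside the teammates' range never flips, while a player near the middle of that range flips often, which is the pattern the analogy describes.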
Part of the analysis in the article was to look, collectively, at all of those players who were above average in one year (the "winners") and see if they had any tendency towards winning (or, in the terms of the article, posting a positive battery Z-score) in the second year. Even allowing that there is some variance in the baseline of comparison (the composition of the rest of the catching staff), there was essentially zero evidence of any trend for good catchers to stay good (the Tony Gwynn case) or for bad catchers to remain awful (the Mendoza case). As mentioned in the article, I even looked at the extremes (those with a battery Z-score greater than +1 or less than -1, essentially those more than one standard deviation from the mean), and found no evidence that good catchers would start to stand out from the crowd. Ditto for Z > +2 or Z < -2. No matter how you sliced it, nothing pointed to a consistent game-calling ability for catchers, regardless of how exceptional they have appeared in the past.

Space and time constraints prevented a full treatment of this issue in BP99. However, the beauty of the web is that we are not as bound by the constraints of print. I went back to the data I collected, and looked only at pairs of catchers who worked with a pitcher over two consecutive years for at least 100 PA per catcher per year. Using a method similar to the one in the article, I computed the PR/PA (Pitching Runs per Plate Appearance) for the pitcher with each catcher, and took the difference between the two. Then I compared it to the difference in PR/PA in the following year.

This may be better explained through an example: Suppose that pitcher Able pitches to catchers Brown and Church in both 1990 and 1991. In 1990, Able-Brown has a 0.100 PR/PA, while Able-Church has a 0.150 PR/PA, for a difference of +0.050.
In 1991, Able-Brown has a 0.080 PR/PA, while Able-Church has a 0.065 PR/PA, for a difference of -0.015.

So this creates a (much smaller) set of data points to compare, where two catchers each work with the same pitcher over two years. We would end up with a list of pitchers and catchers, each with two data points, as in the following list:

                          1990      1991
Able-{Brown/Church}     +0.050    -0.015
Aaron-{Brown/Church}    -0.145    +0.004
Evans-{Felix/Gomez}     -0.200    -0.075
[...]

By taking the correlation between the two columns, we can determine whether a catcher with relative success with a pitcher in one year is more likely to have continued relative success the next. The correlation was virtually zero: +0.01, virtually identical to the correlations presented in the original article. This, therefore, reinforces the original conclusion that there is no statistical evidence for a substantial and persistent game-calling ability that differentiates among major league catchers.

The two new charts recreate two of the key comparisons from the original article. The first chart plots the differences between catchers for a given pitcher in year 1 against the differences for the same catchers with the same pitcher in year 2. If game-calling is a skill, good catchers in year 1 should tend to continue to do well in year 2, resulting in a linear trend in the plot. Recalling the original article, we saw exactly this linear pattern when we looked at the ability of batters to hit home runs, or of pitchers to get strikeouts. However, here (as in the original analysis), there is no pattern. Instead we see a random scattering, indicating that there is no relationship between how well a catcher works with a pitcher in one year and how well he works with him the next.
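The pairwise comparison described above (a PR/PA difference per pitcher in each year, then a correlation across the two years) can be sketched roughly as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the difference is taken as second-listed catcher minus first-listed catcher to match the sign in the Able example, and a Pearson correlation is assumed since the article says only "the correlation". The three rows are the illustrative rows from the example list, not the real data set, and three points are far too few to say anything meaningful.

```python
from math import sqrt


def pr_pa_difference(first_catcher, second_catcher):
    """PR/PA of the second-listed catcher minus that of the first-listed."""
    return second_catcher - first_catcher


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return sxy / sqrt(sxx * syy)


# Able with Brown and Church, using the PR/PA figures from the example.
able_1990 = pr_pa_difference(0.100, 0.150)  # about +0.050
able_1991 = pr_pa_difference(0.080, 0.065)  # about -0.015

# Illustrative rows only: (year 1 difference, year 2 difference).
rows = [(able_1990, able_1991), (-0.145, 0.004), (-0.200, -0.075)]
year1 = [r[0] for r in rows]
year2 = [r[1] for r in rows]
print(pearson(year1, year2))
```

With the full list of qualifying pitcher and catcher pairs in place of `rows`, the same call would produce the single correlation figure discussed in the text.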
The second chart follows up another analysis done in the original article. We took all the catcher combinations in year 1 and categorized them as above or below average, then looked at the range of performances the following year for each subset. Again, recalling the results of the original article, when you plot the cumulative probabilities for each subset for commonly accepted abilities like HR rate and strikeout rate, the two curves are widely separated on the chart, indicating a significant difference in the distribution of performances between the two subsets in the following year. As you can see in the new chart, even when two catchers are compared only to each other, there is no significant trend for the better catcher in one year to continue showing superior results. The two curves are virtually on top of one another, with no separation between the two distributions of performance.
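The split-and-compare step behind the second chart can be sketched as below. This is an assumption-laden illustration rather than the procedure actually used: the function names are hypothetical, "above average" is taken to mean a year 1 difference greater than zero, and the maximum vertical gap between the two cumulative curves (in the style of a Kolmogorov-Smirnov statistic) is added here as one way to put a number on "separation". The article itself compares the curves visually.

```python
from bisect import bisect_right


def split_by_year1(pairs, threshold=0.0):
    """Split (year1, year2) pairs by the year 1 value; return year 2 values.

    The zero threshold for "above average" is an assumption.
    """
    above = [y2 for y1, y2 in pairs if y1 > threshold]
    below = [y2 for y1, y2 in pairs if y1 <= threshold]
    return above, below


def ecdf(sample):
    """Return a function giving the fraction of the sample that is <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def max_cdf_gap(a, b):
    """Largest vertical distance between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Illustrative rows only, taken from the example list in the text.
pairs = [(0.050, -0.015), (-0.145, 0.004), (-0.200, -0.075)]
above, below = split_by_year1(pairs)
print(above, below, max_cdf_gap(above, below))
```

A gap near zero corresponds to curves lying on top of one another, while a gap near one corresponds to the wide separation described for HR rate and strikeout rate.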
Thus, even after addressing a potential source of error in the original study, the conclusion that there is no detectable game-calling ability still stands. Whether you look across all pitcher-catcher combinations (as in the original article), or focus on two catchers who both work with the same pitcher in consecutive years, there is no tendency for a catcher to exert influence over the opposition's offensive production. While the position of catcher is still physically demanding, and may indeed be a critically important defensive position, we'll need to look elsewhere to assess the full magnitude of the catcher's contribution.
Keith Woolner is an author of Baseball Prospectus.