July 22, 2016
DRA and Groundball Bias
A few weeks ago, BP author Rob Mains inquired about what he saw as a possible bias in Deserved Run Average (DRA) values in favor of flyball pitchers and against groundball pitchers. Specifically, he observed that groundball pitchers were doing worse in DRA, on average, than they were in Runs Allowed per 9 innings (RA9).
When the BP Stats team investigated this further, we discovered that Rob was right. Tommy John is the ultimate edge case for this comparison: an extreme groundball pitcher with several thousand career innings. According to DRA, and PWARP by extension, John offered basically replacement-level value over most of his baseball career. That . . . did not seem quite correct.
Using the weighted Spearman correlation we now prefer, we can quantify the relationship of groundball rate to various metrics. Using all pitchers since 1980, when possible, here are the correlations among GB%, FIP, ERA, and DRA:
There are two rather interesting things here. The first is what Rob noticed: DRA, at least as compared to RA9, definitely had (until today) a moderate positive bias against groundball pitchers. The second is that ERA itself has a small bias in favor of groundball pitchers, at least as compared to RA9. Others have commented on this before, but it presumably arises because groundball pitchers give fielders more opportunities to commit errors. Runs that follow an error are often scored as unearned, which on balance makes these pitchers' ERA (perhaps unfairly) better than their RA9, all other things being equal.
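For readers who want to experiment with this kind of comparison, here is a minimal sketch of a weighted Spearman correlation in plain Python. It is an illustration only, not the wCorr routine used for the numbers in this article: it ranks each variable with ordinary average ranks and applies the weights only in the correlation step, whereas wCorr may weight the ranks themselves. Weighting by innings pitched and all of the sample values are assumptions made for the example.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def weighted_pearson(x, y, w):
    """Pearson correlation in which each observation counts with weight w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(var_x * var_y)


def weighted_spearman(x, y, w):
    """Simplified weighted Spearman: weighted Pearson on average ranks."""
    return weighted_pearson(average_ranks(x), average_ranks(y), w)


# Hypothetical pitcher-seasons: groundball rate, a run metric, innings pitched.
gb_rate = [0.38, 0.45, 0.52, 0.60]
metric = [4.10, 3.95, 4.30, 4.60]
innings = [180.0, 60.0, 210.0, 150.0]
rho = weighted_spearman(gb_rate, metric, innings)
```

A value of `rho` near zero would suggest that the metric does not move systematically with groundball rate; the sign tells you which direction any relationship runs.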
DRA’s trends should generally reflect RA9’s trends in the aggregate, even as individual pitchers diverge. Fortunately, the fix for this bias was not a drastic one. First, we identified the primary baseball events to which being a groundball pitcher is actually relevant. Those would be home runs, singles, and groundball double plays to the middle infield. We then added the pitcher’s seasonal groundball rate as a predictor in each of those models. Finally, we adjusted the baseline “with or without you” (WOWY) basis for each model to ensure that each pitcher was still being credited or debited with their seasonal groundball rate as compared to the league-average groundball rate that year. Seasonal groundball rates were given their own simple mixed model, using pitcher and batter as the predictors, to produce a shrunken value that would be more robust, which was in turn used in the other models.
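To give a feel for what a "shrunken" groundball rate means, here is a rough sketch in Python. It is not the mixed model described above: it swaps in a much simpler shrinkage estimator that pulls each pitcher's observed rate toward the league average, with more pull when the sample is small. The function names, the `prior_weight` value, and the sample numbers are all hypothetical.

```python
def shrink_rate(ground_balls, balls_in_play, league_rate, prior_weight=200.0):
    """Pull an observed groundball rate toward the league rate.

    prior_weight acts like a count of league-average balls in play added
    to the pitcher's sample; 200 is an arbitrary value for illustration.
    """
    return (ground_balls + prior_weight * league_rate) / (balls_in_play + prior_weight)


def rate_vs_league(ground_balls, balls_in_play, league_rate, prior_weight=200.0):
    """Shrunken groundball rate expressed relative to the league average."""
    shrunk = shrink_rate(ground_balls, balls_in_play, league_rate, prior_weight)
    return shrunk - league_rate


# Hypothetical pitcher: 60 ground balls on 100 balls in play, league rate 45%.
observed = 60 / 100
shrunk = shrink_rate(60, 100, 0.45)
relative = rate_vs_league(60, 100, 0.45)
```

With these made-up inputs the observed rate of 60% shrinks to 50%, or five points above the league average. A pitcher with several times as many balls in play at the same observed rate would stay much closer to 60%, which is the robustness the shrunken value is meant to provide.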
The updated numbers, which are being rolled out today across all seasons, past and present, demonstrate that DRA’s groundball bias has been addressed. Here is how DRA now compares to various other metrics, by differential from runs allowed:
For comparison, we added the biases of SIERA and kwERA as published by our friends at FanGraphs as well. All of these biases reflect the choices these metrics have made in deciding what is and is not of interest when evaluating pitchers. The things these biases tell us about groundball pitching could (and probably eventually will) fill a whole other article.
Both FIP and DRA (as revised) now have essentially no bias with respect to groundball pitchers. Tommy John thus has a respectable 52 career PWARP via revised DRA, which seems right for a member in good standing of the “Hall of Very Good.” On balance, only the extreme members of the flyball and groundball communities should see much difference, although most pitchers will probably see at least a slight change.
Although this issue arose internally, we’ve gotten plenty of other healthy suggestions externally, both from BP readers and the larger sabermetric community. Please keep them coming.
Ahmad Emad & Paul Bailey (2016). wCorr: Weighted Correlations. R package version 1.8.0. https://CRAN.R-project.org/package=wCorr.
R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
 This struck us as a nice, robust place for comparison.
 FanGraphs data is taken from 2002 to the present as that is the earliest available date of the groundball data upon which SIERA relies in part. Data for the other metrics goes back to 1980. Their performance from 2002 to the present is not materially different.