January 27, 2011
Ahead in the Count
When Eric Seidman and I introduced SIERA last winter, we ran a number of tests to determine whether our theoretical foundation of run prevention led to a superior estimation of pitchers’ skill levels. SIERA had a solid advantage at predicting future ERA over some ERA estimators but led xFIP only in the last decimal place, so we ran the tests again after 2010 to make sure it held that lead going forward. Although the regression formula did not incorporate future ERAs and should not have been biased, it is still important to test it on the following year to see how well SIERA held up.
The short story is this: SIERA had a good year in 2010, as it extended its lead over other estimators. Below is the root mean square error (RMSE) of the difference between various estimators and the following year’s park-adjusted ERA (using the Lahman Database’s three-year pitcher park factors) for all pitchers with at least 40 innings pitched in each of two consecutive years. (To avoid the mix-up of last year, I obtained xFIP, FIP, and tERA directly from FanGraphs’ website.)
SIERA Tests for 2003-10
This was the typical order of ERA estimation for the previous six years as well:
SIERA has been ahead in six of seven years, with xFIP second in each of those six years; the two switched places in 2009 ERA estimation (using 2008 estimators). FIP typically finished third, tERA fourth, and the previous year’s park-adjusted ERA fifth. The exceptions were 2005-06 (when tERA out-estimated FIP) and 2003-04 (when tERA finished behind park-adjusted ERA itself).
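For readers who want to replicate this kind of test, here is a minimal sketch of the year-over-year RMSE, assuming a hypothetical `PitcherSeason` record (the article does not describe its data layout). It pairs each season with the same pitcher’s next season, keeps only pairs with at least 40 IP in both years, and can optionally restrict to pitchers who stayed with the same team. The previous-year ERA baseline can be tested by storing park-adjusted ERA under its own key in `estimators`.

```python
import math
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical record layout; field names are illustrative only.
    pitcher_id: str
    year: int
    team: str
    ip: float
    park_adj_era: float
    estimators: dict  # e.g. {"SIERA": 3.80, "xFIP": 3.95, "FIP": 4.10}


def consecutive_pairs(seasons, min_ip=40.0, same_team_only=False):
    """Yield (year T, year T+1) season pairs for pitchers with >= min_ip in both years."""
    by_key = {(s.pitcher_id, s.year): s for s in seasons}
    for s in seasons:
        nxt = by_key.get((s.pitcher_id, s.year + 1))
        if nxt is None or s.ip < min_ip or nxt.ip < min_ip:
            continue
        if same_team_only and s.team != nxt.team:
            continue
        yield s, nxt


def rmse(pairs, estimator):
    """Unweighted RMSE between an estimator in year T and park-adjusted ERA in year T+1."""
    errors = [(cur.estimators[estimator] - nxt.park_adj_era) ** 2 for cur, nxt in pairs]
    return math.sqrt(sum(errors) / len(errors))
```

Because `consecutive_pairs` is a generator, wrap it in `list(...)` before computing RMSEs for several estimators on the same sample.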
Putting together the complete 2003-10 dataset of seven pairs of years, we get the following RMSEs:
SIERA finished modestly ahead of xFIP overall, by .021 points, and both were ahead of the other three estimators. FIP finished a solid third, though to be fair to it, FIP isn't a park-adjusted statistic and was never intended to estimate park-adjusted ERA. So I checked how FIP did at predicting next year’s unadjusted ERA; it did only slightly better, with a 1.216 RMSE. Of course, that sample included pitchers who changed teams and therefore faced different park effects. When I restricted the sample to pitchers who stayed with the same team in consecutive years, FIP’s RMSE fell to 1.160, the same RMSE that xFIP had for park-adjusted ERA estimation.
Pitchers who did not switch teams threw more innings and were therefore easier to predict overall, so I reran the test of the other estimators on park-adjusted ERA using only pitchers who did not switch teams, and found that the order remained the same:
Table 4. Pitchers with IP>=40 both years with same team, Un-weighted
One criticism of this test is that it treats pitchers who threw 41 innings the same as pitchers who threw 241 innings, so I re-ran the test weighted by next year’s innings pitched. The order remained the same, but unsurprisingly, estimation was better overall for pitchers with more innings:
Although a consensus has emerged that RMSE is the best test to run, some readers still prefer to see correlations, so I checked those as well; SIERA was ahead there too:
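The innings-weighted RMSE and the correlation check can be sketched as follows. This is an illustration on plain lists, assuming the weights are next year’s innings pitched as described in the text; the function names are hypothetical.

```python
import math


def weighted_rmse(estimates, actuals, weights):
    """RMSE where each pitcher's squared error is weighted (here, by next year's IP)."""
    total_w = sum(weights)
    sq = sum(w * (e - a) ** 2 for e, a, w in zip(estimates, actuals, weights))
    return math.sqrt(sq / total_w)


def pearson(xs, ys):
    """Pearson correlation between estimator values and next-year park-adjusted ERA."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)
```

With every weight equal to 1, `weighted_rmse` reduces to the unweighted RMSE used in the earlier tables, so a 241-inning pitcher counts the same as a 41-inning one only in that special case.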
Variance in ERA Estimators
Something I have always found interesting about these ERA estimators is that, since they are not projections, they do not regress to the mean at all. By definition, the larger a statistic’s variance, the less it regresses to the mean, which means its ability to estimate next year’s ERA will be worse, even though it may be picking up on temporary skill levels. For the record, the standard deviations are as follows:
Table 7. Standard Deviation of Estimators among Pitchers with IP>=40
Unsurprisingly, the statistics that include un-regressed home-run-per-fly-ball rates (FIP and tERA) have higher standard deviations, and xFIP has a lower standard deviation than SIERA, meaning that it regresses more to the mean than SIERA does. It is therefore a good sign that SIERA still does better at predicting next year’s ERA despite having less natural regression to the mean built in: it must be picking up on enough genuine skill to make up the difference.
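The link between an estimator’s spread and its RMSE can be made explicit with the standard decomposition of mean squared error. Writing X for an estimator in year T and Y for park-adjusted ERA in year T+1, with means μ, standard deviations σ, and correlation ρ:

```latex
\mathrm{MSE} = \mathbb{E}\big[(X - Y)^2\big]
  = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X\sigma_Y,
\qquad
\frac{\partial\,\mathrm{MSE}}{\partial \sigma_X} = 2\sigma_X - 2\rho\,\sigma_Y .
```

Holding the means and ρ fixed, the error is minimized at σ_X = ρσ_Y, which is exactly the amount of regression to the mean a projection would build in. Any extra spread beyond that raises the RMSE, so an estimator with a larger standard deviation can only win on RMSE by having a higher correlation with next year’s ERA.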
How Often is SIERA Closest?
Beyond RMSE and correlations, another way to evaluate a statistic is simply to count how often it is closer to next year’s park-adjusted ERA than a rival. I did this for ten different pairs of ERA estimators to see how often each side won. All of the following results are highly statistically significant (p<.002), since there are over 1,800 pitchers who pitched at least 40 innings in two consecutive years between 2003 and 2010:
SIERA finished ahead of xFIP, FIP, tERA, and park-adjusted ERA, each by a noticeable and highly significant, if not visually overwhelming, margin. xFIP also beat the other three by significant margins, FIP beat tERA more often than not, and tERA beat park-adjusted ERA itself more often than not.
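A head-to-head comparison like this can be sketched as a win count plus a two-sided sign test. The sketch below assumes exact ties are dropped (the article does not say how ties were handled) and uses the normal approximation to the binomial, which is reasonable at this sample size; the function names are hypothetical.

```python
import math


def win_rate(est_a, est_b, actual):
    """Count pitchers for whom estimator A is closer to the target ERA than B, and vice versa.

    Exact ties are dropped (an assumption, not the article's stated method)."""
    wins = losses = 0
    for a, b, y in zip(est_a, est_b, actual):
        da, db = abs(a - y), abs(b - y)
        if da < db:
            wins += 1
        elif db < da:
            losses += 1
    return wins, losses


def sign_test_p(wins, losses):
    """Two-sided sign-test p-value via the normal approximation to Binomial(n, 0.5)."""
    n = wins + losses
    z = (wins - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, if 1,800 decided comparisons split 990 to 810 (a 55% win rate), z is about 4.2 and the p-value is far below .002, which is why even modest-looking win rates come out highly significant here.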
Comparing Estimators to Years Other Than the Following Year
When SIERA came out last year, Brian Cartwright came up with another way of testing estimators: he looked at the RMSE between ERA estimators and park-adjusted ERAs in years other than just the next one. After all, these are not meant to be projection systems. They are meant to estimate skill level, and next year’s park-adjusted ERA has been considered a pretty good estimate of this year’s skill level. However, so is last year’s park-adjusted ERA, and so are the park-adjusted ERAs from two years ahead and two years ago. So I checked all of these, along with same-year ERA (which gives the home-run-inclusive estimators FIP and tERA an obvious leg up), three years ahead, and three years behind:
Table 9. RMSEs of Estimators in Year T with Park-Adjusted ERA in Years T-3 through T+3, Un-weighted
Unsurprisingly, FIP and tERA do best at same-year ERA, since they give credit or blame to pitchers for their home run rates. I was pleased to see that SIERA was closest in every other comparison. The order stayed the same for the other estimators as well.
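The same RMSE test generalizes to any offset between the estimator’s year and the ERA year. Here is a minimal sketch, assuming hypothetical dictionaries keyed by (pitcher_id, year) and, as an assumption, the same 40-inning minimum in both seasons; the article does not spell out its exact filters for these tables.

```python
import math


def offset_rmse(estimator_by_season, era_by_season, ip_by_season, offset, min_ip=40.0):
    """RMSE between an estimator in year T and park-adjusted ERA in year T+offset.

    All three inputs are hypothetical dicts keyed by (pitcher_id, year).
    offset may be negative (earlier seasons), zero (same-year ERA), or positive."""
    sq_errors = []
    for (pid, year), est in estimator_by_season.items():
        target = (pid, year + offset)
        if target not in era_by_season:
            continue
        if ip_by_season[(pid, year)] < min_ip or ip_by_season[target] < min_ip:
            continue
        sq_errors.append((est - era_by_season[target]) ** 2)
    # Return NaN when no pitcher has seasons that far apart.
    return math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else float("nan")
```

Looping `offset` over range(-3, 4) reproduces the T-3 through T+3 layout of Table 9 for one estimator.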
These are all un-weighted observations, meaning that pitchers who throw more innings are treated the same as pitchers who throw fewer. The following table shows the RMSE with each pitcher weighted by their innings pitched in the subsequent year:
Once again, the order of estimators remained the same in each year in question:
Table 11. Correlation of Estimators with Park-Adjusted ERA in Years T-3 through T+3
Correlations moved around more, but SIERA had the highest correlation for every year other than the same year.
I also tested "win rate," the percentage of comparisons in which each statistic was closer to park-adjusted ERA, across different years.
The first table is same-year ERA, in which FIP unsurprisingly wins the most. SIERA is closer slightly more often than xFIP, a statistically significant win percentage, albeit a small one (p=.02). All other comparisons are statistically significant except SIERA’s deficit to tERA in estimating same-year ERA, which is not (p=.45), even though tERA credits a pitcher with the full run cost of the home runs he actually surrenders.
The following tables look at two years from now, two years ago, three years from now, and three years ago:
The results and the order of the best statistics are similar in each of these tables. Almost all of these comparisons are statistically significant, thanks to the large sample of pitchers. The exceptions are tERA’s victories over park-adjusted ERA for ERA one year prior, three years prior, and three years ahead, none of which are significant. Additionally, neither SIERA’s victory over xFIP nor xFIP’s deficit to FIP is significant for ERA three years prior.
Averaging Multiple Years of ERA Estimators
When I discussed this article with Sky Kalkman, he suggested that I check whether averaging several years of each estimator would show FIP to be better at picking up the elusive skill in home runs per fly ball. Finding this very plausible, I checked it a few different ways.
First, I simply averaged the previous three years of each ERA estimator, without any weighting, to predict the fourth year’s park-adjusted ERA, and looked at both the RMSE and the correlation. Second, I weighted the estimator by innings pitched in each of those first three years, and again looked at the RMSE and the correlation. Third, I used that IP-weighted estimate but computed the RMSE with pitchers weighted by innings pitched in the fourth year. The results?
Table 18. Three-Year Average of Estimator versus Park-Adjusted ERA in Fourth Year
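The three averaging schemes described above can be sketched as two switches: whether the three-year estimate is IP-weighted, and whether each pitcher’s error is weighted by fourth-year IP. The record layout below is a hypothetical assumption for illustration.

```python
import math


def three_year_estimate(values, innings, weight_by_ip):
    """Combine three seasons of an estimator: a simple mean, or a mean weighted by IP."""
    if weight_by_ip:
        return sum(v * ip for v, ip in zip(values, innings)) / sum(innings)
    return sum(values) / len(values)


def fourth_year_rmse(records, weight_estimator_by_ip=False, weight_obs_by_ip=False):
    """RMSE of a three-year estimator average against fourth-year park-adjusted ERA.

    records: hypothetical list of (three_estimates, three_ips, era_year4, ip_year4) tuples."""
    num = den = 0.0
    for est3, ip3, era4, ip4 in records:
        pred = three_year_estimate(est3, ip3, weight_estimator_by_ip)
        w = ip4 if weight_obs_by_ip else 1.0
        num += w * (pred - era4) ** 2
        den += w
    return math.sqrt(num / den)
```

The first test in the text corresponds to both flags off, the second to `weight_estimator_by_ip=True`, and the third to both flags on.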
Mixing and Matching
I also ran some other tests (not included) in which I used weighted averages of xFIP and FIP to see if they did better as a mixture than separately. They did, and the best mixture seemed to be 80%/20% xFIP/FIP for one-year estimation and 40%/60% for three-year estimation. However, none of these mixtures outdid SIERA. The closest was the three-year mixture, which tied SIERA when the estimators were weighted by IP in the first three years and the observations by IP in the fourth year; in the rest of the tests, SIERA stayed safely in front.
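A mixture search like this can be sketched as a simple grid search over the xFIP weight. This is an assumption about method: the article does not say how the 80%/20% and 40%/60% weights were found, and the 5% grid step here is hypothetical.

```python
import math


def blend_rmse(xfip, fip, era_next, w_xfip):
    """RMSE of the blend w*xFIP + (1 - w)*FIP against next-year park-adjusted ERA."""
    sq = [(w_xfip * x + (1 - w_xfip) * f - y) ** 2 for x, f, y in zip(xfip, fip, era_next)]
    return math.sqrt(sum(sq) / len(sq))


def best_blend(xfip, fip, era_next, step=0.05):
    """Grid-search the xFIP weight over [0, 1]; returns (weight, rmse) with the lowest RMSE."""
    n_steps = round(1 / step)
    grid = [i / n_steps for i in range(n_steps + 1)]
    return min(((w, blend_rmse(xfip, fip, era_next, w)) for w in grid), key=lambda t: t[1])
```

The blended RMSE from `best_blend` can then be compared directly against SIERA’s RMSE on the same sample.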
However the data are squeezed, 2010 continues to show SIERA to be the best ERA estimator. xFIP is clearly almost as good, though if left with only one, I would prefer SIERA (perhaps obviously). Interestingly, a regression of park-adjusted ERA on the previous year’s SIERA and xFIP shows not only that SIERA does a better job, but that you should actually lower the expected ERA of a pitcher with a higher xFIP and the same SIERA. The formula given is:
Both coefficients are statistically significant (p<.001 for SIERA and p=.013 for xFIP). This means that xFIP is not adding useful information beyond what SIERA provides. The peculiar negative coefficient on xFIP is probably a result of sampling bias, but it is still worth reporting.
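For readers who want to run this kind of two-variable regression themselves, here is a stdlib-only sketch of ordinary least squares with two predictors, using the closed-form solution of the centered normal equations. It does not reproduce the article’s coefficients, and it computes no standard errors or p-values; the function name is hypothetical.

```python
def ols_two_predictors(x1, x2, y):
    """Fit y = a + b1*x1 + b2*x2 by ordinary least squares.

    Here x1 = previous-year SIERA, x2 = previous-year xFIP, y = park-adjusted ERA.
    Returns (a, b1, b2). Standard errors and p-values are not computed."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero if SIERA and xFIP are perfectly collinear
    # Solve s11*b1 + s12*b2 = s1y and s12*b1 + s22*b2 = s2y by Cramer's rule
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2
```

A negative `b2` in this fit is what the text describes: holding SIERA fixed, a higher xFIP predicts a slightly lower ERA. Because SIERA and xFIP are highly correlated, `det` is small and the individual coefficients can be unstable, which is consistent with the sampling-bias explanation.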
Why SIERA Succeeds
The natural question everyone asked last year when we came out with SIERA was not just whether it was the best estimator, but why. Why does a statistic that has fewer years to work with, and therefore cannot precisely estimate the run effect and out effect of strikeouts, walks, and home runs, do better than statistics like xFIP and FIP that do estimate those things precisely?
My further research over the last year has helped me understand why. The following are the highlights of this research. The first one listed is the one that we knew already when we published SIERA last year, but it is not the primary reason at all.
My educated guess is that reasons 2) and 4) are the primary reasons for SIERA's superior estimation skill. The stark difference between QERA’s RMSE and SIERA’s RMSE in last year’s testing was primarily due to the negative coefficient on ground-ball rate squared in SIERA. When we ran our initial tests on SIERA, the inclusion of a variable for the square of ground-ball rate often did the most to improve estimation. Further, even though pitchers do have some control over BABIP, the amount that they do control is very similar to the amount that SIERA credits them with through BABIP’s correlation with strikeouts and ground-ball rates.
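To see why a squared ground-ball term matters, consider only the ground-ball part of a SIERA-style formula, with generic coefficients β₁ and β₂ (the remaining terms and the actual coefficient values are not reproduced here):

```latex
\widehat{\mathrm{ERA}} = \dots + \beta_1\,\mathrm{GB} + \beta_2\,\mathrm{GB}^2,
\qquad
\frac{\partial\,\widehat{\mathrm{ERA}}}{\partial\,\mathrm{GB}} = \beta_1 + 2\beta_2\,\mathrm{GB}.
```

With β₂ < 0, the slope falls as GB rises, so each additional point of ground-ball rate lowers the estimated ERA by more (or raises it by less) for a pitcher who already induces many grounders than for a fly-ball pitcher. A linear-only estimator assigns every point of ground-ball rate the same value and cannot capture this.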
That said, we're not done with SIERA. We knew when we introduced it that we had only a limited number of years of batted-ball data, and more years will undoubtedly help us better estimate its many coefficients (though we have not yet looked at how much 2010 can help). Colin Wyers has also done some work on park-adjusted batted-ball rates for pitchers, which may or may not improve SIERA’s estimation; if it does, we will make those changes as well. Furthermore, as run environments change, SIERA will need to adjust accordingly.
In the end, ERA estimation is clearly a complicated task with a lot of moving parts. SIERA is currently the best way to take a snapshot of a pitcher’s skill level, but with a lot of competition out there, we will continue to work on delivering even better ways of understanding pitchers’ skill levels.