April 14, 2011

The BP Wayback Machine
Sizing Up Small Sample Size

While looking toward the future with our comprehensive slate of current content, we'd also like to recognize our rich past by drawing upon our extensive online archive of work dating back to 1997. In an effort to highlight the best of what's gone before, we'll be bringing you a weekly blast from BP's past, introducing or re-introducing you to some of the most informative and entertaining authors who have passed through our virtual halls. If you have fond recollections of a BP piece that you'd like to nominate for re-exposure to a wider audience, send us your suggestion.

Brush up on James Click's look into an April advantage for pitchers on the sixth anniversary of its original appearance as a "Crooked Numbers" column on April 14, 2005.

Every year is a fresh start. For teams and for players the changes of a winter's worth of work are finally on display. Despite all the changes from last year, most of baseball remains the same from year to year, but there is an adjustment period in the early part of the season as teams and players settle into the season.

Small sample size doesn't mean no sample. While there's meaning in how a team starts off, it's also important to determine whether the early parts of the season can be deceptive for reasons other than the lack of sufficient data, especially when considering individual player performances. There's already evidence that hitters tend to perform better in the first half of the season than in the second half. There's the conventional wisdom that pitchers dominate in the colder months early in the season while August is when the bats wake up. Then there are the A's fans who keep looking at Barry Zito's 4.51 ERA the last three Aprils followed by five months of 2.74, 3.80, 3.46, 3.13, and 3.34.

April results that don't fit the public perception are usually attributed to some change discovered by the media looking for the cause.
A hot start by a hitter is attributed to a change in batting stance, weight, or physique. This year's example is Eric Hinske, whose new stance is the easy answer to his hot start. With pitchers, learning or mastering a new pitch or changing the delivery are the easy answers for early success. Teams off to hot starts have new veteran leadership or youthful exuberance.

Inherent in a lot of this discussion is the idea that other players have yet to adjust to the changes in their opponents. In the matchup of batter and pitcher, we usually assume that the pitcher benefits more from deception and lack of information than the hitter. Pitchers never before seen by hitters can hide the ball in different ways, mix in unexpected pitches, and throw off a hitter's timing with a new windup. Warren Spahn put it best: "Hitting is timing; pitching is upsetting timing."

One quick way to determine if pitchers see an early-season advantage is to look at league-wide stats broken down by month:
There are several different trends depending on the year. 2000 saw ERA decline steadily throughout the season until September; 2001 and 2003 peaked in June, 2002 in July, and 2004 in August. More importantly, there doesn't appear to be any distinct trend towards lower ERAs in April; if anything, there's a slight dip in May and a rise in June in four of the five seasons, perhaps as teams begin to weed through who's playing well and who's not before the hitters catch up in the hotter months.

Month-by-month ERA may not be the best indicator of any inherent advantage by newer pitchers or pitchers who have changed their repertoire since last season. Instead, let's break things down by a pitcher's starts against a particular team. To do so, I'll look at each pitcher's performance broken down by the number of times he's seen that team, including the current appearance. Let's see what we get when looking at 2004 numbers:

App  Year       IP   ERA  K/PA  BB/PA  HR/PA  H/PA
  1  2004  21305.0  4.50  .166   .087   .029  .237
  2  2004  10795.0  4.56  .166   .085   .030  .240
  3  2004   5209.0  4.28  .173   .085   .029  .228
  4  2004   2859.0  4.42  .177   .083   .030  .237
  5  2004   1482.7  4.38  .179   .084   .029  .236
  6  2004    769.7  4.20  .183   .086   .028  .233
  7  2004    410.7  3.75  .187   .091   .024  .224
  8  2004    251.3  4.08  .202   .105   .033  .225
  9  2004    164.3  3.40  .228   .077   .029  .228
 10  2004     90.3  3.49  .211   .110   .018  .175
 11  2004     43.7  4.12  .199   .044   .017  .249
 12  2004     11.3  2.38  .163   .102   .041  .184
 13  2004      2.0  0.00  .429   .000   .000  .143

Or if you're a more visual person:
(For the curious, those 2.0 IP in the thirteenth appearance against a team were contributed by Tom Gordon against the Orioles, Scott Eyre against the Diamondbacks, and Joe Nathan against the Tigers. The highest since 1990 was Mike Myers against the Diamondbacks in 2001 with 15 appearances. Maybe that's why the Snakes acquired him that winter.)

In the more than 30,000 innings pitchers threw in their first or second appearance against a team in 2004, they had ERAs of 4.50 and 4.56. After that, as appearances increase, ERA generally declines. Of the four major metrics to accompany ERA, K/PA increases while ERA decreases--as we would expect--but BB/PA increases as well. (Nate Silver has already discussed the advantages of using K/PA rather than K/9, so I'll endeavor to use K/PA in the future. For reference, Randy Johnson and Johan Santana led all qualifiers last year with a K/PA of .301 while Kirk Rueter finished last with .067.)

Lest we think that 2004 was bucking a trend, or that the small sample sizes at higher appearance counts are misleading us, here are the numbers for 2003 and 2002:
The most obvious explanation here is that players who are called upon to face teams many times are going to be the best pitchers in the league. To check for that bias, here's what we would have expected each group to do based on their weighted season performances in 2004:

App  Year       IP   ERA  K/PA  BB/PA  HR/PA  H/PA
  1  2004  21305.0  4.73  .164   .087   .030  .239
  2  2004  10795.0  4.50  .169   .085   .029  .236
  3  2004   5209.0  4.31  .173   .084   .028  .234
  4  2004   2859.0  4.20  .175   .084   .028  .232
  5  2004   1482.7  4.09  .179   .085   .027  .229
  6  2004    769.7  4.03  .183   .088   .026  .227
  7  2004    410.7  3.73  .192   .089   .024  .221
  8  2004    251.3  3.63  .200   .089   .023  .219
  9  2004    164.3  3.42  .199   .085   .021  .218
 10  2004     90.3  3.31  .205   .084   .021  .217
 11  2004     43.7  3.42  .207   .085   .022  .216
 12  2004     11.3  2.90  .198   .080   .018  .202
 13  2004      2.0  2.37  .276   .076   .017  .168

Subtract the actual 2004 numbers above from these expected numbers and we get the following (a positive ERA difference means pitchers did better than expected):

App  Year       IP    ERA   K/PA  BB/PA  HR/PA   H/PA
  1  2004  21305.0   0.23  -.002   .000   .001   .002
  2  2004  10795.0  -0.06   .003   .000  -.001  -.004
  3  2004   5209.0   0.03   .000  -.001  -.001   .006
  4  2004   2859.0  -0.22  -.002   .001  -.002  -.005
  5  2004   1482.7  -0.29   .000   .001  -.002  -.007
  6  2004    769.7  -0.17   .000   .002  -.002  -.006
  7  2004    410.7  -0.02   .005  -.002   .000  -.003
  8  2004    251.3  -0.45  -.002  -.016  -.010  -.006
  9  2004    164.3   0.02  -.029   .008  -.008  -.010
 10  2004     90.3  -0.18  -.006  -.026   .003   .042
 11  2004     43.7  -0.70   .008   .041   .005  -.033
 12  2004     11.3   0.52   .035  -.022  -.023   .018
 13  2004      2.0   2.37  -.153   .076   .017   .025

As opposed to the apparent improvement in performance as appearances increase, pitchers actually perform worse relative to expectations as their appearances mount. Pitchers performed about a quarter of a run better in their initial appearance against a team than we would expect from their complete season performance, but performed generally worse as appearances mounted.
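For readers who want to try this kind of comparison themselves, here is a minimal sketch in Python. It is not the code behind the tables above. It assumes a hypothetical chronological game log of (pitcher, opponent, outs recorded, earned runs) rows with made-up values, and it assumes that "weighted season performances" means weighting each pitcher's full-season rate by the innings he contributed to each appearance group. The names `GAME_LOG`, `era`, and `actual_vs_expected` are illustrative only.

```python
from collections import defaultdict

# Hypothetical game log in chronological order (values are made up):
# (pitcher, opponent, outs_recorded, earned_runs)
GAME_LOG = [
    ("A", "BOS", 18, 2),
    ("A", "NYY", 21, 3),
    ("A", "BOS", 15, 4),
    ("B", "BOS", 3, 0),
    ("B", "BOS", 3, 1),
    ("B", "NYY", 6, 0),
]


def era(earned_runs, outs):
    """Earned runs per nine innings (27 outs)."""
    return 27.0 * earned_runs / outs if outs else 0.0


def actual_vs_expected(game_log):
    """Return {appearance_number: (actual_era, expected_era)}."""
    # Season totals per pitcher serve as the baseline rate.
    season = defaultdict(lambda: [0, 0])  # pitcher -> [outs, earned runs]
    for pitcher, _, outs, er in game_log:
        season[pitcher][0] += outs
        season[pitcher][1] += er

    seen = defaultdict(int)  # (pitcher, opponent) -> appearances so far
    # appearance number -> [outs, actual earned runs, expected earned runs]
    groups = defaultdict(lambda: [0, 0, 0.0])
    for pitcher, opponent, outs, er in game_log:
        seen[(pitcher, opponent)] += 1
        app = seen[(pitcher, opponent)]  # includes the current appearance
        s_outs, s_er = season[pitcher]
        group = groups[app]
        group[0] += outs
        group[1] += er
        # Earned runs this outing would have produced at the season rate.
        group[2] += outs * s_er / s_outs if s_outs else 0.0

    return {
        app: (era(g[1], g[0]), era(g[2], g[0]))
        for app, g in sorted(groups.items())
    }


if __name__ == "__main__":
    for app, (actual, expected) in actual_vs_expected(GAME_LOG).items():
        # Same sign convention as the difference table: expected minus actual.
        print(app, round(actual, 2), round(expected, 2), round(expected - actual, 2))
```

Because each pitcher's expected earned runs are spread across the groups in proportion to his innings, the expected totals add up to the actual season totals; only their distribution across appearance numbers differs, and that distribution is what the comparison isolates.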
The discrepancy between the expected and actual ERA in the initial performance against a team is especially conclusive given the massive sample size of innings involved in the initial appearance. Teams may be pretty good about selecting the correct pitchers for the majority of the playing time, but returns diminish as those pitchers face the same teams more and more during a season.

Though there isn't any apparent improvement in pitching performance in April compared to other months of the season as evidenced by league ERA, pitchers do appear to see a slight advantage in their initial appearance against an opposing team. Things tend to even out in the second or third appearance, but after that, the batters appear to have figured things out and the advantage is gone. Adding a new pitch or a new wrinkle to a pitcher's motion may work for a while, but don't expect that advantage to last all season. This trend doesn't bode well for struggling players like Zito, so if you're an A's fan, perhaps you should just forget you read any of this and read up on regression to the mean.

2 comments have been left for this article.
This study would be better if RP had been eliminated from the discussion. RP don't face the same players from appearance to appearance, so I think their data probably just introduces noise.
Also, I think it shouldn't be looked at per season. Seeing a pitcher's advantage would be better if you looked at it per career, in my opinion.
Considering the article's intro specifies that he was looking for inherent advantages at the start of the season, as those old yarns about learning a new pitch, best-shape-of-my-life can take effect before they are picked up by the opposition, there wasn't any point in looking for career-wide patterns.
As for your concern about relief pitchers, any changes would be picked up by a team's coaches, and hitters would be given better warning.
On that note, it would be interesting if Click had also eliminated starting pitchers to see if there was any difference between the two pitcher types.