May 12, 2005
Crooked Numbers
Are 'Roids the Reason
Pluralitas non est ponenda sine necessitate.
Occam's Razor translates to "Plurality should not be posited without necessity," a phrase that's usually taken to say that the simplest answer is the best one. That's not technically correct; Occam (or Ockham) was arguing that additional information added to a working theory should be cut out if it provides no additional accuracy. Hence, Occam's Razor and not Occam's Boring Theory About Theories.

Mainstream media treatment of the steroids issue falls into the former interpretation, and while we haven't heard anyone hide behind the razor, the idea that steroids are simply to blame is the easiest one for everyone to swallow. The simplest explanation is the best. Besides, who has time to read an entire book discussing everything you ever wanted to know about steroids and baseball?

The next logical stage in the steroid coverage is the introduction of statistical evidence to confirm that baseball's new, tough testing policy is having an effect. Both Joe Sheehan and Nate Silver covered this to some extent yesterday, but there are still some possible reasons for the offensive decline that haven't been covered. Several managers and players have suggested other possible reasons for the decline, among them the weather and the absence of Barry Bonds. There are certainly other possible explanations--the natural ebb and flow of offensive levels in baseball or the retirement of power hitters--that might have something to do with the outage. Nearly all of these deserve or could use a full article-length discussion (much like Nate's discussion of the graying of the game), but we'll try to cram them all in here, with a follow-up next week to discuss some areas in more depth.

Improper use of statistics

Before we get into any discussion of why home runs are down this year, we should make sure that they are in fact down.
Simply saying fewer home runs were hit in April in 2005 than 2004 doesn't take into account the number of games or even plate appearances in the denominator. Fortunately, the AP article is looking at home runs per game, so it's unlikely that their stats are misleading. Rather than home runs per game, let's go one step further and use home runs per AB, PA, and balls in play (in the case of BIP, we'll include home runs, so this will be a percentage like the others, not a ratio). This way we can correct for varying numbers of ABs or PAs per game in previous seasons due to higher offensive levels or more extra-inning games.

Keep in mind that in higher run-scoring environments, players get more chances to hit home runs because there are more ABs and PAs per game. In effect, runs beget more home runs, and since we know that home runs beget runs, we get into a nice little spiral effect. With that in mind, here are year-by-year averages of home runs in April per AB, PA, and balls in play since 1996:

        Raw Numbers               Percent Change
Year    HR/AB   HR/PA   HR/BIP    HR/AB    HR/PA    HR/BIP
----    -----   -----   ------    ------   ------   ------
1996    .0333   .0292   .0412      N/A      N/A      N/A
1997    .0276   .0242   .0338     -.1712   -.1712   -.1796
1998    .0285   .0251   .0352      .0326    .0372    .0414
1999    .0324   .0284   .0399      .1368    .1315    .1335
2000    .0373   .0326   .0460      .1512    .1479    .1529
2001    .0343   .0303   .0428     -.0804   -.0706   -.0696
2002    .0281   .0248   .0347     -.1808   -.1815   -.1893
2003    .0304   .0268   .0375      .0819    .0806    .0807
2004    .0314   .0277   .0384      .0329    .0336    .0240
2005    .0275   .0247   .0336     -.1242   -.1083   -.1250

All three metrics are very consistent with each other. If HR/BIP had plunged more than HR/PA, we might have been able to say that the decline in home runs is a result of fewer balls in play going out of the park rather than simply an overall decline.
Some might see that hypothetical situation as a point in favor of the steroids argument, but instead batters are hitting roughly the same number of balls out of the park when they actually get wood on the ball. Let's also quickly look at some broader offensive numbers for Aprils past:

Year    HR/PA   SO/PA   UBB/PA  XBH/PA  AVG    OBP    SLG    ISO
----    -----   -----   ------  ------  ----   ----   ----   ----
1996    .0292   .1678   .0918   .0499   .270   .346   .433   .163
1997    .0242   .1612   .0914   .0500   .265   .342   .410   .145
1998    .0251   .1672   .0892   .0526   .266   .340   .418   .152
1999    .0284   .1647   .0924   .0529   .267   .344   .430   .163
2000    .0326   .1640   .0930   .0538   .270   .348   .450   .180
2001    .0303   .1767   .0810   .0499   .260   .331   .425   .165
2002    .0248   .1670   .0820   .0536   .258   .331   .410   .152
2003    .0268   .1679   .0839   .0534   .261   .335   .418   .157
2004    .0277   .1614   .0832   .0532   .268   .340   .427   .159
2005    .0247   .1632   .0795   .0512   .259   .328   .405   .146

Strikeouts are up (though they're still lower than most of the past decade), walks are lower than they've been in ten years, XBH (in this case excluding HRs, so just 2Bs and 3Bs) are way down, batting average is down, OBP is down, SLG is down, and ISO is down. This year hasn't just been an absence of home runs; all offensive metrics are in the pitchers' favor. Having confirmed that offense is objectively lower, let's get into some possible reasons.

Steroids

This is the easy one: we don't know. Anecdotal evidence suggests that the power drain is due to the newer steroid testing, a policy publicly perceived to be as effective as cutting off Samson's hair. Never mind that the biggest names to be suspended for steroid use are Alex Sanchez (2 home runs in 2004) and Juan Rincon (uh, none), nor the fact that the biggest sluggers stand accused of using steroids so well-designed that they are undetectable to testing. Few people like to acknowledge that there's little to no evidence about the effects of steroids or steroid testing on game outcomes.
Simply throwing out steroids as an untestable hypothesis will no doubt leave certain people frustrated, but acknowledging what we don't know and cannot know is just as important as performing proper analysis on things that we can.

Weather

The idea that this April has been particularly cold has been cited several times as an alternative reason for the decline in offensive numbers. There are two points that must be confirmed before we can say that this is a valid reason for the offensive decline. First, that cold weather does suppress offensive numbers and by how much; and second, that this April has been colder or wetter than previous seasons. Breaking up temperature into five strata of roughly equal numbers of games over the past 10 years, offensive numbers in those conditions look like this:

Temp               R/G    HR/G
----               ---    ----
Cold (Under 60)     9.5   1.92
Cool (61-70)        9.5   2.10
Mild (71-75)        9.7   2.19
Warm (76-85)        9.8   2.25
Hot  (86+)         10.4   2.44

So there is a steady increase in the number of home runs and runs per game as temperature increases. Now let's see how April 2005 stacks up against previous seasons:

Year    Avg Temp
----    --------
1996    61.8
1997    60.2
1998    63.7
1999    64.4
2000    64.5
2001    64.7
2002    64.3
2003    62.8
2004    65.5
2005    64.2

While it's been chillier than last year, 64.2 degrees isn't drastically out of line with any year back until 1997. Last April was the hottest of the past 10 years, so some of the decline this year may be coming from a slight decrease in temperature from an unusual high, but the slight decrease certainly doesn't explain a ten- to twelve-percent decrease in home runs. Weather may be slightly responsible, but not much.

Injured Sluggers

Bonds is on the DL. Moises Alou was on the DL. Jim Thome is now on the DL. There's the distinct possibility that there are more sluggers on the DL this year than last year. Bonds alone hit 10 home runs last April. Alou and Thome hit seven.
Assuming a replacement level of zero home runs and that Thome and Alou's missed playing time adds up to a full month, that's 17 fewer home runs already. Adding those 17 home runs to the total this year removes 20% of the drop-off. There are a couple of problems with that. The replacement level for home runs is not zero. Also, despite the fact that he was injured in April, Thome didn't go on the DL until May, having already played 23 games for the Phillies.

Instead, let's look at the group of players playing this April as a whole. While there is always turnover from season to season, if April 2005 saw a higher turnover of players--due to injury, demotion, or retirement--the lack of home runs may be due to new, younger players taking more of the playing time. Here are the numbers for all hitters over the past 10 Aprils who accumulated at least 50 AB in consecutive Aprils. This includes how many players saw their HR/AB go up (HR Up) and how many saw their 2B+3B/AB (labeled XBH, but it doesn't include HR) go up from the previous season.

Year    Players   HR Up   XBH Up   HR Up%   XBH Up%   Avg Age
----    -------   -----   ------   ------   -------   -------
1997    166        61     82       36.7%    49.4%     30.0
1998    173        88     94       50.9%    54.3%     29.8
1999    171        93     81       54.4%    47.4%     29.8
2000    169        88     84       52.1%    49.7%     29.7
2001    171        74     65       43.3%    38.0%     30.0
2002    175        75     88       42.9%    50.3%     29.9
2003    187       104     89       55.6%    47.6%     30.0
2004    178        88     89       49.4%    50.0%     30.2
2005    173        70     73       40.5%    42.2%     30.4

Of 173 returning regulars, 59.5% saw their HR/AB go down this year. That's the biggest drop since 1997.
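The percentages in that table can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the table, the variable and function names are ours, and "not up" lumps together players whose HR/AB fell with any whose rate was unchanged, which the article treats as "down."

```python
# Returning regulars (50+ AB in consecutive Aprils), copied from the table:
# year -> (players, players whose HR/AB went up from the previous April)
returning = {
    1997: (166, 61),
    1998: (173, 88),
    1999: (171, 93),
    2000: (169, 88),
    2001: (171, 74),
    2002: (175, 75),
    2003: (187, 104),
    2004: (178, 88),
    2005: (173, 70),
}

def hr_up_share(year):
    """Fraction of returning regulars whose HR/AB rose in the given April."""
    players, hr_up = returning[year]
    return hr_up / players

shares = {year: hr_up_share(year) for year in returning}

for year, share in shares.items():
    # "not up" = went down or stayed the same (assumption noted above)
    print(f"{year}: {share:.1%} up, {1 - share:.1%} not up")

# Year with the smallest share of improvers once 1997 is set aside
lowest_since_1998 = min((y for y in shares if y >= 1998), key=shares.get)
print("Lowest HR Up% from 1998 on:", lowest_since_1998)
```

Run as written, the 2005 share comes out lowest of any year from 1998 on, and only 1997 sits below it, which is what "the biggest drop since 1997" is describing.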
Now let's compare the new guys to the veterans:

Year    New   New HR/AB   New Age   Old   Old HR/AB   Old Age
----    ---   ---------   -------   ---   ---------   -------
1997    65    .0256       26.8      166   .0320       30.0
1998    79    .0270       27.4      173   .0332       29.8
1999    65    .0292       27.2      171   .0370       29.8
2000    67    .0426       29.1      169   .0413       29.7
2001    69    .0292       27.6      171   .0405       30.0
2002    75    .0228       28.0      175   .0342       29.9
2003    71    .0258       28.0      187   .0361       30.0
2004    58    .0288       28.2      178   .0359       30.2
2005    67    .0271       28.2      173   .0306       30.4

"New" includes all players who didn't accumulate 50 ABs in the previous April and shows how they did. You'll notice that they are about 2 years younger than the "Old" group--the same group from the first chart. While the New group consistently (except 2000) hits fewer HR/AB than the Old group, there are more New guys this year than in 2004 and they're doing a worse job hitting home runs than the 2004 group. So some of the blame lies there as well.

Essentially, the guys who were hitting home runs last year aren't hitting them this year, and the 2005 replacements are both more numerous and less powerful than their 2004 counterparts. Like the weather, the absence of Bonds is somewhat responsible for the decline in offense, but the size of the decline is greater than the possible change derived from his absence and the absence of other sluggers.

Natural statistical variance

Now we come to it. Stating that home runs are down 8.8% or 12% or just down, period, is in itself a misrepresentation of statistics. Before drawing conclusions about that decline, we need to know how significant that decline is. This reason is the one that doesn't seem to get as much airplay, or at least not strictly speaking. It's been said before, but if baseball were exactly the same every year, it would be pretty boring. It's not; it's exciting, and the natural ebb and flow of offensive numbers is part of that. We can see that this drop is not unprecedented by looking at the varying HR/PA rates in April over the past 34 years.
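For the ten Aprils tabulated earlier in this piece, the year-to-year swings in HR/PA can be recomputed directly. The sketch below uses only the rounded HR/PA values from that table (1996-2005), so it cannot reproduce the full 1973-2005 rankings; the names are illustrative.

```python
# April HR/PA by season, copied from the first table (rounded to four places)
hr_per_pa = {
    1996: .0292, 1997: .0242, 1998: .0251, 1999: .0284, 2000: .0326,
    2001: .0303, 2002: .0248, 2003: .0268, 2004: .0277, 2005: .0247,
}

years = sorted(hr_per_pa)

# year -> (absolute change, relative change) versus the previous April
changes = {}
for prev, cur in zip(years, years[1:]):
    diff = hr_per_pa[cur] - hr_per_pa[prev]
    changes[cur] = (round(diff, 4), round(diff / hr_per_pa[prev], 4))

# Declines only, steepest first
drops = sorted((y for y in changes if changes[y][0] < 0),
               key=lambda y: changes[y][0])

# Seasons whose April decline was steeper than 2005's
bigger_than_2005 = [y for y in drops if changes[y][0] < changes[2005][0]]

# All year-to-year moves, up or down, largest magnitude first
by_size = sorted(changes, key=lambda y: abs(changes[y][0]), reverse=True)

print("Declines, steepest first:", drops)
print("Steeper than 2005:", bigger_than_2005)
print("2005 rank by size of change:", by_size.index(2005) + 1, "of", len(by_size))
```

Within this window, 2002 and 1997 are the only Aprils with a steeper fall than 2005, and the 2005 move sits in the middle of the nine season-to-season changes by size.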
Joe pointed out yesterday that there have been two bigger drops in home run rates in April in the past ten seasons, and that's verified by the information above. Using HR/PA as the metric, this year's April decline of .0030 HR/PA ranks as the sixth biggest drop since 1973 (the 2002 plunge--in the middle of the "steroid era"--of .0055 was the biggest). In absolute value, it actually ranks as one of the smaller differences from the previous season, meaning that the transition from 2004 to 2005 was one of the smoother of the past 33 years. In percentage value, the ranks are very similar. Rather than an unprecedented collapse of power, the drop in home runs from 2004 to 2005 is actually rather routine.

Conclusions

Briefly recapping, we've seen that offense is down across the board, the weather this April was a little colder than last year, most returning regulars are hitting slightly fewer home runs, there are more new players hitting fewer home runs than new players in previous seasons, and the change in April home run rate is well within the established variance in baseball over the past 3+ decades. It's the last point that's the most telling: this April is just like many other Aprils of years past.

In his book How We Know What Isn't So, Thomas Gilovich discusses the inherent tendency to believe what we want: "When the initial evidence supports our preferences, we are generally satisfied and terminate our search; when the initial evidence is hostile, however, we often dig deeper." It's easy to get caught up in the reports of a steroid testing program and overstate the impact of both steroids themselves and the testing program on the game. It's easier still to go digging into the stats with that predisposition and, upon finding a decline in power from the previous season, scream Eureka and run down the streets naked, overjoyed with a mental breakthrough.
Instead, we must step back and recognize that what we're seeing is simply the normal variance in home run rates from season to season. While the graying of the game, the weather, and the absence of some marquee sluggers are the particular factors that have contributed some small part to the offensive failures this year, the real issue is that we're all looking to explain something that happens very frequently and we're using new factors to explain old results. We cannot say if the new steroid testing program is causing the decline in offensive numbers this year, but we can say that it's far from the massive decline it's being portrayed as. It's less Occam's Razor and more Francis Bacon: "Man prefers to believe what he prefers to be true."