August 17, 2012
The BP Wayback Machine
Setting the Stage for 2005: Steroids
While looking toward the future with our comprehensive slate of current content, we'd also like to recognize our rich past by drawing upon our extensive (and mostly free) online archive of work dating back to 1997. In an effort to highlight the best of what's gone before, we'll be bringing you a weekly blast from BP's past, introducing or re-introducing you to some of the most informative and entertaining authors who have passed through our virtual halls. If you have fond recollections of a BP piece that you'd like to nominate for re-exposure to a wider audience, send us your suggestion.
Can we detect the use of steroids statistically? Nate attempted one approach in the piece reprinted below, which was originally published on March 30, 2005.
It has become increasingly difficult to take a neutral position on the steroid issue.
My past four weeks have featured three bookstore signings, four fantasy drafts, two chats here at BP, a neverending series of radio gigs, and innumerable conversations with friends and readers about everything from
It's not just that I want to avoid pouring more fuel on the fire, though as far as I'm concerned, the Crusading Sports Journalist belongs in about the same pit of hell as the Philandering Televangelist, the Embezzling Senator, and the Reality TV Show Host. Nor is it that, somewhere deep down, it really does disappoint me that at least some of my favorite baseball players have been gaining a competitive advantage ingesting or injecting substances that can't be purchased at your local Walgreens.
Rather, it is the fact that I am a performance analyst, and steroids are an extremely difficult issue to discuss from the standpoint of performance analysis. We don't have more than a handful of confirmed or admitted positive results. We don't have any real idea of the differential effects that steroid use would have on pitchers and hitters. We don't have any sense for the scope and the duration of steroid use. We sure as hell don't have a reliable control group. In short, any sort of analysis based on examining the career paths of individual players is likely to be somewhere between fruitless and utterly misleading.
But if the media frenzy on the issue is to be believed, then greater sanctions against steroid use--whether in the form of suspensions, fines, or public scorn--are going to have an awful lot of impact on the way the 2005 season plays out. A substantial number of star players are going to fall flat on their faces. The standings are going to shift in rapid and unexpected ways as the cookies-and-cream teams differentiate themselves from The Clear-and-The-Cream ones. The entire San Francisco Giants lineup is going to develop pituitary tumors. Biceps are going to tear; elbows are going to shatter. The fans are going to stay away, the marketing deals are going to evaporate, and NASCAR is going to become the new national pastime. (Never mind that baseball has always been unpredictable, that star players have always seen their performances crater without any apparent reason, that fans are gulping up tickets with unprecedented gusto, or that baseball just this week signed a huge, new marketing deal with one of the world's largest corporations.)
It should go without saying that I don't expect any of that to happen. I don't expect a substantial change in offensive levels, nor a disproportionate number of individual collapses. I do expect lots and lots of surprises, things that PECOTA could never have seen coming with a processor the size of the Hubble Space Telescope, but not any more than there usually are.
I don't expect any of those things to happen because we do have some evidence on this issue, and the evidence does not support the prevailing opinion.
Let's take a step back for a moment. Suppose that the predominant media opinion on the subject of steroids is correct: a substantial number of players are using steroids, and steroid use results in substantial and bifurcating improvements to player performance. We will call this the Steroid Gap Theory. What would we expect the corresponding impact on the game's competitive ecology to look like?
It might be the case that offensive levels would rise, if more hitters than pitchers were using steroids, or if the benefits of steroid use were more profound for hitters than they were for pitchers. But this would not be the distinguishing mark of steroid use; offensive levels cycle upward and downward all the time, and they have since the very origin of the game. Rather, the distinguishing mark would be that variance in player performance would increase. If some players, be they hitters or pitchers, were gaining a new and substantial competitive advantage, while others were remaining in place, then we'd expect a greater amount of differentiation between the best-performing players and the worst-performing players, in the same way that, say, placing every child west of the Mississippi into a top-tier private school and every child east of the Mississippi into a forced labor camp would increase the differentiation in nationwide SAT scores.
Take the simplest possible case: we have two baseball players of essentially identical ability, and we have an increase in their collective performance. Let's say that these players are named Jose and Ozzie, and that they each hit 20 home runs last season.
We could achieve the increase in one of two ways. Either the increase could be global, affecting each player equally, or it could be differential, affecting one player but not the other:
In both cases, the average number of home runs hit by the players would be the same at 25. However, in the differential case, the standard deviation in home run rates would be larger.
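The two-player comparison can be checked in a few lines of Python. This is a minimal sketch: the post-increase totals (25 and 25 in the global case, 30 and 20 in the differential case) are assumed values chosen to match the stated average of 25, and which of the two players gets the boost is an arbitrary choice made for illustration.

```python
from statistics import mean, pstdev

# Assumed totals: both players start at 20 HR and the pair averages 25 afterward.
global_case = {"Jose": 25, "Ozzie": 25}        # each player gains 5
differential_case = {"Jose": 30, "Ozzie": 20}  # one player gains 10, the other none

for label, totals in (("global", global_case), ("differential", differential_case)):
    values = list(totals.values())
    # pstdev is the population standard deviation of the two totals
    print(label, "mean:", mean(values), "st. dev:", pstdev(values))
```

Under these assumed totals the averages match, but only the differential case has any spread between the two players.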
This is not merely an artifact of there being only two players in the league, or the players having started from the same initial performance level. We can conduct the same thought experiment over a larger and more diverse group of players, and come up with the same result.
I took a list of the season-ending home run totals for all major league players who accumulated at least 400 at bats in a season between 1988 and 1992, the last years before the recent upswing in offensive levels that began with the expansion year of 1993. There were around 580 players on this list; the average number of home runs hit was just under 14, and Cecil Fielder had the best individual season at 51 home runs.
I then inflated the home run statistics on the list in one of several different ways:
Note that it's the latter two examples that correspond most strongly with the Steroid Gap Theory. The Global Impact case, meanwhile, would correspond with an improvement that all hitters might benefit from, like an increase in the liveness of the ball, while the Weak Differential case might correspond with a more widespread phenomenon like improvements in health and training methods.
Here is what the data look like after we perform those alterations:
                                 Average    St. Dev
Pre-Inflation Era                13.8 HR     9.7 HR

Post-Inflation Scenarios:
Global Impact                    15.2 HR    10.7 HR
Weakly Differential Impact       15.1 HR    10.8 HR
Differential Impact              15.1 HR    11.0 HR
Strongly Differential Impact     15.1 HR    11.4 HR
In each case, both the average and the standard deviation in the number of home runs increase. However, while the average increases by about the same amount in each scenario, the degree of increase in the standard deviation is greater when the inflation is applied more differentially. Put simply: players gaining a differential advantage should lead to differential results.
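The same kind of experiment can be sketched in Python. Everything specific here is an assumption: the baseline list is synthetic (random numbers, not the 1988-1992 totals), and the `inflate` helper and the 100/50/10 percent shares are hypothetical stand-ins, since the exact inflation rules used for the table are not given in this text. The sketch only shows the mechanism: hold the leaguewide gain fixed and concentrate it on fewer players.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is reproducible

# Synthetic stand-in for the player list: 580 made-up season totals.
baseline = [max(0, round(random.gauss(14, 10))) for _ in range(580)]

def inflate(totals, share_affected, total_extra):
    """Give a randomly chosen share of players an equal percentage boost,
    sized so the league gains total_extra home runs overall."""
    n_affected = round(len(totals) * share_affected)
    affected = set(random.sample(range(len(totals)), n_affected))
    affected_sum = sum(totals[i] for i in affected)
    factor = 1 + total_extra / affected_sum
    return [hr * factor if i in affected else hr for i, hr in enumerate(totals)]

extra = 0.10 * sum(baseline)  # same leaguewide gain in every scenario

# Hypothetical scenarios: share of players who receive the boost.
scenarios = {
    "global (100% of players)": 1.0,
    "differential (50% of players)": 0.5,
    "strongly differential (10% of players)": 0.1,
}
print(f"baseline: mean {mean(baseline):.1f}, st. dev {pstdev(baseline):.1f}")
for name, share in scenarios.items():
    inflated = inflate(baseline, share, extra)
    print(f"{name}: mean {mean(inflated):.1f}, st. dev {pstdev(inflated):.1f}")
```

By construction the mean rises by the same 10 percent in every scenario. In the global case the standard deviation scales by exactly that 10 percent as well; when the same gain is concentrated on a smaller share of players, the standard deviation should generally come out larger.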
While we cannot, at least at this point, reliably decipher the impact that steroids might have on an individual's statistics, we can compare the results of our thought experiment to actual leaguewide statistics. To start with, let's look at seasonal means and standard deviations in home run rates before the "Juiced Era" offensive explosion began. In the chart below, I've plotted the mean and the standard deviation of the number of home runs hit per 650 plate appearances in each National League season between 1961 and 1992 (I've excluded the strike season of 1981; the short season has the effect of artificially increasing the standard deviation). Both the mean and the standard deviation are weighted based on plate appearances.
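For readers who want to reproduce this kind of calculation, here is one way a weighted mean and standard deviation of home runs per 650 plate appearances could be computed. The exact weighting formula is not spelled out in the text, so this sketch assumes the usual definition in which each player's rate counts in proportion to his plate appearances. The function name `hr_rate_stats` and the sample lines are made up for illustration.

```python
from math import sqrt

def hr_rate_stats(players):
    """players: iterable of (home_runs, plate_appearances) pairs.
    Returns the PA-weighted mean and standard deviation of HR per 650 PA."""
    rates = [(650 * hr / pa, pa) for hr, pa in players if pa > 0]
    total_pa = sum(pa for _, pa in rates)
    mean = sum(rate * pa for rate, pa in rates) / total_pa
    variance = sum(pa * (rate - mean) ** 2 for rate, pa in rates) / total_pa
    return mean, sqrt(variance)

# Made-up player lines for illustration only: (HR, PA)
sample = [(30, 650), (10, 650), (5, 325)]
league_mean, league_sd = hr_rate_stats(sample)
print(round(league_mean, 2), round(league_sd, 2))
```

Weighting by plate appearances keeps a part-time player with a fluky rate from moving the spread as much as an everyday player does.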
This time period incorporates a number of important changes to the game, like the influx of Latin American players, the development of the modern bullpen, various iterations in the size of the strike zone, and a correspondingly diverse array of leaguewide power levels, ranging from the modern deadball era of the mid-60s, to the "Juiced: The Prequel" year of 1987. In spite of all of that, the relationship between average home run rates and the standard deviation in home run rates remains highly linear, and highly predictable: as the average increases, the standard deviation increases proportionately.
We know, of course, that average home run rates increased markedly as of about 1993. What we don't know is how the impact was distributed: was it confined to just a subpopulation of hitters, or were the benefits conferred to more or less everyone? If the former is true, as the Steroid Gap Theory posits, then we'd expect not only the league average to increase, but also the standard deviation to increase disproportionately on top of it.
The evidence renders a clear verdict to the contrary.
The dashed line represents a linear projection outward of the trend that we established using the older data set. The red points represent actual National League power levels in the "Steroid Era"--the years from 1996 through 2004 (I've excluded the transitional year of 1993, as well as the strike-shortened seasons of 1994 and 1995). Standard deviation increases as we go up the chart, so we'd expect the new data points to be above the dashed line if the Steroid Gap Theory is correct, indicating that the juicing players have caused variance to increase disproportionately with the gain in power output itself.
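The logic of the chart (fit a line to the older seasons, extend it, and see on which side the newer seasons fall) can be sketched as follows. The numbers are placeholders invented for this example and are not the data behind the chart; `fit_line` is a plain least-squares helper written for the sketch.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Placeholder (mean HR, st. dev) pairs, NOT the article's data.
baseline_seasons = [(10.0, 8.0), (12.0, 9.5), (14.0, 11.0), (16.0, 12.5)]
later_seasons = [(19.0, 12.0), (20.0, 13.0)]

slope, intercept = fit_line(*zip(*baseline_seasons))
for mean_hr, sd_hr in later_seasons:
    projected = slope * mean_hr + intercept  # where the dashed line would sit
    side = "above" if sd_hr > projected else "at or below"
    print(mean_hr, sd_hr, round(projected, 2), side)
```

A season landing above the projected value would be the pattern the Steroid Gap Theory predicts; a season at or below it would not.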
As it happens, not only has the increase in the standard deviation failed to keep a proportionate pace with the increase in home run rates, but it has actually decelerated. That is, while offensive output has increased substantially, the playing field has become comparatively more level. Last season, for example, about 19.3 home runs were hit per 650 plate appearances in the National League, with a standard deviation of 11.9. Compare that to 1970, when just 15.6 home runs were hit per 650 PA--about a 20 percent decrease from contemporary levels--but the standard deviation was actually a bit higher, at 12.3.
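The comparison is easy to verify from the four figures quoted above. The sketch below takes "last season" to be 2004, based on the original publication date, and uses the ratio of standard deviation to mean as a simple relative measure of spread; that ratio is a choice made for this example, not necessarily the measure plotted in the chart.

```python
# Figures quoted in the text (National League, HR per 650 PA).
seasons = {
    2004: {"mean": 19.3, "sd": 11.9},
    1970: {"mean": 15.6, "sd": 12.3},
}

for year, s in seasons.items():
    ratio = s["sd"] / s["mean"]  # spread relative to the league average
    print(f"{year}: st. dev / mean = {ratio:.2f}")

drop = 1 - seasons[1970]["mean"] / seasons[2004]["mean"]
print(f"1970 mean is {drop:.0%} below the 2004 mean")
```

The ratio works out to roughly 0.62 for 2004 against roughly 0.79 for 1970, which is the sense in which the playing field has become comparatively more level.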
This is far from a perfect experiment. But at the very least, it is highly problematic for the Steroid Gap Theory. If just a substantial minority were benefiting from steroid use, and the benefit were predictably and markedly positive, then we'd expect the differentiation between the haves and the have-nots to have increased. That differentiation has in fact increased on an absolute level, but it has decreased relative to what we would expect given the overall environmental improvements that all hitters are benefiting from, be those in the form of expansion, a lively ball, a smaller park, the birth of
It is imperative to note that, while these results are inconsistent with the Steroid Gap Theory, they are not inconsistent with the notion that some players have used steroids, or the sentiment that steroid use is a problematic thing for the game. It is irrefutably the case that some players have used steroids, and as far as I am concerned, it is just as irrefutably the case that it will be to everyone's benefit to increase the rigorousness of testing programs, and the penalties for steroid use. What is required, however, is a different framework for understanding the impact that steroids might have had on the game's statistics--an alternative to the Steroid Gap Theory. I will propose three such hypotheses that are both more consistent with the data, and more consistent with my underlying sense of reality:
It is my belief that the last of these theories is closest to the mark. There is clearly something going on--but it is not producing the sort of predictable impacts that everyone expects. Nor, because of the complexity of the underlying chemistry, are we likely to see substantial changes in the game's statistics resulting from efforts to curtail use of these substances.
In other words, apart from an increase in the Scoville Scale of your local baseball column, 2005 is likely to be business as usual. And that is a wonderful thing.