October 30, 2012
A Brief, Incomplete History of Replacement Level
Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
Brandon blogs at Walk Like a Sabermetrician.
This article does not discuss the various definitions of replacement level, the arguments made for and against its use for various purposes, or any such topic with a practical application. While the title might suggest that it will discuss replacement level through baseball history, it sadly does not do that either. Rather, it attempts to briefly describe the history of replacement level as a sabermetric concept up to the mid-1990s or so, when it came to the forefront of most analytical systems.
This history is incomplete because it will surely leave out some notable uses, given the limitations of my library and of internet archives. It is primarily focused on Bill James' use of replacement level, since this is fairly easy to track through the annual Abstracts. Whether the concept of replacement level originated with James is something that I cannot answer, but it can be traced back at least as far as James.
I felt it necessary to write this overview because one will occasionally see Keith Woolner referred to as the originator of replacement level. This is simply not the case, and saying so is in no way intended as a slight to Woolner. His VORP certainly helped popularize replacement level and did much to increase the emphasis on the replacement baseline in the sabermetric community. One can acknowledge that while also recognizing the contributions of those who used replacement level before VORP.
In his first nationally-published edition (1982), James used a system based on Offensive and Defensive Winning Percentage to rank players. If a player was considered to have played full-time (90 games in the field and 10 games at bat—based on outs made—for the strike-shortened 1981 season), then his ranking was just the Winning Percentage based on the sum of his offensive and defensive wins and losses. However, for part-time players, the missing games were filled in by a .333 W%—a de facto replacement level.
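The 1982 fill-in rule can be sketched in a few lines of Python. This is a minimal illustration, not James's own procedure: it assumes that a component's "games" are simply its wins plus losses, that the .333 fill-in is applied separately to any shortfall below the 1981 full-time marks (10 offensive games, 90 defensive games), and the function and parameter names are hypothetical.

```python
# Hypothetical sketch of the 1982 Abstract ranking: pad part-time players
# up to full time with .333 play, then take the combined winning percentage.
REPLACEMENT_WPCT = 0.333

def padded_wpct(off_w, off_l, def_w, def_l,
                full_off=10.0, full_def=90.0, repl=REPLACEMENT_WPCT):
    """Combined W% after filling missing games at the replacement rate.

    Assumption: a component's games equal its wins plus losses, and the
    shortfall in each component is padded separately.
    """
    missing = (max(0.0, full_off - (off_w + off_l))
               + max(0.0, full_def - (def_w + def_l)))
    wins = off_w + def_w + repl * missing
    losses = off_l + def_l + (1.0 - repl) * missing
    return wins / (wins + losses)

# A full-time player is unaffected; a part-timer is pulled toward .333.
print(round(padded_wpct(7, 3, 50, 40), 3))
print(round(padded_wpct(4, 1, 30, 20), 3))
```

In this made-up example, the part-timer's raw 34-21 record (.618) drops to about .490 once his 45 missing games are filled in at .333, which is the penalty the fill-in is meant to impose on limited playing time.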
In 1983, James made the replacement baseline more explicit. He first found the player's total win-loss record, then calculated the chance that a .400 player would compile that record. James explained:
For example, James figured Sixto Lezcano's two-year record to be 17-8, then used the binomial distribution to calculate the probability that a .400 player would win at least 17 of 25 games.
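James's 1983 calculation is a binomial tail probability, and a short Python sketch makes the arithmetic concrete. It sums the exact binomial terms with `math.comb` (Python 3.8+); whether James summed exact terms or used an approximation is not stated here, so his published figure may differ slightly. The function name is hypothetical.

```python
from math import comb

def prob_at_least(wins, games, p):
    """Chance that a player with true W% p wins at least `wins` of `games`.

    Sums the upper tail of a Binomial(games, p) distribution.
    """
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))

# Lezcano's two-year record as James figured it: 17-8, against a .400 player.
chance = prob_at_least(17, 25, 0.400)
print(f"{chance:.4f}")
```

With these inputs the sum comes to roughly 0.004, so a .400 player would compile a record that good less than once in two hundred tries.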
In 1984, James used the Toronto team comment to discuss the distribution of talent in baseball. While the article does not mention replacement level explicitly, the message is clear: average players have value, and professional baseball talent is distributed like the far right tail of the normal distribution.
Later, in the player ratings section, James devotes several pages to a discussion of the pros and cons of comprehensive ranking systems (one that has much in common with the debates about competing win-value metrics today). In the course of this discussion, James offers his lengthiest explanation yet of replacement level, which he has now shifted to .350:
In the 1985 Abstract, James did not use any sort of replacement level in figuring his player rankings, instead making manual adjustments for playing time as he saw fit. However, the contemporary Historical Baseball Abstract included this discussion of replacement level vis-a-vis Pete Palmer's use of an average baseline in Total Player Rating:
In 1987, James used replacement level in the manner in which we most often see it today: runs and wins above replacement. In an essay on the 1986 MVP race between Roger Clemens and Don Mattingly (and a race he views as parallel, Jim Rice vs. Ron Guidry in 1978), James writes:
His third step in the pitcher evaluation spells it out clearly: "How many runs was Clemens better than the replacement-level pitcher?" The fourth step translates this into wins, going from RAR to WAR, although James did not use the term "wins above replacement" or the acronym "WAR".
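The two steps James describes, runs above a replacement pitcher and then wins, can be sketched as follows. This is an illustration under stated assumptions rather than James's 1987 procedure: the replacement run average and the flat 10-runs-per-win conversion are placeholder values, and the pitcher's line is hypothetical.

```python
def runs_above_replacement(runs_allowed, innings, repl_ra9):
    """Runs saved versus a replacement pitcher working the same innings."""
    replacement_runs = repl_ra9 * innings / 9.0
    return replacement_runs - runs_allowed

def wins_above_replacement(rar, runs_per_win=10.0):
    """Convert runs to wins; 10 runs per win is a rule-of-thumb assumption."""
    return rar / runs_per_win

# Hypothetical pitcher: 70 runs allowed in 270 innings, with an assumed
# replacement level of 5.00 runs per nine innings.
rar = runs_above_replacement(70, 270, 5.00)
print(rar, wins_above_replacement(rar))
```

A replacement pitcher at 5.00 runs per nine would allow 150 runs in those 270 innings, so this pitcher is 80 runs, or about 8 wins, above replacement under these assumptions.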
After James, the replacement level banner was picked up by the Big Bad Baseball Annual. While I have only scattered editions of the book (1992 and 1998-2001), they apparently used WAR as the centerpiece of their player rankings starting with the 1989 edition, and they were certainly using it in 1992. This also appears to be the origin of the "WAR" acronym. Their system took James' Offensive and Defensive Winning Percentages as its starting point and applied a .350 replacement W% to both components. In effect, then, their replacement level was lower than what is usually used today, and closer in baseline to the old version of Baseball Prospectus WARP as developed by Clay Davenport.
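Applying a .350 replacement W% to each component amounts to crediting a player with the wins he produced beyond what a .350 player would have in the same number of games, summed over offense and defense. The sketch below shows that arithmetic under assumptions: the function names and the per-component game counts are hypothetical, and BBBA's actual bookkeeping may have differed in detail.

```python
REPLACEMENT_WPCT = 0.350  # BBBA's replacement level for both components

def component_war(wpct, games, repl=REPLACEMENT_WPCT):
    """Wins above a .350 player over the same number of games."""
    return (wpct - repl) * games

def two_component_war(off_wpct, off_games, def_wpct, def_games):
    """Sum offensive and defensive wins above replacement."""
    return (component_war(off_wpct, off_games)
            + component_war(def_wpct, def_games))

# Hypothetical player: .600 offense over 16 games, .500 defense over 6 games.
print(round(two_component_war(0.600, 16, 0.500, 6), 2))
```

Because each component is measured against its own .350 floor, a player who is exactly average (.500) on both offense and defense still earns positive WAR.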
In 1999, BBBA changed their methodology from James' metrics to the Extrapolated Wins system, but the replacement level remained at .350 for both offense and defense.
In the early days of the internet, the primary hub of sabermetric activity was the rec.sport.baseball newsgroup. While the archives of the group are difficult to search, one can find several discussions about replacement level dating back as early as 1992 (again, this is not to say that there are no earlier discussions, just that I did not find them). The couple of discussions that I found tended more towards the theoretical than the practical—that is, most of the debate focused on the theoretical justification for a replacement level baseline rather than debate about the particular numeric value at which it should be placed.
That is where Woolner's contribution becomes notable: he offered a specific numerical value to rival James' estimates of a .350 W% or one run below average. Woolner's VORP apparently first appeared on rec.sport.baseball.analysis on October 10, 1995. Woolner explained that he had previously posted about his methodology on a Red Sox mailing list, and that he had changed the name to VORP because it was "catchier" (ironic, considering that VORP would go on to be one of the acronyms most often lampooned by those not disposed toward sabermetrics). VORP was based on Marginal Lineup Value, with the replacement level set .035 below the position average in BA, OBA, and SLG.
Woolner published VORP reports online, and when he later joined Baseball Prospectus, VORP was incorporated into BP's toolkit, further increasing its visibility. As a result, VORP became the most commonly cited and readily available replacement-baselined metric. This has caused some confusion about the origin of replacement level as a sabermetric concept, which can be traced back to the work of Bill James at least 13 years before VORP appeared.