March 4, 2013
BP Unfiltered
Daddy, What's Replacement Level?
In the pages of yesterday's Boston Globe, veteran sports reporter Bob Ryan declared war on WAR. We get that one a lot. But the unusual part of this particular declaration was that it was based on the belief that the "RP" in WARP (for "replacement player") was a "judgment call" rather than the product of a mathematical formula. Ryan argued that the "replacement level" comparison, as currently constituted, is just a matter of opinion, and therefore arbitrary and unreliable. It's not often that we're told that we're not using enough math.
It seems that Mr. Ryan might be misunderstanding what replacement level is and how it's calculated, mistaking a mathematical abstraction for "something that we make up as we go along." In fact, replacement level is the result of a perfectly logical calculation. So let me take a moment to set the record straight.
WARP seeks to answer this basic question: If Smith suddenly vanished from the face of the earth, how much production would his team lose as a result? The general idea is that his team would do the best that it could, either promoting a guy from the bench to the starting job, bringing someone up from the minors, or signing a scrap-heap free agent who plays the same position. It wouldn't get the same production it would have gotten from Smith, but it would get something. We need a way to compare the value that Smith supplies to the value of these guys on the bench, in the minors, and on the scrap heap.
Mr. Ryan correctly points out that WARP converts a player's exploits on the diamond into run values and, for hitters, includes his hitting, defense, and baserunning contributions. We might say that Mike Trout contributed five billion runs (okay, the number might have been slightly smaller) to the Angels last year, all told. But to what shall we compare him? A summer's day?
No, we compare him to the value of the "replacement players," who are the bench/minor-league/scrap-heap guys. Because Trout played center field last year, we need to find all the bench/minors/scrap-heap center fielders out there. The 30 guys who led their teams in time spent in CF don't count. But everyone else who primarily played center field (i.e., center field was the position where he personally spent the most time) does. We can look to see how much value these guys collectively brought to their teams.
Had Trout himself disappeared, the Angels probably would have responded by playing Peter Bourjos and Torii Hunter more often. But we don't want to credit or blame Trout for the presence of other players who just happen to be on his team, so we take an average of what everyone else's bench players might have done in Trout's place, rather than compare him just to the Angels' backup options. Then we look at how much value those backup center fielders, on average, would have provided in the amount of time that Trout played last year.
Replacement level is a mathematical abstraction in that no such "replacement player" actually exists; you can't point to Larry over there and say that he is the gold standard of replacement level. But really, a replacement player is just the weighted average per-plate-appearance (or per-inning) performance of all backup center fielders, multiplied by the number of plate appearances (or innings) that Trout (or any other player whose value we want to assess) logged.
In using this composite sketch of the state of backups in MLB, we trade the ability to answer the question, "What really would have happened to the Angels if Trout had vanished into thin air?" for the ability to compare everyone in MLB against a common baseline. Depending on the question that you want to answer, this may or may not be a beneficial assumption. It has advantages and disadvantages, but I'd argue that the advantages carry more weight here.
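To make the calculation above concrete, here is a minimal Python sketch of it, not BP's actual implementation. Several pieces are assumptions made for illustration: the `PlayerSeason` fields, the use of plate appearances as a stand-in for time spent at the position, the hypothetical player data, and the roughly 10-runs-per-win conversion (a common rule of thumb, not a figure from this article).

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    name: str
    team: str
    primary_position: str  # position where the player spent the most time
    pa: int                # plate appearances (assumed stand-in for time at the position)
    runs: float            # total run value: hitting + baserunning + defense


def replacement_rate(seasons, position):
    """Weighted average run value per PA of every player whose primary
    position is `position`, excluding each team's leader at that position."""
    at_position = [s for s in seasons if s.primary_position == position]

    # Identify each team's leader at the position (most PA, by assumption).
    leaders = {}
    for s in at_position:
        if s.team not in leaders or s.pa > leaders[s.team].pa:
            leaders[s.team] = s

    # Everyone else forms the bench/minors/scrap-heap pool.
    pool = [s for s in at_position if s is not leaders[s.team]]

    # Total runs over total PA weights each backup by his playing time.
    return sum(s.runs for s in pool) / sum(s.pa for s in pool)


def runs_above_replacement(player, seasons, runs_per_win=10.0):
    """Player's runs minus what the replacement rate would produce in the
    same number of PA, also converted to wins (runs_per_win is assumed)."""
    rate = replacement_rate(seasons, player.primary_position)
    rar = player.runs - rate * player.pa
    return rar, rar / runs_per_win


# Made-up numbers purely for illustration.
seasons = [
    PlayerSeason("Starter A", "Team 1", "CF", 600, 40.0),
    PlayerSeason("Backup A", "Team 1", "CF", 200, -2.0),
    PlayerSeason("Starter B", "Team 2", "CF", 650, 10.0),
    PlayerSeason("Backup B", "Team 2", "CF", 100, -4.0),
    PlayerSeason("Shortstop", "Team 2", "SS", 550, 5.0),
]

rar, wins = runs_above_replacement(seasons[0], seasons)
print(round(rar, 1), round(wins, 1))  # 52.0 5.2
```

Because the rate is pooled across every team's backups, "Starter A" is compared against the league's backup center fielders rather than only his own team's bench, which is the common-baseline trade-off described above.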
If you'd like to take issue with how WAR defines value (and the assumptions inherent in it), then that's fine. If you'd like to take issue with the methodology used to calculate it, perhaps to say that the math and the definition don't fully match, that's fine too. A good scientist—and I consider myself to be a proper scientist—should give a fair hearing to a reasonable argument. But as always, we've started with a reasonable definition of what we're looking for, tried to create the best mathematical model that we can based on that definition, and then let the numbers fall where they will. That’s a better approach than making it up as we go along.
Russell A. Carleton is an author of Baseball Prospectus. Follow @pizzacutter4
As a physics teacher, I'm mulling sending this article to some students, to help them see that sometimes mathematical abstractions make you understand the world better.
One thing I've wondered about from time to time is how well a team of all replacement players would fare against the rest of the league. Also, does aggregate team WARP scale linearly against team wins? That is, do excellent teams perform better than their total WARP compared to a replacement level team, perhaps because of lineup synergy or having unusually good bench players?
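One way someone could test the linearity question is to regress actual team wins on summed team WARP. A slope near 1 would mean each WARP is worth about one win, the intercept would estimate the wins of an all-replacement team, and positive residuals clustered among the best teams would hint at the kind of synergy asked about here. The sketch below uses hypothetical, exactly linear numbers just to show the mechanics; it is not a result from real data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(xs, ys, intercept, slope):
    """Actual wins minus the wins predicted by the fitted line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical team totals, constructed to be exactly linear.
team_warp = [10.0, 20.0, 30.0, 40.0]
team_wins = [55.0, 65.0, 75.0, 85.0]

intercept, slope = fit_line(team_warp, team_wins)
print(intercept, slope)  # 45.0 1.0
```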
A theoretical team of replacement-level players (let's say the Astros for simplicity's sake) is normally considered to win 40 games in a 162-game season. That's why each additional WAR a team accrues is added onto the baseline of 40 games.
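Put as arithmetic, the convention described here is just a baseline plus a sum. In this sketch the 40-win baseline is the figure from the comment, treated as an adjustable parameter since other systems may use other values; the function name and roster values are hypothetical.

```python
REPLACEMENT_TEAM_WINS = 40.0  # baseline cited in the comment; an assumption, varies by system


def projected_team_wins(player_war, baseline=REPLACEMENT_TEAM_WINS):
    """Replacement-team baseline plus the sum of each player's WAR.
    A below-replacement player (negative WAR) lowers the total."""
    return baseline + sum(player_war)


print(projected_team_wins([5.0, 3.5, 2.0, 1.5, -0.5]))  # 51.5
```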
Well, let's do a little math with this year's PECOTA projections. The sum of all PECOTA WARP projections for all players projected to play at the MLB level this year (using the DC_FLAG on the spreadsheet) is 966. Divided by 30 teams, that comes to about 32 WARP for the average team. Since the average team wins 81 games, a replacement team would be expected to win about 49.
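The arithmetic in this comment, written out as a short Python sketch. The 966 total comes from the comment, and the 40-win baseline comes from the comment above it; the 30 teams and the 81-win average follow from a 162-game schedule. The function names are hypothetical. The second function runs the calculation in reverse: it shows how much WARP a system would have to hand out league-wide to be consistent with a given replacement-team baseline.

```python
TEAMS = 30
AVERAGE_WINS = 81.0  # half of a 162-game schedule


def implied_replacement_wins(total_league_warp, teams=TEAMS):
    """Wins of a replacement-level team implied by a league-wide WARP total."""
    return AVERAGE_WINS - total_league_warp / teams


def league_warp_for_baseline(replacement_wins, teams=TEAMS):
    """League-wide WARP total a system must hand out for a given baseline."""
    return (AVERAGE_WINS - replacement_wins) * teams


print(round(966 / TEAMS, 1))                    # 32.2
print(round(implied_replacement_wins(966), 1))  # 48.8
print(league_warp_for_baseline(40.0))           # 1230.0
```

In other words, a 40-win baseline would require roughly 1,230 total WARP across the league, versus the 966 that PECOTA projects.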
I'm not aware of a way to quickly tabulate this, but it would be a good check of PECOTA to compare the WARP projected in the pre-season for 2012 to the WARP assigned to players in 2012. The number of wins available each year is constant, and a well-normalized system should roughly obey this.
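A rough sketch of the check suggested here, assuming the preseason projections and final-season WARP were available as name-to-WARP mappings. The function and the player data are hypothetical. Because the number of wins in a season is fixed, a well-normalized system should produce league totals that land close to each other; the per-player error is a separate measure of how well individuals were projected.

```python
def compare_projection(projected, actual):
    """Compare preseason projected WARP to end-of-season WARP.

    projected, actual: dicts mapping player name -> WARP.
    League totals include every player in each dict (call-ups and injured
    players still count); the per-player error uses only players in both.
    """
    total_projected = sum(projected.values())
    total_actual = sum(actual.values())
    common = projected.keys() & actual.keys()
    if common:
        mean_abs_error = sum(abs(projected[p] - actual[p]) for p in common) / len(common)
    else:
        mean_abs_error = float("nan")
    return total_projected, total_actual, mean_abs_error


# Hypothetical players and values, for illustration only.
projected = {"Player A": 5.0, "Player B": 2.0}
actual = {"Player A": 4.0, "Player B": 2.5, "Player C": 0.5}
print(compare_projection(projected, actual))  # (7.0, 7.0, 0.75)
```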
And in the difference between this 49-win figure and the 40-win figure accurately cited by swarmee as the theoretical basis for replacement level can be seen the essence of Ryan's argument, which I believe has some merit -- not a lot, but some.
Let's be honest here: there IS a component of judgment -- or if you prefer, "subjectivity" -- as to where replacement level resides. If there wasn't, everybody's values for players' WAR (or WARP) would be the same. They aren't. Most of the time they are fairly similar, but there are occasional extreme outliers. This is in contrast to traditional stats, where at least everyone's understanding of batting average, HRs, RBIs, ERA, etc., is exactly the same (which is not to say that that understanding then translates to a correct understanding of value).
Furthermore, we ARE making it up as we go along. There is constant fine-tuning of the calculation of WAR/WARP/whatever. I submit that this is a good thing, not a bad thing; it means that we are continuing to think hard about what it means to be good at baseball. The "making it up" is on a long-term basis, rather than just pulling numbers and formulas out of some body orifice that are different today than yesterday so that we can "demonstrate" the superiority of the guy we perceive today as being the best player. (Ryan apparently doesn't get that.) But it still goes on.
This doesn't mean that I agree with Ryan; at the 90% level, I don't. However, I do agree with him that it's important to avoid overclaiming what WAR can tell us about players, and about baseball.
(Incidentally, I'm also a physicist.)
"If there wasn't, everybody's values for players' WAR (or WARP) would be the same."
There are a LOT of things that differentiate WAR calculations aside from the chosen replacement level:
http://www.baseball-reference.com/about/war_explained_comparison.shtml
Oh, for sure, and I just focused in on replacement level as one of those components. The point still stands, though: WARP and things like it are outputs of models. A scientist knows that a model's outputs are only as good as the model itself, in combination with the input data. Since there are differences of opinion/assumptions from model to model, not to mention that the data set itself gets squirrely once one gets away from balls, strikes, hits, etc., to defensive "zones" and definitions of line drives versus fly balls, there's a lot of judgment/subjectivity to the models that we often ignore. As long as those things are handled consistently and with some detachment, the resulting conclusions have value. They're still not as detached and objective as some of the more fervent advocates of the models would have us believe they are.
I agree with everything you say, but I think we need to exercise caution when using phrases like "making it up". People here get it, but to most baseball fans that sure sounds like "just pulling numbers and formulas out of some body orifice that are different today than yesterday so that we can "demonstrate" the superiority of the guy we perceive today as being the best player." "Refining" might be a better euphemism.