December 21, 2001
The MVP Prediction System
First in a Series
Why did Barry Bonds win the MVP this year, but not last year? Why did he win it in 1990, 1992, and 1993, but not 1991?
You'll hear a lot of theories about why players win MVP awards, a lot of speculation. It's important to be on a winner...but what about Andre Dawson in 1987? It's supposed to help to be from a media center...but Mike Piazza couldn't win one in Los Angeles or New York. In fact, it's been more than a decade since a Met or Dodger took home the award.
Some say that getting off to a fast start is the key to getting the voters' attention; others, that a noisy stretch drive is what it takes. Some people say that race matters to the voters and some people say it doesn't, and there are examples and counterexamples trotted out to prove or disprove the point. Then there's the theory that the voters like to reward players who haven't won before, or punish those who have won too many times...on it goes. I'm sure somewhere someone has advanced the theory that having blue eyes or being born in February is crucial to the voters.
I can't solve all of this. I can solve half of it. I can report how the National League Most Valuable Player Award is determined: what counts, what doesn't, how much it counts, and how often odd things happen.
National League MVPs have been basically predictable since at least the end of World War II (1946), using only a small number of the variables about which people speculate. Triple Crown stats, winning a division, and a rough positional adjustment (but only for players on teams that win something) are the important criteria. Voters appear to use only a small number of inputs, and combine them in basically the same way every year. It's not perfect; every once in a while, something really bizarre happens, but those occurrences are rare.
Specifically, my NL MVP predictor assigns one point each for the following:
Total those points for all players; those with the most points are the candidates. If there's more than one candidate, sum their batting average, RBIs, and home runs, and add a bonus for playing an up-the-middle position and/or playing for a division winner. That's it. That's going to give you your MVP.
What this means is that all of the other things that people believe matter in MVP voting probably don't affect the most important decision: the winner. Triple Crown stats, winning, and a rough positional adjustment. That's all there is. In fact, during the two-division period, 1969 through 1993, the system identified the eventual winner as a candidate in all but two cases: Kirk Gibson in 1988, and Willie Stargell's half-award in 1979. In one other case, the tie-breaker picked the wrong candidate.
Sometimes, the first stage of the predictor produces two or more candidates. In those cases, a simple formula predicts the correct winner almost every time. Add the numbers in the three Triple Crown stats, counting batting average in points (.319 counts as 319). To that, add 15 for catchers, middle infielders, and center fielders; add 15 for players on division winners. Since divisional play started, the system has identified more than one contender 18 times, and the tiebreaker has successfully identified the eventual winner in 16 of those cases.
For example, in the 1991 race discussed above, the system selected Terry Pendleton and Howard Johnson as candidates. Pendleton (.319/22/86, and add 15 for the Braves' division title = 442) defeats Johnson (.259/38/117 = 414) easily. Note that Bonds (.292/25/116, Pirates win the East = 448) would have narrowly won the award, according to the system, had he been a candidate, but with only two points to Pendleton's three, Bonds loses.
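For readers who want to check the arithmetic, here is a minimal sketch of the two stages applied to the 1991 race. It is an illustration, not the system itself: the names `Candidate`, `tiebreak_score`, and `predict_mvp` are made up for this example, the first-stage point totals are typed in by hand rather than computed, and Johnson's total of three is inferred from the fact that he tied Pendleton as a candidate. The sketch applies the two 15-point bonuses independently and does not try to restrict the positional bonus to players on winning teams.

```python
from dataclasses import dataclass

BONUS = 15  # size of each tiebreaker bonus described in the article


@dataclass
class Candidate:
    name: str
    stage_one_points: int          # first-stage total, entered by hand (assumption)
    batting_average: float         # e.g. 0.319
    home_runs: int
    rbi: int
    up_the_middle: bool = False    # catcher, middle infielder, or center fielder
    division_winner: bool = False


def tiebreak_score(c: Candidate) -> int:
    """Batting average in points + HR + RBI, plus the two 15-point bonuses."""
    score = round(c.batting_average * 1000) + c.home_runs + c.rbi
    if c.up_the_middle:
        score += BONUS
    if c.division_winner:
        score += BONUS
    return score


def predict_mvp(players: list[Candidate]) -> Candidate:
    """Stage one: keep the players with the most points. Stage two: tiebreak."""
    top = max(p.stage_one_points for p in players)
    candidates = [p for p in players if p.stage_one_points == top]
    return max(candidates, key=tiebreak_score)


# 1991 figures as quoted in the article.
pendleton = Candidate("Terry Pendleton", 3, 0.319, 22, 86, division_winner=True)
johnson = Candidate("Howard Johnson", 3, 0.259, 38, 117)
bonds = Candidate("Barry Bonds", 2, 0.292, 25, 116, division_winner=True)

for player in (pendleton, johnson, bonds):
    print(player.name, tiebreak_score(player))
print("Predicted MVP:", predict_mvp([pendleton, johnson, bonds]).name)
```

Bonds has the best tiebreak score of the three, but he never reaches the tiebreak because he falls one point short in the first stage, which is exactly how the system ends up with Pendleton.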
Why does the system work? The system essentially tries to copy the approach of the typical voter. It assumes that voters do take the job seriously, that they do look at stats, and that they do give credit for playing on a winner. However, the system also assumes that the writers only look at triple crown stats, only care about leading in those stats and surpassing the two key milestones (.300 BA and 100 RBI), and make a quick in/out judgement about what counts as a winning team.
Why are those assumptions valid? Those of you who have filled out a ballot in the annual Internet Baseball Awards version know that the biggest problem is often in identifying the candidates. In 2001, it was obvious to many people that Bonds, Sosa, and Luis Gonzalez were the serious candidates for the NL MVP Award, but it's often a lot harder to narrow the field. Indeed, in the American League in 2001, one would need to check and re-check to make sure that no worthy candidate was omitted from a preliminary list.
This system assumes that the writers narrow the field by checking out the winners of the triple crown categories, then move on to considering the key players (again, defined by triple crown stats) on the winning teams. The writers know, even if it's only in a very rough fashion, that the offensive standards for all positions are not the same; they deal with that by giving extra credit to playing the up-the-middle positions, although only to players on winning teams. In the first stage, the system places a premium on easy-to-spot markers that writers probably use to sort out the contenders from the pretenders. In the tie-breakers, the system assumes a head-to-head comparison, so the initial magic numbers (the thresholds in RBI and BA, and recognition for leading the league) no longer matter.
Note that this system says absolutely nothing about who deserves the award. My approach is inspired by the Bill James Hall of Fame Monitor (available at the wonderful baseball-reference.com), and is intended only to determine what regular patterns, if any, can be found in award voting, not whether those patterns actually reflect value. It's easy to see that National League MVP voters overvalue some accomplishments and undervalue others: a third baseman on a team that finishes second in the division who bats second in the lineup behind a terrible leadoff hitter, walks a lot, and has excellent doubles power is never going to take home an MVP, even if he is the best player in the league. A mediocre slugger has a good chance if he has a lot of runners on base in front of him and his team wins the division, even if he makes a lot of outs.
Do I know for a fact that the assumptions built into the system are true? No, but for the National League, they do the job. Writers behave as if they work that way, and it makes sense from what they say about how they go about their work (I make sure to read as many explanations for votes as I can find). Adding other variables, or refining the ones listed, yields far fewer correct predictions.
In part two, I'll go over the mistakes, the incorrect predictions the system generated over the period 1946-1993.
Jonathan Bernstein has been walking around with a goofy grin on his face ever since Barry Bonds agreed to stay put with the Giants for at least one more year. He thanks the gang at rec.sport.baseball for their helpful suggestions about MVPs.