November 29, 2016
The World Series of Coin Flipping
Let’s have our own World Series, you and I. A World Series of coin flipping. First to four wins. Your call, heads or tails. Ready? Let’s go.
There are four ways you can win. First, you can win four flips in a row. Since the odds of winning any one flip are 50/50, your chance of a sweep is ½ x ½ x ½ x ½, or (½)⁴, which equals .0625. You have a 6.25 percent chance of winning four straight flips.
There are a few ways you can beat me 4-1. I can win the first flip, you the next four. Or you can win one, I win one, then you win three. Or you win two, I win one, and you win another two. Or you take three, I win my one flip, and then you finish me off. (You have to win the last flip, so my lone win can only come in one of the first four.) The odds of each of those sequences, since there are five coin flips total, are (½)⁵. There are four such sequences, so 4 x (½)⁵ = .125. You have a 12.5 percent chance of winning our World Series 4-1.
I’ll spare you the narrative, but you have a 15.625 percent chance of beating me 4-2. And our series will go to seven games, with you victorious, 15.625 percent of the time as well.
We can double those amounts and figure the chances that our World Series will last a given number of flips. If you have a 6.25 percent chance of winning in four straight, so do I, so there is a 12.5 percent chance of a sweep. Similarly, there’s a 25 percent chance that one of us wins in five. And there is a 31.25 percent chance that our series goes six flips, and the same 31.25 percent chance that it goes seven.
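The arithmetic above follows one pattern: the series winner must take the final flip plus three of the flips before it. Here is a minimal Python sketch of that counting argument. The function name `series_length_probs` and its parameters are my own illustrative choices, not anything from a library.

```python
from math import comb

def series_length_probs(p=0.5, wins_needed=4):
    """Chance that a first-to-`wins_needed` series lasts each possible length.

    `p` is the chance that one particular side wins any single flip.
    """
    q = 1 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        losses = n - wins_needed
        # The winner takes the last flip and wins_needed - 1 of the first n - 1.
        ways = comb(n - 1, wins_needed - 1)
        # Either side can be the winner, so add both cases together.
        probs[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return probs

for n, prob in series_length_probs().items():
    print(f"{n} flips: {prob:.2%}")
```

With a fair coin this should reproduce the 12.5, 25, 31.25, and 31.25 percent figures worked out by hand above.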
Why did I bother telling you all of that? Because it’s relevant as we look back on seven-game postseason series. Let’s look at only the divisional era, beginning in 1969. Since then, there have been 47 best-of-seven World Series (every year, 1969-2016, excluding 1994) and 62 best-of-seven Championship Series (every year beginning in 1985—the LCS prior to then was best-of-five—excluding 1994). That’s 109 best-of-seven postseason series. How have they shaken out?
Before I answer that, let me address something you may be thinking: Arbitrary endpoints. There have been World Series since 1903. (I’m ignoring the 1884-1890 World Series and 1892 Championship Series, which were considered exhibitions.) Why am I limiting the analysis to 1969 and later? Well, baseball in 1903 was a lot different than baseball in 2016, and comparisons from over a century ago are strained.
Yes, the difference between 1969, which I’m including, and 1968, which I’m not, was just one year, but 1969 brought significant changes to the strike zone, the mound, and the composition of the two leagues (divisional play, four expansion teams). Runs per game shot up 19 percent from 1968 to 1969, the biggest jump in the two-league era. It’s as clean a divide as we’ve seen. Further, as I’ll explain, the addition of playoff rounds changes the dynamic.
For the LCS, there were eight four-game sweeps. Sixteen teams won in five, 23 in six, and 15 in seven. In the World Series, there were nine sweeps, 11 five-game series, 11 that went six, and 16 that went the distance. So adding them together, there have been 17 four-game series, 27 five-game series, 34 six-game series, and 31 seven-game series.
Let’s compare the distribution to that of the World Series of Coin Flipping:
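One way to line the two distributions up side by side is a short Python sketch like the one below. The counts are the LCS and World Series figures quoted above. The variable names and the helper `fair_coin_share` are illustrative assumptions of mine.

```python
from math import comb

# Series lengths since 1969, as quoted in the text (games -> number of series).
lcs = {4: 8, 5: 16, 6: 23, 7: 15}
world_series = {4: 9, 5: 11, 6: 11, 7: 16}

observed = {n: lcs[n] + world_series[n] for n in lcs}
total = sum(observed.values())

def fair_coin_share(n):
    """Chance that a best-of-seven decided by fair coin flips lasts n flips."""
    # Either side can win: it takes the last flip plus 3 of the first n - 1.
    return 2 * comb(n - 1, 3) * 0.5**n

for n in sorted(observed):
    actual = observed[n] / total
    print(f"{n} games: {observed[n]:3d} series, "
          f"{actual:6.1%} actual vs. {fair_coin_share(n):6.1%} coin flips")
```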
See what’s going on there? There are more four-game sweeps and fewer seven-game series in baseball than in random coin flips. Here it is graphically:
Why is there a difference?
You’re probably thinking: Because baseball isn’t coin flips. And you’re right! Ratings for our World Series of Coin Flips would be terrible (though the games would be shorter). More to the point, a coin flip is a 50/50 proposition. Baseball isn’t. There are some series that appear to be evenly matched, but then there are teams like the 1998 Yankees, with a 114-48 regular-season record, who went 11-2 in the postseason. Every team they played was inferior. So might that explain what’s going on here?
Maybe. After all, if the odds aren’t 50/50, we’d expect more four-game series and fewer seven-game series, which is exactly what we’ve got. We can test for that. If the coin isn’t fair, the results will be different. If, for example, we have a coin that comes up heads 75 percent of the time, we’d expect heads to win most of the time, often in short order. All told, we’d expect our series to go four flips 32.0 percent of the time, five flips 32.8 percent, six flips 22.0 percent, and seven flips 13.2 percent.
It turns out that we can minimize the error between the actual distribution of postseason series lengths and the theoretical distribution by assuming that the coin comes up heads more than 50 percent of the time. The best-fitting number is about 58.3 percent. That gives us something like this:
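The 75 percent example can be checked with the same counting argument as the fair coin, just with unequal odds. This is a small Python sketch, and `length_probs` is a name I made up for illustration.

```python
from math import comb

def length_probs(p):
    """Chance a best-of-seven lasts 4, 5, 6, or 7 flips when heads wins with probability p."""
    q = 1 - p
    return {
        # Winner takes the last flip and 3 of the first n - 1; either side may win.
        n: comb(n - 1, 3) * (p**4 * q**(n - 4) + q**4 * p**(n - 4))
        for n in range(4, 8)
    }

for n, prob in length_probs(0.75).items():
    print(f"{n} flips: {prob:.1%}")
```

Run with p = 0.75, this should give the 32.0, 32.8, 22.0, and 13.2 percent figures quoted above.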
See how the yellow bars and the red dots line up better? This underlying assumption—that one side of the coin, or one team, can be expected to win 58.3 percent of the time—creates a better fit to what actually has happened.
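For readers who want to try the fit themselves, here is a rough Python sketch. It assumes a plain sum of squared differences between the actual and theoretical shares, and a simple grid search over candidate probabilities. The article doesn't specify the error measure behind the 58.3 percent figure, so this version may land on a slightly different number. The observed counts are the sums of the LCS and World Series figures given earlier.

```python
from math import comb

# Combined LCS + World Series lengths since 1969, summed from the counts in the text.
observed = {4: 17, 5: 27, 6: 34, 7: 31}
total = sum(observed.values())

def length_probs(p):
    """Chance a best-of-seven lasts each length when one side wins each game with probability p."""
    q = 1 - p
    return {
        n: comb(n - 1, 3) * (p**4 * q**(n - 4) + q**4 * p**(n - 4))
        for n in range(4, 8)
    }

def squared_error(p):
    """Assumed error measure: sum of squared gaps between actual and model shares."""
    model = length_probs(p)
    return sum((observed[n] / total - model[n]) ** 2 for n in observed)

# Grid search from .500 to .750 in steps of .001.
candidates = [0.5 + i / 1000 for i in range(251)]
best_p = min(candidates, key=squared_error)
print(f"best-fitting single-game win probability: about {best_p:.3f}")
```

A different error measure (absolute differences, say, or a chi-square statistic) would shift the answer somewhat, but any reasonable choice should put the best fit noticeably above 50 percent.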
But now we have another problem. The 58.3 percent part ... does that make sense? Is there a World Series matchup between two teams in which one of the teams goes into each game with a 58.3 percent likelihood of winning? In 1981, Bill James introduced the Log5 formula, used to calculate the expected winning percentage when two teams of unequal ability play one another.
For example, if a team with a talent level of .600 plays a team with a talent level of .400, Log5 predicts that the .600 team has a 69.2 percent chance of winning. If a .600 team plays a .500 team, the odds drop to 60.0 percent. If the opponent is a .550 team, the probability falls further, to 55.1 percent.
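For reference, the Log5 estimate for a team with talent level A facing a team with talent level B is (A - A x B) / (A + B - 2 x A x B). Here is a minimal Python sketch of it. The function name `log5` is my own.

```python
def log5(a, b):
    """Log5 estimate of the chance that a team of talent `a` beats a team of talent `b`.

    Both arguments are winning percentages between 0 and 1.
    """
    return (a - a * b) / (a + b - 2 * a * b)

for opponent in (0.400, 0.500, 0.550):
    print(f".600 team vs. {opponent:.3f} team: {log5(0.600, opponent):.1%}")
```

This should reproduce the 69.2, 60.0, and 55.1 percent figures above. Two evenly matched teams come out at exactly 50 percent.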
So what kind of teams would yield a 58.3 percent winning probability for the better team, which best fits the distribution of series length we’ve seen since 1969? Well, it’s a sliding scale. If a 75-87 team plays a 62-100 team, the better team can expect to win 58.2 percent of the time. The thing is, we don’t get a lot of 75-87 teams in the postseason. So let’s look at a team that went 93-69. A team with that record can expect to win 58.3 percent of the time against a team that wins 79 or 80 games.
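The same Log5 calculation can be run on win-loss records by converting each record to a winning percentage first. Here is a small Python sketch. The helpers `log5` and `win_pct` are illustrative names, and treating the regular-season record as the talent level is the simplifying assumption the article itself makes.

```python
def log5(a, b):
    """Log5 estimate of the chance that a team of talent `a` beats a team of talent `b`."""
    return (a - a * b) / (a + b - 2 * a * b)

def win_pct(wins, losses):
    """Winning percentage from a win-loss record."""
    return wins / (wins + losses)

# A 75-87 team against a 62-100 team.
print(f"75-87 vs. 62-100: {log5(win_pct(75, 87), win_pct(62, 100)):.1%}")

# A 93-69 team against opponents with 79 and 80 wins in a 162-game season.
for opp_wins in (79, 80):
    prob = log5(win_pct(93, 69), win_pct(opp_wins, 162 - opp_wins))
    print(f"93-69 vs. {opp_wins}-{162 - opp_wins}: {prob:.1%}")
```

The 58.3 percent target falls between the two results for the 93-69 team, which is why the text says "79 or 80 games."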
Think about that for a second. The pattern of series lengths—four, five, six, or seven games—in the 109 best-of-seven postseason series since 1969 best resembles the outcome of a best-of-seven series between this year’s Red Sox and the Marlins. Or the Dodgers and the White Sox. Or the Nationals and the Royals. Or the Yankees and the Phillies. That doesn’t seem right, does it? Postseason series may be uneven, due to personnel or luck or injuries, but they’re not that lopsided. Are they?
So what’s going on here? Are we seeing good teams play mediocre ones in the postseason without realizing it? I think there are three explanations.
First, while 109 best-of-seven series aren’t a tiny sample size, they aren’t a huge number, either. Maybe we should be looking at all postseasons. Prior to 1969, there weren’t any Championship Series, but of the 66 best-of-seven World Series (it was best-of-nine in 1903, 1919, 1920, and 1921, and not played in 1904), there were 27 that went seven games, 13 that went six games, 16 that went five games, and 10 that went four games. Add those to the 109 in this study, and the distribution looks a lot more like the 50/50 coin flip distribution:
But, as I pointed out earlier, there are some significant differences between baseball prior to 1969 and the years since. Further, the addition of the LCS as a second round of the postseason in 1969, and of the Division Series as a third round in 1995 (and in the 1981 strike season), increased the likelihood that a superior team would be eliminated early, creating more unbalanced contests (if not Red Sox vs. Marlins unbalanced).
Second, I’m using win-loss record to define which team is better. That’s fine in general, but may not necessarily apply in the postseason, when (take your choice) a dominant starter or two, a shutdown bullpen, contact hitters, or a home run-based offense may be a superior predictor of success. Postseason matchups since 1969 may not be Dodgers vs. White Sox on the face of them, but in some cases, that may be how they work in October.
Finally, this could just be noise. As noted, 109 series isn’t an enormous sample size. The disparities shown in the first table and graph in this article—four-game series comprising 15.6 percent instead of the theoretical 12.5 percent and seven-game series comprising 28.4 percent instead of the theoretical 31.3 percent of postseason series—are notable, but don’t rise to the level of statistical significance. Maybe we’re just going through a fallow period for long series, just as 1955-1968, when 10 of 14 World Series went seven games, was a bumper crop.
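One common way to check a claim like this is Pearson’s chi-square goodness-of-fit test, comparing the 109 observed series lengths with what fair coin flips would predict. The article doesn’t say which test was used, so treat the Python sketch below as one plausible check rather than the actual calculation. The 7.815 cutoff is the standard 5 percent critical value for three degrees of freedom, and the observed counts are the sums of the LCS and World Series figures given earlier.

```python
from math import comb

# Combined LCS + World Series lengths since 1969, summed from the counts in the text.
observed = {4: 17, 5: 27, 6: 34, 7: 31}
total = sum(observed.values())

# Expected number of series of each length if every game were a fair coin flip.
expected = {n: total * 2 * comb(n - 1, 3) * 0.5**n for n in observed}

# Pearson's chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((observed[n] - expected[n]) ** 2 / expected[n] for n in observed)

# 5 percent critical value for 3 degrees of freedom (4 categories minus 1).
CRITICAL_5PCT_DF3 = 7.815

print(f"chi-square statistic: {chi_sq:.3f}")
print("significant at the 5 percent level" if chi_sq > CRITICAL_5PCT_DF3
      else "not significant at the 5 percent level")
```

The statistic falls far short of the cutoff, in line with the point that these disparities could easily be chance.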
I’d argue for a combination of the latter two factors. I do believe that the three rounds of the postseason (four if you count the Wild Card game) create the possibility, if not the likelihood, that postseason matchups are more imbalanced than they were in the past. That imbalance likely exacerbates differences between teams in the traits that lead to postseason success.
But we can’t dismiss randomness, either. We shouldn’t be surprised to see more seven-game series (like the last World Series, especially Game 7, please) going forward, and for non-gambler’s fallacy reasons. Postseason matchups may not be as close as they were before divisional play, but they’re probably closer than the past few decades might suggest.