October 5, 2009
Prospectus Hit and Run
The Perils of Relying on Short-Term Memory
Against long odds, the final week of the 2009 regular season wound up producing down-to-the-wire excitement in both leagues, though for the most part, that excitement had nothing to do with stellar play. The Dodgers used a season-high five-game losing streak to keep the suspense regarding the NL West flag and home-field advantage building for an entire week, with the Phillies and Cardinals failing to capitalize, and the Rockies falling just short of overcoming a lackluster two-week stretch prior to their final sprint. Meanwhile, the AL Central has produced its second consecutive Game 163 play-in, this time due to a mad rush by the Twins and a collapse by the Tigers that may yet prove to be historic.
Against this backdrop, viewers have been treated to writers, broadcasters, and in-studio pundits admonishing such slumping teams to pull themselves together while pontificating on the importance of heading into the playoffs with momentum. The oft-cited example remains the 2007 Rockies, who won 13 of their final 14 regularly scheduled games, then a play-in, and ultimately the NL pennant. Forget the fact that just one year prior, the Cardinals dumped nine of their final 12 before becoming the team with the lowest victory total ever to win the World Series; these experts certainly did. The obvious question is whether there's any truth to the conventional wisdom that late-season performance carries over into the playoffs. The answer is a fairly resounding no.
With the help of Eric Seidman, I pulled late-season records for every playoff team of the Wild Card era from 1995 through 2008, 112 teams in all. For each team, we noted the record over the final seven, 14, and 21 games, as well as for September and whatever fragment of October remained. The results of Game 163 play-ins initially weren't included in either the "week" records (which didn't always coincide with weeks, but which were somewhat easier to gather) or the "month" records; including them didn't change the results substantially. Here are the correlations between each interval's winning percentage and first-round success, without (Corr162) and with (Corr163) those play-in games:
Interval       Corr162   Corr163
Final 7           .019      .016
Final 14         -.020     -.021
Final 21         -.042     -.043
Final Month      -.028     -.028
That, folks, is a whole lot of nothing, an essentially random relationship between recent performance and first-round success. None of the correlations even reached .05 in either direction, and six of the eight were actually negative.
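For readers who want to run this kind of check themselves, here is a minimal sketch of the calculation. It assumes a plain Pearson correlation and assumes first-round success is coded as 1 for a Division Series win and 0 for a loss; the column doesn't spell out either detail. The rows in `sample` are hypothetical placeholders, not the actual 112-team data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL rows, not real teams: ((wins, losses) over the final 7 games,
# 1 if the team won its Division Series, else 0).
sample = [
    ((5, 2), 1),
    ((2, 5), 1),
    ((4, 3), 0),
    ((6, 1), 0),
]

win_pcts = [w / (w + l) for (w, l), _ in sample]
won_first_round = [flag for _, flag in sample]

r = pearson(win_pcts, won_first_round)
print(f"correlation: {r:.3f}")
```

With a yes/no outcome like this, the Pearson coefficient is the same thing as a point-biserial correlation, so values near zero mean the hot and cold finishers advanced at about the same rate.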
Okay, so those intervals of a few weeks don't tell us much about the outcome of those five-game series. They tell us only slightly more about the entire postseason. Here are the correlations between those winning percentages and overall playoff success as measured by the number of series won:
Interval       Corr162   Corr163
Final 7          -.043     -.049
Final 14         -.097     -.101
Final 21         -.119     -.121
Final Month      -.112     -.115
That's still nothing to write home about, and the slate is now uniformly negative, suggesting that, if anything, there's an ever-so-slight inverse relationship between success in the final weeks and in the postseason. Perhaps that's because some of these playoff-bound teams are resting their regulars more often, or simply regressing to the mean after a summer of beating up on opponents. Even if we create a points system, doubling the value of winning the League Championship Series and quadrupling that of the World Series such that the same number of points are awarded per round, the magnitude of the largest correlation (for the final month, 163-game version) still doesn't get any bigger than .137, and it's negative at that. It's still essentially nothing.
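The points system above can be sketched as follows. This assumes one point for a Division Series win, two for a League Championship Series win, and four for a World Series win, and that a team's points accumulate across rounds; the names `ROUND_POINTS`, `WINNERS_PER_ROUND`, and `playoff_points` are illustrative, not from the original study.

```python
# Assumed weights: each round is worth double the previous one.
ROUND_POINTS = {"DS": 1, "LCS": 2, "WS": 4}

# Number of series winners per round in the 1995-2008 playoff format.
WINNERS_PER_ROUND = {"DS": 4, "LCS": 2, "WS": 1}


def playoff_points(series_won):
    """Total points for a team, given the list of rounds it won."""
    return sum(ROUND_POINTS[rnd] for rnd in series_won)


# Doubling the weight as the field halves keeps the total awarded
# in every round the same.
points_awarded = {
    rnd: ROUND_POINTS[rnd] * WINNERS_PER_ROUND[rnd] for rnd in ROUND_POINTS
}

first_round_loser = playoff_points([])
pennant_winner = playoff_points(["DS", "LCS"])
champion = playoff_points(["DS", "LCS", "WS"])
```

Under these assumptions each round hands out four points in total, so no single round dominates the correlation simply because more teams take part in it.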
By and large, these teams that made the playoffs did well over the various intervals in question, winning at a .595 to .601 clip and serving as a reminder that there's a selection bias at work here: the teams that did very poorly likely missed the postseason, relegating themselves to the dustbin of history. Indeed, just 13 of these 112 teams put up sub-.500 records from September 1 onward, and only two of them, the 1998 Padres and 2008 Brewers, were more than five games below .500 during that stretch run. Even so, six of those 13 teams won their first-round matchups, all six of them won their respective League Championship Series, and three of them won the World Series (the 1997 Marlins, 2000 Yankees, and 2006 Cardinals). Recall that those 2000 Yanks lost 15 of their final 18 games prior to the postseason while being outscored 148-59, exhuming the memory of the 1899 Cleveland Spiders in the process before flipping the switch and trampling everything in their path to a third straight World Championship.
As well as those teams did over these short stretches, it's noteworthy that the recent records of the teams that won in the Division Series and the teams that lost are virtually indistinguishable. Over the seven-game split, the two groups' aggregate records differ by one win across a sample of 784 games, and over the month-long split, the difference is a net of four games. The split between the two grows as the pool of teams decreases in the Championship Series and World Series rounds, but not in the direction we'd expect:
Interval       DS W   DS L   CS W   CS L   WS W   WS L
Final 7        .594   .592   .571   .621   .551   .592
Final 14       .597   .600   .581   .612   .541   .622
Final 21       .598   .604   .578   .617   .558   .599
Final Month    .599   .601   .579   .619   .570   .589
Every single split but the seven-game/Division Series one (the only one from among the first two tables with a positive correlation) shows that the teams that lost the series had a better aggregate record over the recent intervals than the teams that won, again suggesting that there might be some effect of resting the regulars or otherwise regressing down the stretch.
On a team level, recent performance as measured by wins and losses simply isn't predictive. For further evidence of this, consider a quick-and-dirty study I did in the service of the Hit List this summer in response to the suggestion of making recent performance a stronger factor in the rankings, which would conform to some readers' perception that the hottest teams at the moment are the strongest teams overall.
Using the 2008 Hit List, I broke the season up into four-week chunks ("months," for the purposes of this study) and tested the correlation between each team's "monthly" actual, first-order, second-order, and third-order winning percentages as well as their Hit List Factor (the average of those four percentages), against the following "month's" actual record. I used these four-week splits because they were easily created from my master Hit List spreadsheet, as I only save the adjusted standings for the days I use to compile the list.
The correlations for "monthly" ____ winning percentage to next "month's" actual winning percentage:
Indicator          Corr
Actual              .21
First-order         .24
Second-order        .18
Third-order         .17
Hit List Factor     .22
Not much to hang onto there. I then tested the correlation between the various year-to-date winning percentages from those increments and the next month's actual winning percentage.
Indicator          Corr
Actual             .304
First-order        .289
Second-order       .298
Third-order        .296
Hit List Factor    .312
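The pairing behind both tables can be sketched roughly as below, using actual winning percentage only. This is an illustration under stated assumptions, not the Hit List spreadsheet itself: `lagged_pairs` is a hypothetical helper, each team is assumed to have an ordered list of (wins, losses) tuples per four-week chunk, and the sample record is made up.

```python
def lagged_pairs(chunk_records):
    """Pair each four-week chunk with the chunk that follows it.

    chunk_records maps a team name to an ordered list of (wins, losses)
    tuples, one per chunk. Returns three parallel lists:
      incremental  - winning percentage in the chunk alone
      year_to_date - winning percentage through the end of that chunk
      next_actual  - winning percentage in the following chunk
    """
    incremental, year_to_date, next_actual = [], [], []
    for chunks in chunk_records.values():
        cum_wins = cum_losses = 0
        # zip stops one short of the end: the last chunk has no successor.
        for (w, l), (next_w, next_l) in zip(chunks, chunks[1:]):
            cum_wins += w
            cum_losses += l
            incremental.append(w / (w + l))
            year_to_date.append(cum_wins / (cum_wins + cum_losses))
            next_actual.append(next_w / (next_w + next_l))
    return incremental, year_to_date, next_actual


# HYPOTHETICAL team with three 24-game chunks, for illustration only.
records = {"Team A": [(14, 10), (12, 12), (16, 8)]}
inc, ytd, nxt = lagged_pairs(records)
```

Correlating `inc` with `nxt` gives the incremental version and correlating `ytd` with `nxt` gives the year-to-date version; any Pearson routine will do for that step, such as `statistics.correlation` in Python 3.10 and later.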
Though it's hardly a robust effect, this admittedly slapdash study does support the none-too-controversial idea that a larger sample size such as a year-to-date performance is more useful than a recent incremental performance in predicting wins and losses going forward. Even so, as I found last year, when it comes to the playoffs, actual records over the full season aren't as helpful as Pythagorean ones, which are based on the underlying performances that tend to even out across even larger sample sizes.
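For reference, a Pythagorean record estimates winning percentage from runs scored and allowed rather than from wins and losses. The sketch below uses the classic exponent of 2, which is an assumption here; the adjusted standings behind the Hit List may use a different exponent, and the run totals are hypothetical.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from run totals (classic Pythagorean form).

    The default exponent of 2 is the textbook version; other exponents
    can be passed in to mimic refined variants.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    if runs_scored == 0 and runs_allowed == 0:
        raise ValueError("undefined when no runs have been scored or allowed")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# HYPOTHETICAL season totals: 800 runs scored, 700 allowed.
expected_pct = pythagorean_pct(800, 700)
expected_wins = expected_pct * 162
```

A team that wins more games than this estimate suggests has usually been fortunate in close games, which is one reason the run-based figure tends to travel better into October than the raw record.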
As the postseason unfolds over the next few weeks, you're going to hear a lot about momentum and its importance to a ballclub, and while it's undoubtedly a good idea to bear Earl Weaver's famous maxim in mind, the take-home message is that the conventional wisdom that a team's recent performance foreshadows its playoff fate is generally wrong. The fact that there is no shortage of pundits who elevate the 2007 Rockies as their evidence while forgetting the 2006 Cardinals underscores either how little attention some talking heads pay to actual results, or how short their attention spans are.
In any given series, there may well be reasons to predict that one team will have the upper hand due to the strengths and weaknesses of the various matchups; things like the Phillies' closer situation and the Dodgers' rotation jumble will have a very real impact on who gets to play, and what might arise from that. Even so, the differences between any two teams who make it to the October crapshoot are small enough that the range of outcomes in a short series is almost unlimited, and the effect of recent performance shouldn't be overemphasized.