April 24, 2007
Lies, Damned Lies
The Cruelest Month
April is the cruelest month in baseball--unless you happen to be a pitcher. Through the first three weeks of the regular season, the average major league team has scored just 4.48 runs per game, the lowest figure through this point in the season since 1992. National Leaguers have seen their home run output drop from 1.10 per game last season to 0.84 this year. The American League is hitting a collective .253, which would represent its lowest mark since the league batted just .239 in 1972, the final year before the implementation of the Designated Hitter.
Are we experiencing a shift in baseball's perpetual tug-of-war back to pitchers? Is this an artifact of small sample size? Or is the weather to blame?
When I fielded this question in my chat last week, I said that "my gut instinct…is that the decline in offense is large enough to be material above and beyond the weather." After running the numbers, however, I'm now not so sure.
Let's skip right to the geeky bits. Applying a simple, unpaired t-test reveals that, with greater than 99 percent confidence, we can say that the change in run-scoring output from 2006 to 2007 is not the result of randomness alone. In other words, sample size doesn't explain away the difference by itself.
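For the curious, here's what that test looks like in code. This is a minimal sketch of an unpaired (Welch's) t-test in Python; the two samples below are hypothetical per-game run figures standing in for the actual 2006 and 2007 game logs, which I'm not reproducing here:

```python
import math

def welch_t(a, b):
    """Unpaired (Welch's) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sample variances (n - 1 in the denominator)
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    # Difference in means, scaled by its estimated standard error
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical samples: runs per team-game, early 2006 vs. early 2007
r2006 = [5.1, 4.8, 5.3, 4.6, 5.0, 4.9, 5.2, 4.7]
r2007 = [4.4, 4.2, 4.6, 4.1, 4.5, 4.3, 4.7, 4.0]
t = welch_t(r2006, r2007)
```

With real game-by-game data you'd compare t against the t-distribution at the appropriate degrees of freedom; the point here is simply that the statistic scales the observed gap in means by its sampling noise, which is how we can say randomness alone doesn't explain the drop.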
It might also be thought that offense is routinely a little bit lower in April than it is throughout the rest of the season, since temperatures are routinely a little bit lower. In fact, the opposite is the case. I surveyed the data on run-scoring output in the first three weeks of the regular season for each of the fifty seasons from 1957 through 2006, and then compared it against scoring over the remainder of each season. On average, teams scored 4.41 runs per game in the first three weeks of the season, and 4.36 runs per game the rest of the way.
The explanation for this is not that weather is unimportant; cold weather does harm offense, as we'll discuss in a moment. Rather, it appears that pitchers begin the year a little less prepared than hitters. This is particularly apparent if we look at the data from 1995 and 2006, when spring training was abbreviated because of the strike and the World Baseball Classic, respectively. Hitters started the year very strong in each of these cases.
What is clear, however, is that changes in run scoring output experienced in the first three weeks of the season tend to be fairly 'sticky'. As detailed in the table below, there were seven instances in which the run-scoring level in the first three weeks of the season represented a change of at least 50 points upward or downward from the previous season. (Note that 2007 is not among these seasons--run scoring is down by 0.38 runs per team game thus far this year, not quite meeting our threshold).
Year   Previous Year   First Three Weeks   Rest of Year
1959   4.30            4.97 (+.67)         4.34 (+.04)
1963   4.46            3.70 (-.76)         3.97 (-.49)
1969   3.41            4.29 (+.88)         4.05 (+.64)
1971   4.34            3.81 (-.53)         3.92 (-.42)
1977   3.96            4.57 (+.59)         4.46 (+.50)
1994   4.60            5.22 (+.62)         4.87 (+.27)
2006   4.58            5.09 (+.51)         4.84 (+.26)
In each of these seven cases, the direction of the arrow did not change as we moved from the first three weeks to the rest of the regular season. In other words, if run scoring was down in the first three weeks, it continued to be down to some extent for the rest of the season, and vice versa. However, also in all seven cases, the magnitude of the change decreased as the season wore on.
We can evaluate this data a bit more precisely by means of a regression analysis, which reveals that run scoring levels during the first three weeks of the season are in fact quite a strong predictor of offense throughout the rest of the year. That variable alone explains 71 percent of the variance in run scoring. If we add the previous year's run scoring levels to the equation, the R-Squared increases to only 75 percent. In other words, generally speaking, the first three weeks tell us a lot more about where run scoring is going to settle than the entire previous season.
In fact, the previous year's run scoring levels are of questionable statistical significance at all if we also account for a time trend in the analysis (that is, a linear increase in scoring from year to year, as has generally been the case over the past half-century). A regression model with these three parameters--year-to-date run scoring, previous year's run scoring, and a time trend--predicts that run output throughout the rest of 2007 will be 4.64 runs per team game, which would be materially down from last year's figure of 4.86 RPG, but consistent with the run scoring levels in recent seasons like 2005 (4.59 RPG) and 2002 (4.62 RPG). Our naïve conclusion, then, is that the decline in run scoring is "real," but not of earth-shattering proportions. We need to be careful, though, because this analysis remains pretty naïve.
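To make the flavor of that regression concrete, here is a minimal ordinary-least-squares sketch in Python, fit only to the seven "extreme" seasons from the table above (first-three-weeks scoring as the predictor, rest-of-year scoring as the outcome). Note the R-squared below applies only to these seven cherry-picked seasons, not to the full fifty-year sample behind the 71 percent figure:

```python
# Seven seasons with at least a .50-run early swing (from the table above):
# first-three-weeks RPG (x) vs. rest-of-year RPG (y)
x = [4.97, 3.70, 4.29, 3.81, 4.57, 5.22, 5.09]
y = [4.34, 3.97, 4.05, 3.92, 4.46, 4.87, 4.84]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
syy = sum((yi - my) ** 2 for yi in y)

slope = sxy / sxx               # rest-of-year runs gained per early-season run
intercept = my - slope * mx
r_squared = sxy ** 2 / (sxx * syy)
```

The slope comes out to roughly 0.6, which matches the qualitative finding above: early-season swings persist through the rest of the year, but at a damped magnitude.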
One reason why the first three weeks can be predictive is they sometimes reflect the early impact of structural changes in the game. If we look at those seasons in which run scoring in the first three weeks was the most divergent from where it finished the previous season, we'll find that several of the years have a ready explanation attached. In 1963 the mound was raised, and offense declined predictably. In 1969, the mound was lowered, the strike zone was contracted, and the American League was expanded, yielding a very profound increase in run scoring; 1977 was also an expansion year. It shouldn't be a surprise that changes in run scoring are sticky when these sorts of considerations came into play.
There was no expansion this year, however, nor any alteration to the strike zone, nor any especially significant changes to baseball's rules. Something was very unusual in the first three weeks of the season, however, and that was the nation's weather.
Thanks to The Weather Underground, I've been able to compile the average high temperatures over the first 22 days of April in the 23 baseball cities with an open-air stadium (New York and Chicago are intentionally double-counted, once for each team). These temperatures are compared against the long-term average high temperatures in each city over the same period:
City          2007   Average
Baltimore     56.6   63.2
Boston        47.8   54.7
New York      53.4   59.3
Chicago       53.5   56.4
Cleveland     52.9   53.3
Detroit       52.3   55.0
Kansas City   58.9   63.9
Anaheim       70.8   72.1
Arlington     69.3   74.9
Oakland       63.0   65.5
Atlanta       67.7   70.2
Miami         82.4   83.4
New York      53.4   59.4
Philly        55.3   59.3
Washington    58.0   65.0
Chicago       53.5   56.4
Cincinnati    57.8   63.4
Pittsburgh    52.8   59.3
St. Louis     59.0   65.4
Denver        57.4   59.8
L.A.          69.1   72.8
San Diego     63.4   68.5
San Fran.     62.0   64.0

AVERAGE       59.6   63.7
As you can see, the weather explanation is not just a lot of hot air. The shortfall from normal temperatures has been both deep and broad: temperatures have been below average at all 23 facilities, and about four degrees below normal overall.
Indeed, while a four-degree difference in temperatures is not trivial, it probably understates the case, because temperatures have not just been below average, but have been wildly inconsistent from week to week. Essentially, the first two or three days of the regular season were played in decent weather, and the past 5-7 days have featured above-average temperatures throughout most of the country. The weather was absolutely brutal, however, for the 10-12 days in between, with temperatures routinely running 10 to 20 degrees below average throughout large parts of the East Coast and the Midwest. Perhaps one-third of the schedule to date has been played in conditions that are utterly inhospitable to baseball. Since the effects of weather on run scoring are somewhat non-linear--really cold weather hurts offense more than really warm weather helps it--this has made a profound difference.
Moreover, the structure of the decline in run scoring is highly consistent with what we'd expect from inclement weather. First and most obviously, if we break 2007 down into the three weeks of the regular season to date, we find that offense has warmed with the temperatures:
Week 1 (April 1 - April 8)    4.16 RPG
Week 2 (April 9 - April 15)   4.19 RPG
Week 3 (April 16 - April 22)  4.91 RPG
Run scoring last week--the first week of the season played under "normal" weather conditions--rebounded to 4.91 runs per team game, which is almost exactly consistent with where we left off in 2006 (4.86 RPG).
In addition, as Chris Constancio of the Hardball Times discovered, certain aspects of offense respond differently to cold weather than others. Thus, what we'd expect to see if weather is the culprit for the decline in offensive output is abnormally low BABIPs and home run rates, abnormally high walk rates, and perhaps a very marginal increase in strikeout rates. In fact, this is just about exactly the pattern in this year's early-season numbers.
So what can we expect the rest of the way out? Long-term forecasts predict that this summer's temperatures will be normal to slightly above throughout most of North America. I'd tend to look at our naïve model's prediction of 4.64 RPG as a floor, and last week's output of 4.91 RPG as a ceiling, which gets us somewhere in the range of 4.75-4.80 runs per game for the rest of the season, a mere tick down from last year. The death of offense is greatly exaggerated, and if you're playing in a fantasy league, it's a great time to make a move for some cold hitters before the weather heats up.