The Stats Go Marching In

Who's Ahead of Whom?

by Max Marchi

1. The hitters are ahead of the pitchers. You use this one after your staff gets pounded for fourteen runs early in the spring. After all, maybe the hitters are ahead of the pitchers at this point. Who’s to say which group develops faster?

2. The pitchers are ahead of the hitters. The opposite of number 1, so it should be used when you get shut out by three rookie pitchers nobody’s ever heard of.

—Earl Weaver, The Clichés of Spring (from Weaver on Strategy)

The late Earl Weaver put a box in the first chapter of his classic book dedicated to answers he used to give every spring to “all those sportswriters with nothing much to write about” who every year “asked the same questions”. He alternated between the above depending on whether a loss occurred because his Orioles could not score any runs or allowed too many of them.

“Who’s to say which group develops faster?”

Well, let me try.

Month or temperature?
I hinted at hitters coming out of the gate faster in my article on factors affecting run scoring. It was nearly an afterthought, as I had noticed that run scoring has followed a predictable pattern throughout the years, with the hottest months generating more runs, except for the cold month of April, which produced scoring very similar to August.

In that article I did not try to model time of the year and temperature simultaneously, but I’ll try here. As a first step, I ran a very simple regression model, featuring the month and the temperature as the only predictors and the runs scored as the variable to be predicted. Temperature was taken from Retrosheet files, and I did not go back further than 1991, as before that year a very high number of missing values would have made the analysis troublesome.

The results I obtained indicate a rise of 0.04 runs scored per game (both teams combined) for every one-degree increase in temperature, and a seasonal trend as depicted by the chart below.

It seems that, after you remove the effect of the temperature, July and August are actually run-suppressing months. With the goal of this article in mind, it would seem that pitchers start slow and that it takes a couple of months before they catch up to the hitters.
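For readers who want to try a model of this shape, here is a minimal sketch of a regression with month and temperature as the only predictors. It is not the code behind the numbers above. The data is synthetic, the month effects and typical temperatures fed into the simulation are made-up assumptions, and the function name `fit_month_temperature_model` is hypothetical. Only the 0.04 runs-per-degree slope is borrowed from the text, so that the fit has something to recover.

```python
import numpy as np


def fit_month_temperature_model(months, temps, runs):
    """Least-squares fit of runs per game on temperature plus month indicators.

    Returns the temperature slope and a dict of month effects, each expressed
    relative to the average of all month levels.
    """
    months = np.asarray(months)
    temps = np.asarray(temps, dtype=float)
    runs = np.asarray(runs, dtype=float)
    levels = np.unique(months)
    # One indicator column per month (these replace a single intercept),
    # followed by one column for temperature treated as a linear effect.
    dummies = (months[:, None] == levels[None, :]).astype(float)
    design = np.column_stack([dummies, temps])
    coef, *_ = np.linalg.lstsq(design, runs, rcond=None)
    month_levels = coef[:-1]
    effects = month_levels - month_levels.mean()
    return float(coef[-1]), dict(zip(levels.tolist(), effects.tolist()))


# Synthetic games, NOT Retrosheet data: every number below is an assumption.
rng = np.random.default_rng(0)
n_games = 40000
months = rng.integers(4, 10, size=n_games)             # April (4) to September (9)
typical_temp = np.array([58, 66, 75, 80, 79, 71])      # assumed monthly averages
assumed_month_effect = np.array([0.3, 0.1, 0.0, -0.2, -0.2, 0.0])
temps = typical_temp[months - 4] + rng.normal(0, 8, n_games)
runs = (6.0 + 0.04 * temps + assumed_month_effect[months - 4]
        + rng.normal(0, 4, n_games))

slope, month_effects = fit_month_temperature_model(months, temps, runs)
print("runs per degree:", round(slope, 3))
print("month effects:", {m: round(e, 2) for m, e in month_effects.items()})
```

Because temperature varies within each month, the fit can separate the two effects. With real data, the month effects would be the quantity plotted in a seasonal chart.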

A (nearly) experimental setting
Temperature was treated as a linear effect in the previous model. In other words, I assumed that every one-degree increase has the same effect on run scoring, whether it brought the temperature from 32 to 33 degrees or from 110 to 111. That assumption might not be true in the real world.

However, the whole purpose of multivariate models is to measure the net effect of one variable after considering the other variables in the model. In an experimental setting one could keep the temperature constant for the whole year and look at the run-scoring trend. But that’s only feasible in physics labs, not in ballparks.

Wait.

We actually have a few nearly experimental settings, where the temperature has been held more or less constant through the season (and the years). That’s in the domed stadiums without adjustable roofs, which are now represented exclusively by Tropicana Field.

Here is the seasonal run-scoring trend for teams playing either at home or on the road at domed stadiums, excluding the few games played at Tokyo Dome. For this analysis, I examined scoring as a function of games played rather than month. The set of charts below didn’t undergo any statistical trick, except counting.

Okay, I lied. There is actually some smoothing, and there are error bars.
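The text does not say which smoother was used. As an illustration only, the sketch below assumes a centered moving average over game numbers, with the standard error of the pooled games serving as the error bar. The function name, the window size and the toy numbers are all hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def smoothed_trend(runs_by_game_number, half_window=5):
    """Centered moving average of runs with a one-standard-error band.

    runs_by_game_number maps a game number (1, 2, 3, ...) to the list of run
    totals observed in that slot across teams and seasons.
    Returns a list of (game_number, mean, standard_error) tuples.
    """
    trend = []
    for game in sorted(runs_by_game_number):
        pooled = []
        # Pool every game within half_window slots of the current one.
        for k in range(game - half_window, game + half_window + 1):
            pooled.extend(runs_by_game_number.get(k, []))
        if len(pooled) < 2:
            continue  # a standard deviation needs at least two observations
        trend.append((game, mean(pooled), stdev(pooled) / sqrt(len(pooled))))
    return trend


# Toy input, made up for illustration.
toy = {1: [4, 6], 2: [5, 5], 3: [3, 7]}
for game, avg, err in smoothed_trend(toy, half_window=1):
    print(game, round(avg, 2), "+/-", round(err, 2))
```

A wider window gives a smoother line and tighter error bars, at the cost of blurring any short-lived early-season effect.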

The season-long drop in run scoring is not ubiquitous (see the Astros), but run prevention has started out slow for the Twins, the Expos, the Rays and all their opponents, and you can make a case for the Astros’ and the Mariners’ opponents as well.

Back to complex
The previous section featured simpler calculations, but that came at the cost of visualizing just a few teams and for a limited time (when they called a closed stadium home).

So I went back to the statistical tricks, namely multilevel regression. I looked at runs scored by the home team, accounting for the park it played in (so that the run-boosting effect of the rarefied air in Denver is removed, for example) and again examined temperature and games into the season.

This time I used 10-degree groups for the temperature, in order to detect possible departures from a linear relation with run scoring. However, looking at the chart below, it seems that the linear approximation would not be far off the target.
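The binning idea can be sketched in a few lines. This is a deliberate simplification and not the multilevel model described here: it merely subtracts each park's average from its own games and then averages what is left within each 10-degree bin. The park labels, function names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def temperature_bin(temp_f, width=10):
    """Lower edge of the bin containing temp_f (73 falls in the 70 bin)."""
    return int(temp_f // width) * width


def park_adjusted_runs_by_bin(games, width=10):
    """Average park-adjusted home runs scored per temperature bin.

    games is an iterable of (park, temp_f, home_runs_scored) tuples.
    Each park's mean is removed first, so a high-scoring park does not
    distort the bins in which it happens to play most of its games.
    """
    games = list(games)
    by_park = defaultdict(list)
    for park, _, runs in games:
        by_park[park].append(runs)
    park_mean = {park: mean(values) for park, values in by_park.items()}

    by_bin = defaultdict(list)
    for park, temp, runs in games:
        by_bin[temperature_bin(temp, width)].append(runs - park_mean[park])
    return {b: mean(values) for b, values in sorted(by_bin.items())}


# Made-up games at two hypothetical parks.
toy_games = [
    ("PARK_A", 65, 6), ("PARK_A", 85, 8),
    ("PARK_B", 55, 3), ("PARK_B", 75, 5),
]
binned = park_adjusted_runs_by_bin(toy_games)
print(binned)
```

If the bin averages rise by a roughly constant amount from one bin to the next, treating temperature as a linear effect loses little. A true multilevel model would additionally shrink the park estimates toward the overall mean, which this sketch does not attempt.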

And here’s the other chart, showing the effect of the number of games into the season once the effects of temperature and the home park are removed.

Starting from game number 60 or so, the line is not as well behaved as the one in the previous chart, and I admittedly did not spend much time thinking about what those ups and downs might mean (if anything other than noise).

However, a downward trend is apparent for the first couple of months.

Interpretation and caveats
Okay, so it seems that if we take temperature out of the equation, run scoring is at its highest at the beginning of the season and takes roughly until mid-June before reaching a level which is maintained more or less throughout the rest of the season.

Is this enough to say that pitchers are behind hitters early in the year?

Well, pitchers account for just part (albeit the biggest part) of run prevention. Could it be that sloppy fielding is actually the culprit? I did a similar analysis on Defensive Efficiency (DER), and it yielded something like an inverse path (i.e., higher efficiency at the start of the season). However, the difference between the best and worst defensive moments is smaller than one point of DER, so I would feel more at ease stating that there is no seasonality in fielding.
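For reference, DER is the share of balls in play that the defense turns into outs. The sketch below uses one commonly cited approximation of it. Whether the analysis above used exactly this formula is an assumption, and the function name and season totals are made up.

```python
def defensive_efficiency(pa, h, hr, bb, so, hbp, roe):
    """Approximate DER: 1 - (H + ROE - HR) / (PA - BB - SO - HBP - HR).

    All arguments are totals allowed by the defending team. The numerator
    counts balls in play on which the batter reached base (hits and reached
    on error, excluding home runs); the denominator counts all balls in play.
    """
    balls_in_play = pa - bb - so - hbp - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return 1 - (h + roe - hr) / balls_in_play


# Hypothetical season totals, for illustration only.
der = defensive_efficiency(pa=6200, h=1400, hr=160, bb=500, so=1200, hbp=60, roe=60)
print(round(der, 3))
```

Computing this separately for early-season and late-season games would be one way to repeat the fielding check described above.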

Let’s assume for a moment that pitchers being behind hitters is fully responsible for the trends outlined in this article. How should teams react to this information, if at all? Should they change their spring training routines in order to ensure that their pitchers are really ready by Opening Day? Or could it be the case that pitchers can last only so long in a season, and if they were at 100 percent in April they would falter down the stretch?

This matter can be further explored in several directions. Looking at individual pitchers’ trends can give us some information, especially if some of them show different patterns year in and year out: we might spot throwers who are at their peak early on in the season and see whether they last until the end of the season. In addition, a look at past decades could be interesting: Retrosheet has sparse temperature data, if any, the further one goes back in time, but other online sources could fill in the blanks.