April 27, 2012

The Stats Go Marching In

Scoring Runs, Revisited

by Max Marchi

The forces that influence run-scoring
As a reader of this site, you would be suspicious of any article that compared a starter’s ERA and a reliever’s ERA without making any adjustment for role: it has been shown several times (including by yours truly) that the luxury of pitching in short bursts and not having to face the same batters multiple times in a single outing significantly deflates relievers’ ERAs.

Similarly, we can’t model run-scoring on a team level without accounting for all the factors at play at any particular time. Many elements combine to shape the distribution of runs scored. Some of them are quite obvious, while others remain hidden until they’re exposed by the most brilliant analysts. In the following paragraphs, I’ll try to evaluate as many of those components as possible in an attempt to isolate their individual effects on offensive outcomes.

Numbers reported throughout this article were obtained by analyzing data from the 2010 and 2011 seasons.

The Pitcher
All the action in baseball begins with the pitcher releasing the ball, so it makes sense that the pitcher is one of the main drivers in determining the number of runs scored in a game. You can look at the ERA leaderboard to get an idea of which pitchers prevent runs from being scored and which inflate the number of runners crossing the plate.

Run expectancy of a single at-bat
The pitcher’s ERA, just like the number of runs scored during a game, is influenced by several factors, the pitcher’s ability being just one of them.

In a run-scoring environment of 4 to 4.5 runs per team per game, you can expect close to 80 plate appearances per game (both teams combined), which would translate to roughly 0.11 runs expected per plate appearance.
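The arithmetic behind the 0.11 figure is easy to check. This is a minimal sketch that assumes the article's approximation of close to 80 plate appearances per game for both teams combined; the function name is hypothetical.

```python
def runs_per_pa(runs_per_team_game: float, pa_per_game: float = 80.0) -> float:
    """Expected runs per plate appearance.

    runs_per_team_game: average runs scored by one team in a game.
    pa_per_game: plate appearances by both teams combined (the article's
    approximation of close to 80).
    """
    total_runs = 2 * runs_per_team_game  # both teams' runs
    return total_runs / pa_per_game


# Scoring environments of 4 to 4.5 runs per team per game
for env in (4.0, 4.25, 4.5):
    print(f"{env:.2f} runs per team per game -> {runs_per_pa(env):.3f} runs per PA")
```

The midpoint of 4.25 runs per team per game gives about 0.106 runs per plate appearance, which rounds to the 0.11 quoted above.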

That expectation is bound to go up or down based on whether Roy Halladay or Kyle Davies is on the mound, whether the platoon scenario favors the pitcher or the batter, and whether good or bad defenders are on the field. Let’s start with these three factors.

Pitcher, platoon, and defense
Looking at data from 2010 and 2011, I assigned a run value to each PA using linear weights (you know, those things that say home runs are worth around 1.4 runs, singles between 0.4 and 0.5 runs, and so on). To continue with the pitchers mentioned in the previous paragraph, Roy Halladay’s presence on the mound reduces run expectancy by roughly 0.04 runs per plate appearance, while Kyle Davies’s increases it by around 0.02.
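Assigning linear-weights run values and comparing a pitcher's average with the league average can be sketched as follows. Only the home run and single values come from the text above; the other weights are rough placeholders assumed for illustration, the pitcher names and sample are hypothetical, and this naive version makes no adjustment for the batters each pitcher faced.

```python
from collections import defaultdict
from statistics import mean

# Illustrative linear weights (runs per event). HR and 1B follow the article;
# the remaining values are rough placeholders for this sketch.
LINEAR_WEIGHTS = {
    "HR": 1.40,
    "3B": 1.05,
    "2B": 0.75,
    "1B": 0.45,
    "BB": 0.30,
    "OUT": -0.27,
}


def pitcher_effects(plate_appearances):
    """Naive pitcher effect: mean run value allowed minus the league mean.

    plate_appearances: iterable of (pitcher_name, event_code) tuples.
    No adjustment is made for batters faced, park, or defense.
    """
    by_pitcher = defaultdict(list)
    all_values = []
    for pitcher, event in plate_appearances:
        value = LINEAR_WEIGHTS[event]
        by_pitcher[pitcher].append(value)
        all_values.append(value)
    league_mean = mean(all_values)
    return {p: mean(vals) - league_mean for p, vals in by_pitcher.items()}


# Tiny made-up sample (not real Halladay or Davies data).
sample = [
    ("Ace", "OUT"), ("Ace", "OUT"), ("Ace", "1B"), ("Ace", "OUT"),
    ("Batting practice", "HR"), ("Batting practice", "BB"),
    ("Batting practice", "OUT"), ("Batting practice", "2B"),
]
effects = pitcher_effects(sample)
print(effects)
```

Because the effect is measured against the sample's own mean, the absolute scale of the weights does not matter, only their relative spacing.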

Adding platoon effects into the mix, I find that run expectancy drops by 0.02 runs per plate appearance when the matchup favors the pitcher.

Then there’s the defense. I re-assigned a run value to each PA using a slightly different method. As I have done in other articles, I ignored the actual outcome (single, double, out, etc.) and used batted-ball type instead, so every line drive received the same value, as did every fly ball, every grounder, and every pop-up.
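One plausible way to turn the batted-ball approach into a defense measure is to credit each ball in play with the league-average value of its type and attribute the gap between the actual outcome value and that expected value to the fielders. This is an assumption offered for illustration, not a description of the exact procedure used for this article; the record layout, team labels, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def defense_effects(balls_in_play):
    """Per-team defense effect in runs per ball in play.

    balls_in_play: list of (fielding_team, batted_ball_type, actual_run_value),
    where batted_ball_type is "liner", "fly", "grounder" or "popup".
    A negative result means the fielders allowed fewer runs than the
    league-average value of the same batted-ball types.
    """
    # League-average run value for each batted-ball type.
    by_type = defaultdict(list)
    for _, bb_type, value in balls_in_play:
        by_type[bb_type].append(value)
    type_value = {t: mean(vals) for t, vals in by_type.items()}

    # Actual minus expected value, grouped by the team in the field.
    residuals = defaultdict(list)
    for team, bb_type, value in balls_in_play:
        residuals[team].append(value - type_value[bb_type])
    return {team: mean(r) for team, r in residuals.items()}


# Made-up sample: outs are worth -0.27, singles 0.45, doubles 0.75.
sample = [
    ("Team A", "grounder", -0.27), ("Team A", "grounder", -0.27),
    ("Team A", "liner", 0.45),
    ("Team B", "grounder", 0.45), ("Team B", "grounder", -0.27),
    ("Team B", "liner", 0.75),
]
effects = defense_effects(sample)
print({team: round(v, 3) for team, v in effects.items()})
```

The sketch stops at runs per ball in play; the Rays table in this section instead expresses effects per plate appearance, scaled by 40.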

These days, the Rays are known for their off-the-charts defense. Let’s look at how their prowess with the glove affects their pitching staff. In the table below (listing pitchers who started for Tampa Bay in both 2010 and 2011), the run value per plate appearance has been multiplied by 40 to yield an approximate effect per nine innings, putting it on a scale similar to the familiar ERA.

Pitcher	Defense effect (Runs per 9 innings)
James Shields	-0.23
David Price	-0.32
Jeff Niemann	-0.41
Jeremy Hellickson	-0.57
Wade Davis	-0.18

Note that the above numbers might actually include some effect of the ballparks in which the Rays play—more on this later.

The batter
We have factored in the run-scoring impact of what happens before the ball reaches the plate (as a result of the pitcher) and the run-scoring impact of what happens after the ball is put into play (as a result of the defense), but we have yet to address the impact of another player who has a significant influence on run expectancy: the batter. You wouldn’t expect the same number of runs scored with a team of nine Jose Bautista clones at the plate as you would with nine pitchers who don’t know what to do with a bat in their hands.

In a single plate appearance, the Blue Jays outfielder can be credited with increasing a team’s scoring potential by 0.08 runs. An unfortunate hypothetical pitcher summoned to the mound every time Bautista is at the plate would see his ERA inflated by roughly 3.20 thanks to the unfavorable matchup.
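Every per-nine figure in this article comes from the same conversion. A minimal sketch, assuming (as the figure of roughly 80 plate appearances per game for both teams suggests) that each team bats about 40 times per nine innings; the helper name is hypothetical.

```python
PA_PER_GAME_BOTH_TEAMS = 80  # approximation used in the article
PA_PER_TEAM_PER_NINE = PA_PER_GAME_BOTH_TEAMS // 2  # about 40


def per_nine(runs_per_pa: float, pa_per_nine: int = PA_PER_TEAM_PER_NINE) -> float:
    """Scale a per-plate-appearance run effect to an ERA-like per-nine figure."""
    return runs_per_pa * pa_per_nine


print(round(per_nine(0.08), 2))   # Bautista: about 3.2 runs per nine
print(round(per_nine(-0.04), 2))  # Halladay on the mound: about -1.6
```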

Meanwhile, Roy Halladay’s presence in the batter’s box depresses scoring (-0.06 runs per PA) more than his presence on the mound.

With or without you
Removing the effect of the batter and the pitcher from the outcome of plate appearances is crucial for calculating the influence exerted by other factors. Using a with-or-without-you (WOWY) framework, I was able to strip the batter’s and pitcher’s contributions out of the run values. The handedness of the two central characters of any at-bat was also taken into account and, since run values were calculated from batted-ball types, defense is more or less out of the equation.
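The idea of removing batter and pitcher effects simultaneously can be illustrated with a simple alternating-averages (backfitting) fit of an additive model: run value = league mean + batter effect + pitcher effect. This is a deliberately simplified stand-in, not the cross-classified mixed model described in the notes at the end of the article: it applies no shrinkage for players with few plate appearances, and the players and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_batter_pitcher_effects(pas, iterations=50):
    """Alternating-averages fit of run_value ~ mu + batter + pitcher.

    pas: list of (batter, pitcher, run_value) tuples.
    Returns (mu, batter_effects, pitcher_effects). No shrinkage is applied.
    """
    mu = mean(v for _, _, v in pas)
    pit = defaultdict(float)  # every pitcher effect starts at zero
    bat = {}
    for _ in range(iterations):
        # Batter effect: average residual after removing mu and the pitchers faced.
        resid = defaultdict(list)
        for b, p, v in pas:
            resid[b].append(v - mu - pit[p])
        bat = {b: mean(r) for b, r in resid.items()}
        # Pitcher effect: average residual after removing mu and the batters faced.
        resid = defaultdict(list)
        for b, p, v in pas:
            resid[p].append(v - mu - bat[b])
        pit = defaultdict(float, {p: mean(r) for p, r in resid.items()})
    return mu, bat, dict(pit)


# Hypothetical, noise-free data: the slugger mostly faces the ace, and the
# weak hitter mostly faces the journeyman.
TRUE_MU = 0.11
TRUE_BAT = {"Slugger": 0.05, "Weak hitter": -0.05}
TRUE_PIT = {"Ace": -0.03, "Journeyman": 0.03}
matchups = (
    [("Slugger", "Ace")] * 3
    + [("Slugger", "Journeyman")]
    + [("Weak hitter", "Journeyman")] * 3
    + [("Weak hitter", "Ace")]
)
pas = [(b, p, TRUE_MU + TRUE_BAT[b] + TRUE_PIT[p]) for b, p in matchups]

mu, bat, pit = fit_batter_pitcher_effects(pas)
naive_slugger = mean(v for b, _, v in pas if b == "Slugger") - mu
print(f"naive slugger effect:    {naive_slugger:+.3f}")
print(f"adjusted slugger effect: {bat['Slugger']:+.3f}")
```

The naive average understates the slugger (about +0.035) because most of his plate appearances came against the ace, while the adjusted fit recovers the +0.05 used to build the data.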

So what else makes scoring runs easier or harder? Here’s what I found after throwing several other factors into my model.

The park
Let’s begin in charted territory. The notion that ballparks influence offensive output has been well known since before the birth of sabermetrics.

You’re probably well accustomed to park factors. They usually appear as figures centered around 100, where values over the century mark indicate favorable offensive environments and, conversely, values below 100 are favorable to pitchers and defense. Park factors can be calculated for everything that happens on the field (base stealing, fielding, pitch speed, etc.), but they are most frequently used for home runs and run-scoring.

When you read a value of 110 for the run-scoring factor of a ballpark, you know that runs are scored in that park at a clip 10 percent higher than elsewhere around the league. For the purposes of this article, I calculated separate runs park factors by batter handedness.
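For readers more comfortable with the 100-centered scale, an additive runs-per-PA park effect can be expressed as an index relative to a baseline scoring rate. This conversion is an illustrative assumption, not how the park factors in this article were computed: it takes the roughly 0.11 runs per plate appearance mentioned earlier as the league baseline, and the function names are hypothetical.

```python
BASELINE_RUNS_PER_PA = 0.11  # rough league rate from earlier in the article


def index_from_ratio(runs_per_pa_in_park: float, runs_per_pa_elsewhere: float) -> float:
    """Classic 100-centered park factor: 110 means 10 percent more scoring."""
    return 100 * runs_per_pa_in_park / runs_per_pa_elsewhere


def index_from_additive(effect_per_pa: float, baseline: float = BASELINE_RUNS_PER_PA) -> float:
    """Express an additive runs-per-PA park effect on the 100 scale."""
    return index_from_ratio(baseline + effect_per_pa, baseline)


print(round(index_from_ratio(0.121, 0.11)))  # 110
print(round(index_from_additive(0.025)))     # Coors, lefty batters: about 123
print(round(index_from_additive(-0.018)))    # PETCO, lefty batters: about 84
```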

Due to its altitude, Coors Field always tops the list of hitter’s havens, and this analysis is no exception: the Rockies’ ballpark inflates scoring expectancy by 0.025 runs per PA when a lefty is at the plate and by 0.015 runs with a right-handed batter at the plate (that would more or less translate to 1.0 and 0.60 increases in ERA, respectively).

Similarly, PETCO Park is well known for its offense-depressing environment. I have it at -0.018 runs per PA for lefties (-0.73 per 9 innings) and -0.009 runs per PA for righties (-0.37 per nine).

As you might expect, Yankee Stadium serves as a good example of why splitting numbers by batter handedness is necessary. Its short porch in right makes it a nightmare for pitchers facing left-handed batters—at 0.02 runs per PA, it’s another Coors. On the other hand (no pun intended), it’s close to neutral when a right-handed hitter is at the plate (0.005 runs per PA).

I would like to emphasize one feature of the park factors calculated for this article: they are adjusted for the abilities of the pitchers and batters who have played in them. Without that adjustment, Yankee Stadium would emerge as more offense friendly for left-handed hitters than Coors Field, because of the talent of the batters playing in the Bronx (both in pinstripes and as visitors). The fact that Fenway Park (in the same division as Yankee Stadium) shows similar behavior for lefties, turning from unremarkably hitter friendly when unadjusted to slightly pitcher friendly when the talent pool is taken into account, suggests the model is doing a good job.

However, the park most affected by the talent-pool adjustment is Great American Ballpark in Cincinnati, again for batters swinging from the left side. The presence of lefty sluggers like Joey Votto and Jay Bruce for the hometown Reds, combined with frequent visits from other National League Central power southpaws (see Fielder, Prince and Berkman, Lance, among others), artificially increased the unadjusted park factor by over 0.01 runs per PA (or 0.45 per nine innings). (Bronson Arroyo starting 32 games there over the last couple of seasons also has something to do with the difference.)

The park factor is actually a combination of several elements that could be addressed separately. In fact, several components of weather (prevalent wind speed and direction, temperature, humidity, rainfall, pressure) interact with the shape and field conditions of a ballpark to determine how it plays. For the time being, we’ll be content with a single number that tells us the composite effect of the stadium and its surroundings on offensive output.

Home sweet home
The home team wins more often than the visiting team in baseball. That has been the case every year throughout the long history of the game. There are several explanations that can be adduced for this, including the advantage of batting last, the home team’s familiarity with the ballpark, its support from the crowd, and a better lifestyle for players when they are home. (An interesting view on all the above can be found in the seminal book The Diamond Appraised by Craig Wright and Tom House.)

When you adjust for the talents of the pitcher, the batter, and the defense, as well as for the park, you can expect about 0.006 more runs scored per plate appearance when the home team is at the plate, which translates to roughly a quarter of a run per nine innings.

The end of the game
The home advantage quantified above holds despite a quirk of the final inning that works against the home team’s scoring. When the home team bats in the bottom of the final inning (the ninth or any extra frame), it doesn’t necessarily use all three outs at its disposal, because the game ends as soon as it takes the lead. Thus, potential multi-run innings are cut short and, as a consequence, fewer runs are scored on average in the bottom half of the ninth and later innings.
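A toy simulation shows why the walk-off rule trims scoring in the bottom of the final inning. The model is a deliberate oversimplification assumed for illustration only: each plate appearance is an out with probability 0.68, scores exactly one run with probability 0.11 (the per-PA rate mentioned earlier), and otherwise changes nothing; by default the home team enters the inning tied, so it stops batting as soon as it scores.

```python
import random


def simulate_half_inning(rng: random.Random, p_out: float = 0.68, p_run: float = 0.11) -> int:
    """Runs scored in a full three-out half-inning under the toy model."""
    outs = runs = 0
    while outs < 3:
        u = rng.random()
        if u < p_out:
            outs += 1
        elif u < p_out + p_run:
            runs += 1
        # otherwise: a runner reaches but no run scores
    return runs


def compare(n_innings: int = 100_000, deficit: int = 0, seed: int = 2012):
    """Average runs in full innings vs. innings cut short by a walk-off.

    deficit: how many runs the home team trails by entering the inning.
    Because runs arrive one at a time in this toy model, ending the inning
    once runs reach deficit + 1 is the same as capping the total there.
    """
    rng = random.Random(seed)
    full = [simulate_half_inning(rng) for _ in range(n_innings)]
    truncated = [min(r, deficit + 1) for r in full]
    return sum(full) / n_innings, sum(truncated) / n_innings


full_avg, walkoff_avg = compare()
print(f"full inning: {full_avg:.3f} runs, walk-off rule: {walkoff_avg:.3f} runs")
```

The real effect reported below is much smaller than the gap in this toy, because it is averaged over all plate appearances and many final-inning situations never produce a walk-off.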

Again, after accounting for all the other factors mentioned earlier, you should expect a drop of 0.004 runs per plate appearance in the bottom of potentially game-ending innings (or -0.16, in ERA currency).

Time of the year
We know from Prof. Robert K. Adair’s The Physics of Baseball (and from physics in general) that a batted ball can be expected to travel farther at higher temperatures. Thus, it’s reasonable to expect higher scoring totals on warm days than on cold ones.

So given how the seasons progress in the Northern Hemisphere, runs should abound in July and August and be scarcer the farther one gets from the middle of summer. That’s more or less what comes out in my analysis, except that April appears to be more offense friendly than every other month but August (see table below).

Month	Runs per nine innings (relative to April)
May	-0.07
June	-0.08
July	-0.04
August	+0.01
September/October	-0.06

Maybe the April result is an aberration, or maybe it can be explained by hitters rounding into top shape more quickly than pitchers. A more sensible way of dealing with heat would be to look directly at temperature, ideally measured at frequent intervals during the game. I did not go into such detail, but I did try the game-time temperature recorded by Gameday: each 10-degree increase is associated with 0.12 more runs per nine innings.

Hour of the day
PITCHf/x records the exact time a pitch is delivered, so we are able to say at what time of day an at-bat occurred. This could be used as a proxy, albeit an imperfect one, for the temperature during an at-bat. (Theoretically, with sites such as Weather Underground updating weather information every few minutes, we could adjust for the weather at the at-bat level, like Hit Tracker Online does.) The hour and minute carry some additional information, such as the position of the sun. Furthermore, human reflexes are sharper at some times of day than at others.

The evening (5 p.m. to 8 p.m.) seems to be the best time of the day for producing runs. The hours around midday (11 a.m. to 2 p.m.) and the afternoon (2 p.m. to 5 p.m.) are slightly less offense friendly, to the tune of 0.01 to 0.07 runs per nine innings. But as the game goes deep into the night, scoring steadily declines. From 8 p.m. to 11 p.m. you can expect a reduction of 0.27 runs per nine innings, which becomes a whopping 0.48 in the wee hours (from 11 p.m. on).
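Turning a pitch timestamp into the time-of-day categories above takes only a small lookup. This sketch assumes the timestamp is already in the ballpark's local time (time-zone handling is left out). It also assumes that at-bats before 6 a.m. belong to games that ran past midnight, and it returns "other" for the 6 to 11 a.m. window, which the article does not discuss. Effects are in runs per nine innings relative to the evening, and only bands with a single quoted figure get a value.

```python
from datetime import datetime

# (start_hour, end_hour, label, runs per nine relative to the evening).
# Midday and afternoon are described only as 0.01 to 0.07 runs per nine
# below the evening, so no single value is assigned to them here.
TIME_BANDS = [
    (11, 14, "midday", None),
    (14, 17, "afternoon", None),
    (17, 20, "evening", 0.0),
    (20, 23, "night", -0.27),
]
LATE_NIGHT = ("late night", -0.48)


def time_band(pitch_time: datetime):
    """Map a local pitch timestamp to a (label, effect) pair.

    Assumption: 11 p.m. or later, or before 6 a.m., counts as late night.
    """
    hour = pitch_time.hour
    if hour >= 23 or hour < 6:
        return LATE_NIGHT
    for start, end, label, effect in TIME_BANDS:
        if start <= hour < end:
            return label, effect
    return "other", None


print(time_band(datetime(2011, 7, 4, 19, 42)))  # ('evening', 0.0)
print(time_band(datetime(2011, 7, 5, 0, 15)))   # ('late night', -0.48)
```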

Familiarity and fatigue
What happens when the pitcher gets tired and the batter has seen his arsenal multiple times? We would expect that both fatigue on the pitcher’s arm and the batter’s increased familiarity with his offerings would lead to higher run scoring.

I looked at the two issues separately. When the hitter is facing the pitcher for the second time in a game, offensive production appears to go up by 0.008 runs per PA (+0.32 per nine innings); the third time around, it increases by 0.014 runs per PA (0.56 per nine) compared to the first duel of the game.

The table below shows the change in run-scoring as pitch counts increase.

Pitch count	Runs per nine innings (relative to 0-20 pitch count)
21-40	-0.04
41-60	+0.32
61-80	+0.32
81-100	+0.36
101+	+0.52

The trend looks reasonable—the pitcher gets comfortable after a couple dozen pitches, then tires as the game progresses—but the numbers seem a bit high. Could a starter realistically be expected to post an ERA 0.3 runs worse after just 40 pitches than when he’s fresh?

Here comes a big caveat. Ideally, one would find a way to measure the impact of both fatigue and familiarity simultaneously, but that’s a hard task, if not an impossible one. The culprit is something statisticians call collinearity.

Here’s the problem. You can find some instances where the pitcher has thrown 80-100 pitches but is facing a batter for the first time (a pinch hitter), but there will never be data telling us what happens when the pitcher has delivered 0-20 pitches in a game and is facing the batter for the third time. Times through the lineup and pitch count are highly correlated variables, and when they’re evaluated simultaneously in a model, strange things happen: the numbers generated by such a model would suggest that pitch count is negatively correlated with run scoring, in other words that the more pitches a pitcher throws in a game, the more untouchable he becomes, which does not make sense.
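The degree of collinearity can be quantified with the correlation coefficient r and the variance inflation factor (VIF), 1 / (1 - r²), a standard diagnostic for how much a coefficient's variance is inflated by its correlation with another predictor. The starter below is hypothetical and idealized (exactly four pitches per plate appearance); real pitch counts vary from batter to batter, but the two variables still move together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical starter who faces 27 batters (three trips through a
# nine-man lineup) and throws exactly 4 pitches per plate appearance.
pitch_count = [4 * i for i in range(27)]         # pitches thrown before each PA
times_through = [i // 9 + 1 for i in range(27)]  # 1st, 2nd or 3rd look

r = pearson(pitch_count, times_through)
vif = 1 / (1 - r ** 2)  # variance inflation factor with two predictors
print(f"correlation = {r:.2f}, VIF = {vif:.1f}")  # correlation = 0.94, VIF = 9.1
```

With the two predictors this entangled, the data carry little information about how to split the combined effect between them, which is why the individual coefficients can come out with implausible signs.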

Thus, for these two factors, we are left more or less where we started. We are still confident that the longer a pitcher stays in a game, the more likely the offense is to produce runs against him. The pitcher’s fatigue and the batter’s familiarity with his weapons are the reason for that, but we cannot tell how much each component increases offensive output.

Notes on the model
I approached this backwards, as the reader is usually introduced to the model before the results. The model I built to get the numbers I presented is a cross-classified linear mixed model, which I have talked about in past articles. Since this is not the place to discuss advanced techniques, I hope statisticians won’t get too upset if I describe it as a sophisticated with-or-without-you model.

The identities of the pitcher, the batter, and the ballpark constitute the so-called random effects of the model, in other words the WOWY part. The remaining variables discussed throughout the article enter the model as fixed effects, which is to say that they are treated as they would be in a simple linear regression.
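Because random and fixed effects both enter the model additively, the expected run value of a single plate appearance is simply a sum of terms. The sketch below only illustrates that structure; it does not fit the model. The 0.11 baseline is the rough league rate from earlier, the effect sizes are those quoted in this article, and park values are shown for left-handed batters only because the article splits them by handedness. The temperature slope is the Gameday-based estimate converted to a per-PA basis (0.12 runs per nine per 10 degrees, divided by 40), the 70-degree reference is a hypothetical choice, and the function and argument names are hypothetical. Only times through the order is included, since collinearity prevents separating it from pitch count.

```python
BASELINE = 0.11  # rough league runs per PA, from earlier in the article

# Effect sizes quoted in the article, in runs per plate appearance.
PARK_LEFTY_BATTER = {"Coors Field": 0.025, "PETCO Park": -0.018, "Yankee Stadium": 0.02}
HOME_TEAM_BATTING = 0.006
BOTTOM_OF_FINAL_INNING = -0.004
TIMES_THROUGH_ORDER = {1: 0.0, 2: 0.008, 3: 0.014}
# 0.12 runs per nine per 10 degrees, converted to runs per PA per degree.
TEMP_PER_DEGREE = 0.12 / 40 / 10


def expected_run_value(park, home_batting, final_inning_bottom, times_through,
                       temp, ref_temp=70.0, batter=0.0, pitcher=0.0):
    """Additive expected run value of one PA by a left-handed batter.

    batter/pitcher: player effects in runs per PA (the random-effect part).
    temp/ref_temp: game temperature and a hypothetical reference temperature.
    Parks missing from the table are treated as neutral.
    """
    value = BASELINE + batter + pitcher
    value += PARK_LEFTY_BATTER.get(park, 0.0)
    if home_batting:
        value += HOME_TEAM_BATTING
    if final_inning_bottom:
        value += BOTTOM_OF_FINAL_INNING
    value += TIMES_THROUGH_ORDER[times_through]
    value += TEMP_PER_DEGREE * (temp - ref_temp)
    return value


# A hypothetical lefty with a +0.08 batter effect, batting for the home team
# at Coors Field, third time through the order, on an 80-degree day.
rv = expected_run_value("Coors Field", home_batting=True, final_inning_bottom=False,
                        times_through=3, temp=80.0, batter=0.08)
print(f"{rv:.3f} runs per PA, about {rv * 40:.1f} per nine innings")
```

A linear model adds no interactions unless they are built in; splitting park effects by batter handedness, as done here, is one such interaction.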

What should we do with all of the above?
All the work I presented here might be considered an exercise to outline the forces that are in play during an at-bat that make the outcome more likely to be offense friendly or defense friendly. When doing an analysis, it’s easy to forget to take into account some of the quantities analyzed here. For example, when I wrote about reliever-to-starter conversions, Tom Tango pointed out that I’d neglected the bottom-of-the-ninth and time-of-day effects. Even in this article, I’ve left out at least two factors: the home plate umpire and the catcher.

The bottom line is that there are many pitfalls awaiting the inattentive analyst. Selection biases are always just around the corner, lurking variables can confound results, and collinearity can make a model spit out strange numbers. Remember to practice smart sabermetrics.
