January 23, 2001
Pitching and Defense
How Much Control Do Hurlers Have?
by Voros McCracken
"You're insane." That's generally the response I get when I
present the information you're about to read. I've been accused of being
the "epitome of 'pseudo-stat fan' gibberish." I've even been
accused of being Aaron Sele writing under a pseudonym. I'm not
entirely sure why my little way of doing things stirs the emotions of
people to such a large extent, but apparently it does.
My belief? Well, simply that hits allowed are not a particularly meaningful
statistic in the evaluation of pitchers.
Now before you accuse me of being Aaron Sele, please bear with me for a few
paragraphs as I explain how I reached this point, and where it led from there.
One of the basic issues in evaluating pitchers is to what extent the
defense behind them is responsible for the results. In fact, in Baseball
Prospectus 2000, one of Keith Woolner's "Hilbert Problems"
for baseball was the issue of separating defense and pitching. As he put
it, "Pitching and fielding are so intertwined that they seem
impossible to separate."
Around the end of the 1999 season, I started to think about that problem. My
plan was to divide a pitcher's stat line into what the defense can't affect
and what it possibly can:
Defense Independent:
Walks, Strikeouts, Home Runs (essentially), Hit Batsmen, Intentional Walks
Defense Dependent:
Wins, Losses, Innings, Runs, Earned Runs, Hits Allowed, Sacrifice
Hits, Sacrifice Flies.
Any stats derived from the defense-dependent ones, like OPS against or ERA,
would also be defense dependent.
The idea was to express the things the defense can't affect in one area and
check the results, then check those areas where it's possible the defense
can have an effect and analyze how much of the performance is pitching and
how much is defense.
The first thing I did was create something called "Defense Independent
Pitching Stats." DIPS are the representation of a pitcher's stat line
without any possible influence from the defense behind the pitcher. I
calculated the various rates for walks, strikeouts, home runs, hit batsmen,
etc. as a function of batters faced, and inserted them into the pitcher's
line. Then I calculated how many batters faced were remaining, and assigned
league-average rates for all of the other component stats: innings, hits,
doubles, triples, etc. So for every stat the defense could possibly affect,
every pitcher was now on equal footing. The results, using Dave Burba's line
in 2000, looked something like this:
                     BFP   IP    H   HR   ER   BB   SO   ERA
Actual               848  191  199   19   95   91  180  4.48
Defense Independent  848  195  185   18   89   93  179  4.13
As you can see, the home runs, walks, and strikeouts changed little (they
changed at all only due to park effects and a few other minor factors). But
hits and innings pitched changed by a decent amount, at least in this case.
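The substitution step described above can be sketched in Python. This is a simplified illustration, not McCracken's actual procedure: it assumes balls in play are BFP - HR - BB - SO - HB, uses a single hypothetical league-average rate of hits per ball in play, and ignores park effects, double plays, sacrifices, and the other league-average components (doubles, triples, etc.). The names `PitcherLine` and `dips_hits_and_innings` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PitcherLine:
    """Hypothetical container for the defense-independent parts of a stat line."""
    bfp: int  # batters faced
    bb: int   # walks
    so: int   # strikeouts
    hr: int   # home runs allowed
    hb: int   # hit batsmen


def dips_hits_and_innings(line: PitcherLine, league_babip: float) -> Tuple[float, float]:
    """Return (hits, innings) with balls in play resolved at the league-average rate.

    Simplification (an assumption of this sketch): every ball in play is either
    a hit or exactly one out, so double plays, sacrifices, errors and
    baserunning outs are ignored.
    """
    balls_in_play = line.bfp - line.hr - line.bb - line.so - line.hb
    hits_on_bip = league_babip * balls_in_play
    hits = line.hr + hits_on_bip                    # the pitcher keeps his own home runs
    outs = line.so + (balls_in_play - hits_on_bip)  # strikeouts plus outs on balls in play
    return hits, outs / 3


# Hypothetical line and league rate, not real data.
hits, innings = dips_hits_and_innings(PitcherLine(bfp=800, bb=70, so=150, hr=20, hb=5), 0.285)
```

Because every pitcher gets the same rate on balls in play, any difference between two lines built this way comes only from walks, strikeouts, home runs and hit batsmen.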
The next step was to look at the rest of a pitcher's stat line and somehow
divine how much of it was the result of the pitcher's work. To do this, I
looked at the range of values for Defense Independent ERA and compared how
close they were to the range of values of actual ERA. For example, if the
range of Defense Independent ERA was between 4.00 and 5.00, it would be a
good indication that there's a lot about pitching not covered in the stat,
because ERAs have a much larger range than that.
That didn't happen. The range was virtually the same as actual ERA, with
the best pitchers having DIPS ERAs near 2.40 and the worst having DIPS ERAs
up near 7.00. I found this surprising, as I expected the range to narrow
quite a bit more than that.
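The range check above can be written as a small function. The article compares ranges; this sketch also reports the population standard deviation, a swapped-in measure that is less sensitive to one extreme pitcher. The function names and the ratio are illustrative assumptions, and no real ERA data is included.

```python
from statistics import pstdev
from typing import Sequence


def era_spread(eras: Sequence[float]) -> tuple:
    """Return (range, population standard deviation) for a list of ERAs."""
    return max(eras) - min(eras), pstdev(eras)


def spread_ratio(dips_eras: Sequence[float], actual_eras: Sequence[float]) -> float:
    """Standard deviation of DIPS ERAs divided by that of actual ERAs.

    A ratio near 1.0 means the defense-independent numbers keep most of the
    spread in actual ERA; a ratio well below 1.0 would suggest much of that
    spread comes from the defense-dependent stats.
    """
    return pstdev(dips_eras) / pstdev(actual_eras)
```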
Then, I looked at the behavior of Hits Per Balls in Play
[(H-HR)/(BFP-HR-BB-SO-HB)]. That's where the trouble really started. I
swear to you that I did everything within my power to come to a different
conclusion than the one I did. I ran every test, checked every stat,
divided this by that and multiplied one thing by another. Whatever I did,
it kept leading back to the same conclusion:
There is little if any difference among major-league pitchers in their
ability to prevent hits on balls hit in the field of play.
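The hits-per-balls-in-play formula above translates directly into code. The function name and the sample stat line are hypothetical; the formula is the one given in the text.

```python
def hits_per_balls_in_play(h: int, hr: int, bfp: int, bb: int, so: int, hb: int) -> float:
    """(H - HR) / (BFP - HR - BB - SO - HB), the rate defined in the text."""
    balls_in_play = bfp - hr - bb - so - hb
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical stat line, not a real pitcher: 160 non-HR hits on 555 balls in play.
rate = hits_per_balls_in_play(h=180, hr=20, bfp=800, bb=70, so=150, hb=5)
print(f"{rate:.3f}")  # prints 0.288
```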
It is a controversial statement, one that counters a significant portion of
110 years of pitcher evaluation. Let's go over the facts that led me to
this conclusion:
- As we discussed, the range of ERAs for pitchers is almost as large
without defense-dependent statistics as it is with them. This speaks to the
fact that there can be massive differences in the ability of pitchers
before even considering the impact of defense.
- The pitchers who are the best at preventing hits on balls in play one
year are often the worst at it the next. In 1998, Greg Maddux had
one of the best rates in baseball, then in 1999 he had one of the worst. In
2000, he had one of the better ones again. In 1999, Pedro Martinez
had one of the worst; in 2000, he had the best. This happens a lot.
- There is little correlation between what a pitcher does one year in the
stat and what he will do the next. In other words, what Eric
Milton's hits per balls in play was in 2000 tells us next to nothing
about what it will be in 2001. This is not true in the other significant
stats (walks, strikeouts, home runs). Walks and strikeouts correlate very
well and homers correlate somewhat well.
This is a crucial fact. One of the more critical aspects of statistical
analysis is determining how well a statistic reflects an ability. It's the
test given to clutch hitting, catcher game-calling, pitcher won/loss
records, and so on. One of the first things asked when addressing this is
"Does the stat correlate well with itself from year to year?" One
reason clutch hitting is questioned is that the "clutch hitters"
change from year to year, which indicates that it probably isn't the hitter
as much as it's other factors. The answer to whether hits per balls in play
correlates well from year to year is a fairly solid "no."
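The year-to-year test is a correlation between the same stat in consecutive seasons for the same pitchers. Below is a minimal sketch of the Pearson coefficient; Python 3.10+ also provides `statistics.correlation`, but the explicit version shows the arithmetic. How pitchers are paired and what innings cutoff is used are left as assumptions, and no real data is included.

```python
from statistics import mean
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired values (e.g. year-1 and year-2 rates)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Passing each qualifying pitcher's rate in one season as `xs` and in the next as `ys` gives the year-to-year correlation. By the article's account, walks and strikeouts score high on this test and hits per balls in play scores low.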
- You can better predict a pitcher's hits per balls in play from the rate
of the rest of the pitcher's team than from the pitcher's own rate. This is
pretty self-explanatory. The effects of having the same team defense and
home park appear to be significant determinants in creating what little
correlation there is in the stat.
- Take pitchers with similar stats in every other component category (and
other peripheral factors like age, throwing hand, team hits per balls in
play rates, etc.) but large differences in hits allowed (and therefore in
innings pitched). When you group the pitchers into two
categories--high-hits and low-hits--the following year the high-hits
pitchers do not give up significantly more hits per balls in play (.292 to
.291) than the low-hits pitchers, and the groups have identical ERAs.
This is a difficult point to overcome if you want to
show that preventing hits per balls in play is a significant ability of
pitchers. If, when all other things are equal, there is no difference, the
conclusion becomes clearer.
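The high-hits versus low-hits comparison can be sketched as a median split. This assumes the hard part, matching pitchers on their other component stats, age, throwing hand and team rate, has already been done; the function only splits the matched group on this year's hits per ball in play (a stand-in for hits allowed once the other stats are matched) and compares next-year averages. The function name and tuple layout are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def split_and_compare(pitchers: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (high-group, low-group) mean next-year rate.

    pitchers: (this_year_rate, next_year_rate) pairs for already-matched
    pitchers. With an odd count, the middle pitcher is left out of both groups.
    """
    ranked = sorted(pitchers, key=lambda p: p[0])
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[len(ranked) - half:]
    return mean(p[1] for p in high), mean(p[1] for p in low)
```

If this year's rate reflected a real ability, the high group's next-year mean should stay clearly above the low group's; the article reports nearly identical figures (.292 vs. .291).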
- Similarly, if you take pitchers with comparable stats in every other
component category but as large a difference in strikeouts as possible, and
separate them into high-strikeout and low-strikeout categories, the
high-strikeout pitchers continue to strike out more hitters, while also
giving up far fewer hits and having significantly lower ERAs.
This is the natural opposite of the previous point. If that point is true,
then logically this one ought to be true as well. It is.
- The range of career rates of hits per balls in play for pitchers with a
significant number of innings is about the same as the range you would
expect from random chance. This is true even though we know that some
pitchers may have had consistent advantages over others, as these rates are
unadjusted for park or league. The vast majority of pitchers who have
pitched significant innings have career rates between .280 and .290.
- When you adjust for environmental advantages (the DH, park effects, and
so on), the range becomes even smaller. The leaders in this stat (such as
Pete Harnisch) have had significant environmental advantages, while most of
the trailers (yup, Aaron Sele) have had disadvantages. After these
adjustments, the range is well within the realm you could expect from
chance alone.
- A stat like Component ERA (or any similar stat that calculates ERA from
the rest of a pitcher's performance) correlates better with next-year ERA
than ERA itself does, but it correlates much less well with next-year ERA
than the same calculation performed with the average hits-allowed rate of
the team for which he pitched. This advantage of the team-average rate grows
considerably as the number of innings pitched in the season shrinks.
Two key points here: one, there doesn't appear to be any "hidden
quality" aspect to the stat. The numbers come out as they should if
the above are all true: you can better predict ERA without hits allowed
than you can with them. The other key point is that using a reliever's hit
rate seems to be an extremely suspect way of evaluating relievers. One of
my favorite examples of this is Bobby Ayala in 1998 and 1999.
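The comparison with random chance in the career-rate points above can be made concrete with a binomial model: if every pitcher had the same true rate p, a career of n balls in play would produce observed rates with a standard deviation of about sqrt(p(1 - p)/n). Treating each ball in play as an independent trial with the same probability is a simplifying assumption of this sketch, and the function names and inputs are hypothetical.

```python
from math import sqrt


def chance_sd(true_rate: float, balls_in_play: int) -> float:
    """Binomial standard deviation of an observed rate over n independent trials."""
    return sqrt(true_rate * (1 - true_rate) / balls_in_play)


def chance_band(true_rate: float, balls_in_play: int, z: float = 2.0) -> tuple:
    """Interval of observed rates within z standard deviations of the true rate."""
    sd = chance_sd(true_rate, balls_in_play)
    return true_rate - z * sd, true_rate + z * sd


# Illustrative inputs only: a .285 true rate over 4,000 career balls in play.
low, high = chance_band(0.285, 4000)
print(f"{low:.3f} to {high:.3f}")  # prints 0.271 to 0.299
```

Setting the spread of actual career rates against a band like this is the kind of check the text describes; the park and league adjustments it mentions would narrow the observed spread further.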
There are a few lesser and somewhat anecdotal points to be made that, while
not critical, are nonetheless good concepts to understand:
- People have a hard time identifying which pitchers are very good at
preventing hits on balls in play. You'll often hear people use names like
Randy Johnson, Jamie Moyer and Andy Pettitte in protest of the concept, but
by any definition you want to use, these guys are not particularly good in
the stat.
- Pitchers like Pedro Martinez and Greg Maddux have, at times, expressed
thoughts on the matter. Martinez has been quoted as saying that the batter
determines what happens once he hits the ball. Maddux described his
scoreless-inning streak last year as "mostly luck," because hard-hit balls
that had been falling in were being caught.
- We only have 38 innings' worth of non-pitchers' pitching (like Brent
Mayne). That's too small a sample from which to draw conclusions, but it is
worth noting that these non-pitchers were no worse than regular pitchers in
the stat. In fact, they were a good bit better.
- Pitchers are often called "unpredictable," and hits allowed
is by far the most unpredictable of the component stats. In other words, it
is one of the main culprits of pitcher unpredictability.
- There is no significant cross-correlation. That is, a high number of
home runs allowed doesn't really mean anything in determining how many hits
per balls in play the pitcher will allow. The closest is an inverse
relationship with strikeouts (lots of strikeouts means fewer hits per balls
in play) but that relationship is very weak and could be the result of
unrelated factors. There was no significant hits-per-balls-in-play
advantage found in the strikeout study above.
Many people, after reading these points, think I'm saying that all pitchers
give up the same number of hits. That's not true, and of course it's not
what I'm saying. Randy Johnson gives up fewer hits than Scott Karl.
That's not because batters hit the ball harder off Karl than Johnson, but
because they hit the ball more often off Karl than Johnson.
Aside from walks, there are two basic outcomes for a pitcher: batter hits
the ball or batter strikes out. With the latter, the result is almost
always an out. With the former, all sorts of things can happen, including a
base hit.
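The Johnson-versus-Karl point is simple arithmetic: at the same rate of hits per ball in play, the pitcher who strikes out more batters puts fewer balls in play and so allows fewer hits. The function name and both stat lines are hypothetical, not either pitcher's actual numbers.

```python
def hits_allowed(bfp: int, bb: int, so: int, hr: int, hb: int, babip: float) -> float:
    """Home runs plus expected non-HR hits: HR + rate * (BFP - HR - BB - SO - HB)."""
    balls_in_play = bfp - hr - bb - so - hb
    return hr + babip * balls_in_play


# Two hypothetical pitchers who differ only in strikeouts, with the same .285 rate.
power = hits_allowed(bfp=800, bb=60, so=250, hr=20, hb=5, babip=0.285)
finesse = hits_allowed(bfp=800, bb=60, so=100, hr=20, hb=5, babip=0.285)
print(f"{power:.0f} vs {finesse:.0f}")  # prints 153 vs 195
```

The 150 extra strikeouts remove 150 balls in play, which at a .285 rate works out to about 43 fewer hits.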
So why is this all true? All I can advance are theories, some that can be
checked out and some that are more difficult to verify. I'll end this
article with a list of some of the more popular ones:
- Scouting. The MLB scouting network is set up to sift through an
enormous pool of potential players to get to the group that might be MLB
pitchers. To do this, they often employ tactics that many might call unfair
in an effort to reduce the pool to a manageable number. So they don't take
guys under 5'10", every pitcher has to throw a fastball of a certain
speed, and so on. One of these factors may be weeding out a subset of
pitchers for which the theory is not true.
- High talent level. This theory is that there's a certain limit as
to how good you can get at preventing hits on balls in play, and that in
order to even come close to the major leagues you have to have already
reached it. This theory often comes up in clutch-hitting discussions.
- Too many variables. This suggests that the ability may or may not
exist, but that the variables involved in the outcome of balls in play are
so numerous and so difficult to control for that any ability gets lost. In
other words, the noise completely masks any signal.
- A misunderstanding of how the batter/pitcher dynamic works. Some
people will argue that despite all the numbers, the above can't be true
because it means that a screaming line drive hit into the
right-center-field gap is as likely to be an out as a pop-up to the shortstop.
This point deserves further discussion. One of the critical points of
misunderstanding is the issue of "blame." When a ball gets
crushed into the gap in right-center, some think I'm saying that the
defense deserves the blame, not the pitcher. When I counter with
"Neither the pitcher nor defense is to blame, it's the batter who is
to blame," I lose some people. Consider this example:
When I was a kid, we used to go to the cemetery (this was our playground)
and play a game called Lob-League. The makeup of this game was mostly
offense and some fielding, with little to no pitching effects. The
pitcher's job was to lob the ball over the heart of the plate and let the
batter hit it as hard as he wanted.
Now, let's suppose we're playing Lob-League and the pitcher lobs one right
in the batter's wheelhouse, but the batter pops it up to the shortstop. Who
deserves credit for the pop-up? The blame argument would indicate that the
pitcher deserves credit for inducing a pop-up despite the fact that all he
did was lob the ball over the plate. No credit or blame would belong to the
batter who popped up the pitch.
A more relevant MLB example might be the Home Run Derby at the All-Star
festivities. I encourage you to watch next year's contest, or, if you have
it, a videotape of past contests. Watch for batted balls that would clearly
be outs. The pitcher is trying to give up home runs, so does he deserve
credit for a pop-up?
In MLB, a pitch could result in a pop-up or a line drive. It all depends on
what the batter does with it. I think the conventional wisdom on the
dynamic between pitcher and batter may be slightly inaccurate.
The critical thing to understand is that major-league pitchers don't appear
to have the ability to prevent hits on balls in play. There are many
possible reasons why this is the case, and I don't really have a concrete
idea as to why it is.
But the one thing I do know is that it is the case.
Voros McCracken is a student
living in Chicago. He will be writing a weekly column called the
"Baseball Skeptic" in an upcoming webzine.
For more information on DIPS and McCracken's work, check out
his Web page at
http://www.baseballstuff.com/mccracken.