May 24, 2011
Answers from a Sabermetrician, Part 1
Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
You asked, he answered. Below is the first batch of responses to the questions BP readers submitted for sabermetrician Tom Tango. All questions are presented in their original form.
TOPIC #1: Lineups
Peter Hood asks:
This is a sabermetric myth.
There are two issues to consider:
In The Book, we looked at this topic. Luckily for you, it was excerpted a few years back at The Hardball Times (please read that). The first takeaway is that yes, definitely the players respond differently. And really, when you are talking about human beings in different situations, the expectation is that they should respond differently. After all, they are not automatons, are they? And they respond on the surface as you'd expect: the pitcher is avoiding the unprotected batter, which results in more walks (and more strikeouts).
So, score a big one for conventional wisdom.
But, even though there is a different response pattern by the players involved, that does not by itself mean that it favors one side or the other. Indeed, the result of our study shows that when it comes to putting the ball in play, there was no significant impact.
So, score a wake-up call for conventional wisdom.
Subtopic: Batting order
The Book proposes that you put your top three batters somewhere in the first, second, and fourth slots, with the low-power guy in the leadoff slot and the high-power guy in the cleanup slot. It also proposes that you put your two next-best batters somewhere in the third and fifth slots, with the high-power guy in the third slot. Those are not hard and fast rules, because you also need to consider the propensity of each batter to ground into double plays, the speed of the runners, the handedness of the batters, and so on.
Now, as to the main reason the third hitter is not so highly thought of: he comes to bat a disproportionate number of times with the bases empty and two outs. When that happens, the best way to score a run is to hit a home run. I've run models where I swap the traditional number-three hitter with the traditional number-two hitter, and you end up scoring more runs by making the swap. But we're only talking about a two-run gain over the course of 162 games.
Even doing something drastically incompetent, like putting the pitcher in the cleanup slot, costs you only 0.1 runs per game.
As for why it’s better to bat the pitcher eighth: it's more beneficial to set up the top of the order than to give the pitcher fewer times to bat. But again, we're talking about a two-run gain over the course of 162 games.
Why is there so little gain (or at least, less than one might presume)? Because everyone eventually bats. It's like deferring your taxes: you can save only so much. If you swap your number-two and your number-six hitters, what happens? Well, that's a difference of 72 plate appearances. If your number-six hitter creates 90 runs per 700 PA and your number-two hitter creates 70 per 700 PA, the net effect is that you can gain 20 runs per 700 PA. And 20 divided by 700 times 72 is two runs.
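As a rough sketch of the arithmetic above, the swap can be worked through in a few lines of Python. The function and parameter names are illustrative assumptions; the inputs (90 and 70 runs per 700 PA, a 72 PA difference) are the figures from the paragraph.

```python
def swap_gain(better_runs_per_700, worse_runs_per_700, pa_difference, pa_basis=700):
    """Runs gained by giving the better hitter the extra plate appearances."""
    # Gap in run creation per single plate appearance.
    rate_gap = (better_runs_per_700 - worse_runs_per_700) / pa_basis
    # Only the extra plate appearances are affected by the swap.
    return rate_gap * pa_difference


gain = swap_gain(90, 70, 72)
print(round(gain, 2))  # 2.06
```

The gain stays small because the 20-run gap applies only to the 72 plate appearances that change hands, not to the full 700.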
The best way to set up your batting order is to put it in the optimal order (which means you have to have different batting lineups based on pitcher handedness), and then tweak it based on the ego of the players, because human impact is more important than leveraging two runs.
TOPIC #2: Pitchers
Subtopic: Bullpen management
There is one thing that has been constant between bullpen usage in the 1970s and today: the best relievers finish the close games. The question at hand is really when you can bring them in. If you focus on the idea that you can bring in the ace reliever in the seventh or eighth inning and have him not pitch in the ninth inning, you are going to be in uncharted waters.
I made a little chart a few years ago that showed when Sutter and Gossage were brought in (please read that). We see that only half the batters they faced were in the ninth inning. About 40 percent of the batters they faced were in the eighth inning or earlier. And the leverage, or the importance, of those batters was just as strong in the ninth inning as it was in the earlier innings. So, we can definitely find situations, very easily, to bring those relievers in earlier and have them pitch to the end of the game.
Against that, however, we have to weigh two things:
So, while I will come down somewhere on the side of the new wisdom, I think there's still plenty to be said for conventional wisdom and sabermetric wisdom, as well as future wisdom. There's much more to learn here.
Subtopic: Starter-Bullpen Management
This is a good question. You are basically asking what the value is of making sure your bullpen stays as deep as possible. So, if you have an effective .490 winning percentage starter and your reliever is a .495 winning percentage pitcher, should you necessarily bring in the reliever? If you do, you now deplete your bullpen by one bullet.
I don't know that I have a good answer here. In the current practice, however, the decisions being made are not this close. It's more like a .440 winning percentage starter versus a .495 winning percentage reliever. So, we're not close to the point of thinking that perhaps we should be saving our bullpen a little longer. A pitcher pitches worse each time he faces the same batter in the same game, which is a key finding in The Book.
Subtopic: Closer usage
In The Book, I noted that almost 20 percent of batters faced by the ace relievers were in low-leverage (tune-up) situations. Basically, managers have a hard time knowing when to hold back the ace and when to bring him in when it actually counts. Continually waiting for a more and more important scenario that does not materialize will force a skipper to throw him out there when it doesn't matter.
The main problem is the over-reliance on using the ace in the ninth inning of a three-run save situation, and the under-reliance on using him in the eighth inning with runners on base. This was also excerpted, at Sports Illustrated (please read that).
This is the sixth answer I just wrote, and I apologize for the constant mentions of studies in The Book. I'm not pushing The Book, and I encourage people to read it for free via Amazon's Look Inside feature, where substantial portions are available. You can get to that Amazon page via my blog.
Ah, the old correlation problem. There is so much noise in a catcher's ERA that to try to isolate his specific skill in it is extremely problematic. You have to go in with the idea that of course a catcher will impact his pitcher. If you can't prove it, it just means that you haven't been able to find that needle because of the haystack problem. There were a couple of good studies in the recent Hardball Times Annuals on this topic, and I think more can still be uncovered.
TOPIC #3: DIPS
Subtopic: Observed spreads
I think the best discussion regarding DIPS took place over at my blog several years back, which is captured in this pdf document (please read that). The spreads of singles, extra-base hits, and home runs are all in the same ballpark.
Subtopic: Pitcher Control
Pitchers absolutely do have control over balls in play. The question on the table is the degree to which we can find this skill, given the amount of noise in the BABIP metric. We can very easily figure out if a pitcher is a groundball pitcher or not, about as easily as figuring out if someone is a strikeout pitcher or not.
If the choice is: should I put 100 percent weight on BABIP or 0 percent weight, then the answer is 0 percent if you have fewer than 150 starts, and 100 percent if you have more than 150 starts (more or less). Fortunately, we don't have to limit ourselves like that. It's like a pitcher's won-loss record: if you look at a single-season won-loss record and you have to decide whether to give it equal weight to all the other stats or no weight at all, choose no weight at all. But at the career level, you'd have to give it significant weight. Therefore, the correct way to use BABIP (or any metric, for that matter) is to give it a sliding scale of a weight, based on how much performance data you happen to have.
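One common way to express a sliding scale like this is a weight of n / (n + k), which is an assumption here rather than a formula stated in the text. Setting k to 150 starts makes the weight exactly 50 percent at the break-even point mentioned above. The league-average BABIP of .300 and all names in this sketch are hypothetical, chosen only for illustration.

```python
def babip_weight(starts, half_weight_starts=150):
    """Weight on observed BABIP; reaches 0.5 when starts == half_weight_starts."""
    return starts / (starts + half_weight_starts)


def blended_babip(observed, starts, league_average=0.300, half_weight_starts=150):
    """Blend observed BABIP with a league average using the sliding weight."""
    w = babip_weight(starts, half_weight_starts)
    return w * observed + (1 - w) * league_average


# A hypothetical pitcher with a .280 BABIP over 30 starts gets a weight of about 17 percent.
print(round(babip_weight(30), 3))
print(round(blended_babip(0.280, 30), 4))
```

With few starts the estimate stays close to the league average, and it moves toward the pitcher's own figure as the sample grows.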
Subtopic: Hitter Control
With hitters, the number of balls in play required for the metric to remove a good portion of the noise is much smaller than for pitchers. That's why we're happy with a season's worth of BABIP for hitters, but not for pitchers.
The reason for this is that among pitchers, you need to have a skill not to allow hits on balls in play. So, when looking at MLB pitchers, it's hard to separate their talent in this respect, because they are all good at it. It's like goalies in hockey: their save percentages are so close because they are all good. But for hitters, there are many different ways to contribute: high walks, high HR, great fielding. Consequently, our ability to distinguish between good and bad BABIP hitters is greater. Go look at BABIP among high school pitchers: you will see that there's a far wider range there, which makes it easier to see who the good pitchers actually are (or at least to be able to discern within a team which is the better pitcher).
I agree. Our job is to figure out how much random variation there is in a metric, relative to the number of trials (plate appearances, balls in play, etc.). What we are up against is that we link the performance to the pitcher's identity, when really, much of what happens should be linked to other factors. We say things like "the pitcher allowed a hit," when really what happened is "the pitcher was on the mound when his team allowed a hit." In this way, we are suggesting that the pitcher, the fielder, the park, the batter, etc., all played their part in that hit occurring (and luck, too, of course).
Subtopic: Home runs
But that's not true. If we were told that a ball was hit to the warning track, or that a ball was a pop-up 200 feet in the air and 150 feet from home plate, we would not treat them the same. But we are not told that in a pitcher's seasonal line. We are presuming that a pitcher doesn't have a disproportionate number of warning track doubles and infield flies when we look at seasonal lines. It’s a workable enough presumption, for the most part.
What you have to understand is that we're trying to categorize a pitcher's 600 contacted balls in a season into three categories: home runs, other hits, and outs. What DIPS is arguing is that you are (almost) better off dividing those contacted balls into just two categories: home runs and all other batted balls. The difference between a warning track hit and a warning track out, as it relates to the talent of the pitcher, is very small. Now, you can argue for other ways to classify batted balls: for instance, balls hit 300 feet or more and balls hit fewer than 300 feet, or balls that remain in the air for under three seconds and balls that remain in the air for more than three seconds. Each of those buckets will have much different run values and talent levels associated with them. It's a question of making up categories for balls, given whatever factual and (if any) subjective data we have for each batted ball.
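To make the idea of competing bucketing schemes concrete, here is a minimal sketch. The batted-ball records are made up purely for illustration, and the function names and the 300-foot cutoff parameter are assumptions, not part of any published DIPS implementation.

```python
from collections import Counter

# Hypothetical batted balls: (outcome, distance in feet). Illustrative values only.
batted_balls = [
    ("home_run", 410),
    ("hit", 330),
    ("out", 335),
    ("out", 150),
    ("hit", 90),
]


def three_way(outcome, distance):
    # Traditional seasonal line: home runs, other hits, and outs.
    return outcome


def dips_two_way(outcome, distance):
    # DIPS-style split: home runs versus every other batted ball.
    return "home_run" if outcome == "home_run" else "other_batted_ball"


def by_distance(outcome, distance, cutoff=300):
    # Alternative split that ignores the outcome and uses distance alone.
    return "long" if distance >= cutoff else "short"


def bucket_counts(classify):
    """Count batted balls per bucket under the given classification scheme."""
    return Counter(classify(outcome, distance) for outcome, distance in batted_balls)


for scheme in (three_way, dips_two_way, by_distance):
    print(scheme.__name__, dict(bucket_counts(scheme)))
```

Note that the 330-foot hit and the 335-foot out land in different buckets under the three-way scheme but in the same bucket under the other two, which mirrors the warning track example in the text.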