November 5, 2013
Baseball ProGUESTus
Everything You Always Wanted to Know About the Times Through the Order Penalty
Most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers, and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.
If, like many of us, you’re a prolific baseball blog reader, you’ve probably heard a lot lately about the “times through the order” penalty (TTOP). For those of you who have no idea what that is, here is a quote from page 187 of The Book: Playing the Percentages in Baseball: “As the game goes on, the hitter has a progressively greater advantage over the starting pitcher.” Essentially, the more times a batter faces a pitcher during a game, the better he does at the plate.
The way the TTOP is traditionally measured is by looking at a starting pitcher’s performance (using, say, wOBA against) the first time through the batting order, the second time, and so on. (Like TAv, wOBA is an all-in-one offensive rate statistic, but on the OBP scale instead of the BA scale.) Theoretically, a starter’s wOBA should be about the same for batters 1-9, 10-18, and so on, since the pitcher is obviously the same, and in most cases the batters are more or less the same (I don’t include pitchers batting or pinch hitters). You might even think that a pitcher improves as the game goes on, as he gets thoroughly warmed up (especially on a cold night) and gets a feel for all of his pitches, at least until he perhaps enters a decline phase due to fatigue, assuming he is allowed to stay in the game that long.
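The traditional measurement described above, which groups batters 1-9, 10-18, and so on and computes wOBA against for each group, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: the `PlateAppearance` record and its fields are hypothetical, each PA is assumed to carry a precomputed linear-weight value, and IBB, bunts, pitchers batting, and pinch hitters are assumed to be filtered out before the data reach these functions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlateAppearance:
    """Hypothetical record for one PA against a starting pitcher."""
    batter_number: int  # 1 = first batter the starter faced in this game, 2 = second, ...
    woba_value: float   # assumed precomputed linear-weight value of the PA outcome


def times_through_order(batter_number: int) -> int:
    """Map batters 1-9 to group 1, 10-18 to 2, 19-27 to 3, and 28+ to 4 ("fourth or later")."""
    return min((batter_number - 1) // 9 + 1, 4)


def woba_by_tto(pas: list[PlateAppearance]) -> dict[int, float]:
    """Mean wOBA value per times-through-the-order group.

    Assumes excluded PAs (IBB, bunts, pitchers batting, pinch hitters) were dropped
    beforehand, and that batter_number still counts every batter the starter faced,
    so dropping a PA does not shift later batters into a different group.
    """
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for pa in pas:
        group = times_through_order(pa.batter_number)
        totals[group] += pa.woba_value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in sorted(counts)}


# Toy example: the weights below are illustrative, not official wOBA weights.
sample = [
    PlateAppearance(batter_number=1, woba_value=0.0),   # out
    PlateAppearance(batter_number=2, woba_value=0.9),   # single (toy weight)
    PlateAppearance(batter_number=11, woba_value=0.7),  # walk (toy weight)
]
print(woba_by_tto(sample))  # {1: 0.45, 2: 0.7}
```

The later splits in the article (home vs. road, day vs. night vs. indoor, only the first eight innings) amount to filtering the list of PAs before calling `woba_by_tto`; the adjustment for batter and pitcher quality is not modeled here.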
But that’s not what we see, as the last letter of the acronym TTOP implies. Here are some actual numbers from The Book (p. 186, Table 81), based on data from 1999-2002. The total sample is 469,721 PA between starting pitchers and starting lineups, not including IBB and bunts.
As you can see, there is a significant and distinctive trend in the last column, at least through the third time through the order. Basically, batters get better and better from the first time facing a pitcher in a game to the second, and then again to the third, and then revert to “second time” levels by the time they have seen the pitcher for the fourth time. We’ll talk about that “fourth time” anomaly in a little while. Another thing you can clearly see is that most pitchers make it through the order at least three times, which is actually something of a modern trend. In the past, starting pitchers pitched many more complete games, but they were also taken out earlier when they were getting shelled. It is also relatively rare for a pitcher today to face the order for the fourth time. That should not be surprising, since by the fourth trip through the lineup, pitch counts are usually elevated. On average, it takes roughly 100 pitches to get through the order exactly three times: 27 PA at the current average of around 3.8 pitches per PA (P/PA) comes to about 103 pitches. As you might expect, the pool of pitchers is not exactly the same for each TTO group, at least starting with the third time (and neither is the pool of batters). Pitchers in group three are slightly better than those in groups one and two, and the pitchers in group four are quite a bit better. Balancing this out is the fact that the quality of the batters in each group also rises slightly. Because of these differences between the pitcher and batter pools in each group, the expected wOBA in each group is actually a little different, as you can see from the table below.
The significant rise in observed wOBA from the first through the third times through the order is not a result of any large changes in the pitcher and batter pools in each group. For all intents and purposes, the expected wOBA is the same in all groups. Something else must be going on. If you are wondering which group represents a pitcher’s norm, conveniently, the second time through the order is almost exactly what we would expect from the pitcher overall. That is illustrated in columns 4 and 5 in row 2 of the table above. In the second time through the order, the expected wOBA, based on the pitchers’ and batters’ overall full-season numbers, is .353, and the observed wOBA is .354, almost exactly the same. In summary, we can say this: The first time facing the lineup, the starting pitcher has the advantage relative to his overall “true talent.” The second time, the battle between pitcher and batter is roughly neutral. The third time through the order, the batter gains the advantage. The fourth time, the balance appears to be neutral again; however, that may not be quite true, as we will see in a while. Now that we’ve gotten the groundwork out of the way, let’s look at some interesting data and ask and answer some equally interesting questions. From here on, all data are from 2000-2012, and again, pitchers batting and pinch hitters are not included. First, we’ll look at the same data that we presented in The Book, but for 2000-2012.
We basically see the same pattern that we found in The Book: an increase of around 8 to 10 points each time through the order until the fourth (and later), at which point it levels out. The observed wOBA is a little higher than in The Book across all TTO groups because of the way it is calculated (here we remove only sacrifice hits, whereas in The Book we removed all bunts). The pitcher and batter quality numbers do not have SH removed, which is why they are lower as well. Now let’s focus on the first inning. While the first inning usually contains only batters who are facing the starter for the first time, some crazy stuff is going on that we don’t see in the second or third innings, when the starter is also facing the order for the first time. It has nothing to do with the quality of the batters faced: all the observed wOBA numbers you will see from now on (as well as in the previous table) are adjusted for the quality of the batters and pitchers faced.
There seems to be something about the first inning that gives the pitcher an eight-point wOBA advantage as compared to the first time through the order in the second or third inning. Again, we might have assumed the opposite—that hitters should have the advantage, as pitchers need some more time to acclimate themselves to the mound, find out which pitches are working for them, etc. On the other hand, hitters haven’t seen any real pitching since their last game, they may have been sitting on the bench for some time, and they probably haven’t seen that particular pitcher for a while, if ever. What happens if we split the above sample into home and away?
The first time through the order, the home team hits only four points worse in the first inning than in the second or third inning, but the road team hits a whopping 14 points worse! Your guess as to why there is such a large discrepancy between the home and road team in the first inning is as good as mine. Maybe coming to the plate before playing the field is a disadvantage for the visiting hitters, similar to the DH or PH penalty. Maybe it takes the visiting starter or even the fielders more time to get used to the mound and the playing field (although the data suggest that it is a hitting problem and not a defensive one). What’s clear, however, is that the home field advantage is extremely large in the first inning, larger than in any other inning by a long shot. What about the second time through the order? Has this imbalance between the home and road teams disappeared or at least dissipated? Let’s look at all the TTO data split by home and road pitcher.
It does appear that by the time we get to the second time through the order, the imbalance is mostly gone. The difference between the home and road wOBA the first time through the order is 18 points. The second, third, and fourth times through the order, the differences are all around nine points. One takeaway is that the home team’s starting pitcher derives a large portion of his home field advantage from pitching in the first inning. Relievers are not so fortunate. If you’re a pitcher and you want to pump up your stats, start all your games at home, and after you’ve faced nine batters, get the heck out of Dodge! Let’s briefly get back to that funky fourth time through the order, when it seems that the TTOP stops dead in its tracks. Does the batter’s advantage level off by the time he’s seen the pitcher for the fourth time? Actually, not as much as it appears. A while ago I stumbled on something interesting about what happens when a starter lasts into the ninth inning or later. The starter’s team is probably winning, of course, but the margin also tends to be large. In other words, in the very late innings, if it is a one- or two-run game (or even tied), the closer or another short reliever is likely to be on the mound rather than the starter. And when the game is not close, especially in a blowout, for some reason wOBA does a poor job of reflecting the losing team’s approach at the plate. Consequently, wOBA in the ninth inning or later, with a starter in the game, is artificially low. If we remove the ninth inning and later from the “fourth time through the order” data, we see the wOBA rise accordingly. The other relevant factor is the temperature when the lineup bats for the fourth time. In night games it is much colder, and most major league games are played at night.
Let’s look at the regular TTO numbers, but this time we’ll do two things: One, we’ll include only the first eight innings, and two, we’ll split the data into three groups: outdoor day games, outdoor night games, and indoor games.
Eliminating the ninth inning and later raises the wOBA the fourth time through the order by two points in all games combined. And as you can also see, in day games it rises a little more, while in night games it stays flat. In the indoor games, where temperature is not a factor, we actually see a fairly large increase from the third to the fourth times through the order: eight points. In day games, we see only a three-point jump. Maybe in the daytime the temperature decreases a little between the third and fourth times, or maybe the batters and umpires are tired and want to go home. Again, your guess is as good as mine in explaining the above patterns. Suffice it to say that once weather is removed as a factor, along with the ninth inning and later, we do in fact see a steady TTOP all the way through to the fourth or later time through the order. What about the quality of the pitcher? Does that affect the penalty? Are good pitchers good at least partly because they don’t suffer as extreme a penalty, and vice versa for bad pitchers?
Interestingly, the really good pitchers show a fairly modest penalty from the first to the second time through the order (eight points), while the bad pitchers pitch 11 points worse. However, from the second to the third time, the aces get 12 points worse and the poor pitchers, 10. These differences could easily be due to sampling error. In any case, it is clear that great pitchers are by no means immune to the dreaded TTOP. These starters are elite pitchers, on average a run per nine innings better than the typical pitcher, yet by the time they face the lineup for the fourth time, they are barely .3 runs per nine above average. By the third go-around, both groups of starting pitchers, the aces and the duds, lose about 20 points in wOBA compared to their first go-around, and around 10-12 points compared to their overall numbers.
During the fifth game of the World Series, several people wondered whether Jon Lester might be immune to the typical TTOP. They used that speculation to partially defend John Farrell’s decision to let Lester hit in the top of the seventh inning and continue to pitch in the bottom of the seventh, even though he was facing the Cardinals lineup for the third time. By that time, if the TTOP were in effect, we would have expected Lester to be a slightly above-average starter rather than the roughly no. 2 starter that he normally is (notwithstanding any potential “hot hand” effects resulting from pitching a good game so far). The third time through the order, the typical penalty is around .35 runs per nine innings compared to a starter’s overall RA9. The evidence that the Farrell defenders gave for Lester possibly being immune to the penalty was that in his career he has not shown the typical TTOP. I looked at 2009-2012 (I don’t have the 2013 data handy), and here is what I found for Lester.
We are not dealing with tremendously large sample sizes in each group, of course, so we don’t expect these numbers to be especially reliable, and it is unlikely that they would exactly mimic the pattern of the average starting pitcher. That said, Lester does show a roughly typical penalty from the first to the second time, no penalty from the second to the third, and an exceedingly large jump from the third to the fourth (the number of TBF in the fourth group is only around 165). However, before we can put any stock in the predictive value of a player’s own patterns or deviations from the league average, we must estimate how much to regress that data toward the league mean, that is, toward the typical TTO penalties. That’s the same thing we do for platoon splits, BABIP, or even overall performance itself, like FIP, ERA, or wOBA against, when creating projections or estimating true talent. As it turns out, a pitcher’s past deviations from the league average, in terms of his TTO penalties from the first to the fourth times through the lineup, are not very predictive, much like BABIP. When I computed year-to-year correlations for all pitchers with at least 100 TBF in each “times through the order” group per season (an average of around 220 TBF per group), I got “r” values of around .03 for around 500 data points. That means that it would take around 7,100 TBF, or 1,650 innings pitched (roughly eight seasons for a full-time starter), before we would regress a pitcher’s own TTOP pattern 50 percent toward that of the average starter. So unless a pitcher has a long history of a significantly larger or smaller TTOP than the average starting pitcher, we can assume that he will lose around .35 runs per nine innings the third time through the order. Keep in mind that because of the relatively small samples we are dealing with, the 95 percent confidence interval around the .03 correlation is roughly -.06 to .12.
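The arithmetic in the paragraph above can be reproduced with a short Python sketch, though the article does not spell out its formulas, so the conventions here are assumptions. The regression step assumes the common model r = n / (n + k), where n is the sample per group and k is the sample at which a pitcher's own numbers get regressed halfway to the mean, which gives k = n(1 - r)/r. The confidence interval uses the standard Fisher z-transformation. The wOBA-to-runs conversion assumes a wOBA scale of about 1.2 and about 4.3 batters faced per inning; neither constant comes from the article. Under those assumptions the sketch lands close to the figures quoted above.

```python
import math

# Assumed league-average batters faced per inning (not a figure from the article).
TBF_PER_INNING = 4.3


def half_regression_sample(n: float, r: float) -> float:
    """Sample size k at which a pitcher's own data is regressed 50% toward the mean,
    assuming the model r = n / (n + k) for a correlation r observed at sample size n."""
    return n * (1.0 - r) / r


def fisher_ci(r: float, num_pairs: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z-transformation
    (95% with the default z_crit)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(num_pairs - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def woba_points_to_runs_per_nine(points: float, woba_scale: float = 1.2,
                                 tbf_per_inning: float = TBF_PER_INNING) -> float:
    """Convert a wOBA gap in points (thousandths) to runs per nine innings.

    woba_scale = 1.2 is an assumed ballpark value, not taken from the article."""
    runs_per_pa = (points / 1000.0) / woba_scale
    return runs_per_pa * tbf_per_inning * 9


k = half_regression_sample(n=220, r=0.03)
print(round(k), round(k / TBF_PER_INNING))         # 7113 1654
lo, hi = fisher_ci(0.03, num_pairs=500)
print(round(lo, 2), round(hi, 2))                   # -0.06 0.12
print(round(woba_points_to_runs_per_nine(11), 2))   # 0.35
```

An 11-point gap is used in the last line only because it sits in the middle of the 10-12 point third-time penalty mentioned earlier; with these assumed constants it comes out near the .35 runs per nine quoted in the text.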
I’m going to look at one more thing, and then I think you can truly say that you know everything about the now-famous (I hope) “times through the order” penalty. In that same World Series game, there was also some talk about the fact that Lester had thrown only 69 pitches after facing the lineup exactly twice, so maybe he wouldn’t suffer any third-time penalty—another attempt to justify Farrell’s decision to leave him in the game. After all, most starting pitchers won’t be fatigued after only 69 pitches. While that is true, the TTOP is not about fatigue. It is about familiarity. The more a batter sees a pitcher’s delivery and repertoire, the more likely he is to be successful against him. In fact, 69 pitches is not even a low number when it comes to facing the leadoff hitter for the third time. It takes an average starter about 68.4 pitches to get through the order two times (18 times 3.8, the average P/PA in MLB). That said, even though fatigue due to elevated pitch counts is likely not much of a factor in the TTOP, the more pitches a pitcher throws each time through the order, the more the opposing batters are able to acquaint themselves with the pitcher. How much does that affect the penalty? I looked at that in two ways: First, I looked at the number of pitches thrown going into the second, third, and fourth times through the order. I split that up into two groups—a low pitch count and a high pitch count. Here are those results. The numbers in parentheses are the average number of pitches thrown going into that “time through the order.”
We don’t see much difference there. In general, the number of pitches thrown does not seem to be a factor in determining how much of a penalty a starter is going to suffer each time through the order. The second, and better, way I examined this question was to look only at individual batters in each group who had seen few or many pitches in their prior PA. For example, I looked at batters in their second time through the order who had seen fewer than three pitches in their first PA, and also at batters who had seen more than four pitches in their first PA. Those were my two groups. I did the same thing for each time through the order. Here are those results. The numbers in parentheses are the average number of pitches seen per PA so far in the game, for every batter in the group.
Wow! If a batter has seen more than four pitches in his first PA, he hits 25 points better the second time around. That is a huge revelation, I think. As in the previous table, batters who’ve seen two pitches or fewer during their first PA still benefit by 10 points in their next PA. So the big advantage seems to come from seeing a lot of pitches, especially in the first PA. This advantage seems to disappear by the third time through the order. By then, the “high pitch” batter has only a two-point advantage over the “low pitch” batter, compared to a 15-point advantage the second time. The fourth-time numbers in the “high pitch” group probably suffer from sampling error, as the TBF are only around 3,300. In fact, if we combine the third and fourth times in the “high pitch” group, we still get a wOBA of .360. By the time batters get to the third time through the order, how many pitches they’ve seen is mostly irrelevant. But from the first to the second go-around, it seems to be huge. Batters who are patient are indeed imparting a benefit to their team, but it is not what most people think. It is not that they drive the starter out of the game early; against most starters, especially the poorer ones, that would actually be a bad thing for the batting team! The benefit is to the batter himself. The more pitches he sees, the better his next PA, at least from the first to the second time through the order. Let’s recap what we learned today about the “times through the order” penalty.
As you can see, the “times through the order” penalty is a significant effect that should be incorporated into a manager’s decision about when to remove a starting pitcher. In fact, it would behoove managers and pitching coaches to be much more mindful of a starter’s “times through the order” than his pitch count. In an article I wrote two years ago about the benefit of “quick hooks,” I showed that a typical NL team could add from a half to a full win per season simply by removing a starting pitcher who is not an ace whenever he comes to bat in a high-leverage situation after pitching at least five innings, even if his replacement is a league-average reliever. Even in AL parks, where pitchers don’t bat, managers should be inclined to replace a pitcher, especially a fourth or fifth starter, as soon as he faces the order for the third time. These mediocre or worse starters are likely at or near replacement level by this time, even if they have been pitching well. If you are watching a game and feel inclined to criticize or (less likely) praise your favorite manager, make sure that you don’t forget to consider everything you just learned about the “times through the order” penalty.
While I don't disagree that in most cases times through the order will lead to diminishing returns, I'm concerned about taking it as an absolute principle. This reminds me of early DIPS theory a bit in the sense that we're taking it as a bright-line rule about pitcher performance when I suspect the reality on an individual scale is much different than the reality across all pitchers. It's no longer seriously controverted that some pitchers are better at inducing weak or inefficient contact (see Cain, Ford, Glavine, etc.), thus 'defying' DIPS, and I wouldn't at all be surprised to see that there are times-through-the-order defiers as well. Anecdotal evidence suggests that a guy like Verlander often gets stronger as the game goes on. Further, I suspect that pitchers who have more MLB-caliber pitches in their repertoire, who are smarter, or who have a better catcher helping call a game for them would show a less pronounced times-through-the-order effect than a two-pitch, 1.0-WAR sort.
I think, as well, this understates the risks of overtaxing a bullpen and the stacked effect that can build long term. The analysis feels more relevant to the playoff context (when relievers can pitch nearly every game thanks to off days) than it does to the regular season context where managers try to avoid using relievers more than two days in a row.
He addresses this rather specifically in the text.
In any case, Verlander's career splits (all this data are easily available on bbref):
1st time through the order: .629 OPS
2nd time through the order: .638
3rd time through the order: .706
4th time through the order: .666, in 1/5th the sample size
"For all intents and purposes, like BABIP, we can ignore a pitcher’s own historical TTOP when projecting any future penalties. In other words, we do expect Lester, or any other pitcher, to lose around .35 runs per game the third time through the order, regardless of what he has done in the past."
This is exactly the language I object to here. I don't agree that we can ignore all past production and pretend that every single pitcher fits the same curve that reflects pitchers as a whole. I say this as a sabr-loving, BP-subscribing baseball nerd, but to me this smacks of ignoring the less tangible elements of pitching in favor of a broad, potentially more satisfying conclusion. This feels just like the early days of DIPS - DIPS certainly has value and is accurate overall, but can struggle on an individual level - and it seems too extreme for me to accept as gospel any hard-and-fast, bright-line rule that ignores less mathematical analysis of player performance or game theory.
Oh, and one more thing. I don't think that DIPS is like the TTOP. I don't think the pitcher has much control over the TTOP. I don't think it has much to do with him. I think it is almost entirely about batters simply getting used to the pitcher, and I don't think that a pitcher can do much about it. That makes it very different from DIPS.
In any case, your "opinion" doesn't matter. The math speaks for itself. If all you know is a pitcher's past times through the order numbers, the math tells us that we can't use that to predict the future. If you want to argue with the math, be my guest.
Given that the data seems to support the idea that this is due to increased familiarity with a pitcher's repertoire, would it be possible to look at the variety of stuff thrown by a pitcher? It would seem that pitchers like Darvish, who throw a lot of different pitches, might have a lower penalty as they go through the order multiple times.
That is certainly possible. Joe's data below suggests that that might be true. It is on my list of things to do!
To act as though a manager should make every decision on the basis of a guarantee that a pitcher will be 0.35 runs worse the third time through the order, and that no other factors (in a single-game window) matter, including pitcher quality, catcher quality, lineup quality, temperature, type of weather, home or road game, or actual pitcher performance in a given game, is simply going too far. You're operating as though each pitcher performs at a baseline talent level every single game. The Verlander numbers Schere cites above are a perfect example: in a game when Verlander is 'on' (one where he's more likely to see the lineup a 4th time), he improves late in the game relative to a normal start's 3rd time through the lineup. There are countless examples of little things that would operate to impact pitcher performance on a game-to-game basis; the 'math' doesn't speak for itself. I mean, fundamentally this 0.35 runs number takes as an objective truth that every single starting pitcher in MLB tires at exactly the same rate. A less fit pitcher should perform worse the third time through and a more fit one should perform better; it's simple human biology.
Yes, I did address this with the correlations. Like DIPS, there may be pitchers who have their own unique "times penalties" (or not), but we can't tell from their past results, even for many seasons. That is what a very low correlation (.03) tells us, by definition: that a pitcher's past differences (between times through the order) have almost no predictive value. So you can't contradict that when the math is almost irrefutable.
"I think, as well, this understates the risks of overtaxing a bullpen and the stacked effect that can build long term."
Well, I'm not advocating anything. I'm simply giving and explaining the data. What a manager wants to do with that is up to him, not me. But I think that it would behoove managers to understand this phenomenon in order to make those decisions, don't you?
The main problem I have with this article is that you assume that past TTOP is the only information one can use to predict future TTOP. At least that is what it sounds like when you proclaim, "We can assume that all starting pitchers have roughly the same “true talent” TTOP". No, we can't make this assumption. All we can state is that in this specific case, if one is limited to a certain data set to project a certain skill, that data set can be largely ignored. It does not say anything about whether there can be individual pitchers with truly abnormal, sustainable TTOP, or anything about how one might identify them. There are only 150 starting pitchers at any given time. There is no need to save data processing time by adopting generalizations, when plenty of intellectual throughput exists to evaluate each player individually. The same goes for DIPS, platoon splits, home splits, etc.
I was thinking the same thing about there surely being outliers, whether it is pitchers with more pitches or pitchers who work with certain catchers that are better at strategically unveiling their talents at ideal moments (could this be another 50 runs Jose Molina was secretly worth?), but my takeaway was that even if there are outliers, you will never be able to accurately identify who they may be. A manager is better off using the aggregate data than trying to slice the data too finely.
And these principles need to be integrated into managers' decisions (and plans). You obviously can't just remove all of your starters after two times through the order and stack another 350 innings on your bullpen (except in the playoffs), but there are great opportunities to optimize when you use short hooks (day games and domes) and when you let your starter ride (night games early in the season).