July 8, 1999
Bullpens: The Last Word
Final thoughts on the importance of relief
Now, more than at any time in baseball history, games are won and lost in the bullpen. As a result, more attention has focused on a good bullpen as one significant difference between a playoff team and an underachieving also-ran. Whether it's explaining the Mariners' inability to contend despite fielding two of the 50 greatest players in history, or accounting for how the Reds are in first place with Steve Avery in the rotation and Dmitri Young riding the bench, the fortunes of a team's bullpen seem to dictate the fortunes of the team as a whole.
We recently published the results of a study that looked at whether a good bullpen could add some sort of synergy to a team's win-loss record above and beyond the runs that they save, and conversely, whether a collection of pitchers throwing AckerCurves and WengerTaters would snatch more defeats from the jaws of victory than the run totals would suggest. In the study, published at ESPN.com, we looked at two sets of teams--those with the best bullpens in their league and those with the worst--and compared the records for those teams with their expected records, as calculated by the Pythagorean Method.
What we found was that teams with good bullpens actually won more games--about 1.3 more, on average--than would be expected from their totals of runs and runs allowed, while teams with bad bullpens won about 1.6 fewer games than expected. This is, we believe, the first time any study has pinpointed a subset of teams which routinely outperform or underperform their Pythagorean projection.
Having established that a good bullpen matters, and estimated how much it matters, let's test that conclusion a bit.
A Control Group
Any statistical study worth its salt has to account for bias. You must show that the results of your study are not skewed by hidden factors. The best way to do that is to use a control group.
What we showed in the earlier study was that teams with the best bullpens, defined as the bullpens which allowed the lowest OPS in the late innings of close games (LICG), won more games--on average--than their Pythagorean projection, and teams with the worst bullpens won fewer games. But does that prove that good bullpens lead to overachievement, or is it possible that the study's design was flawed?
For example, is it possible that the teams overachieved simply because they had good pitching, period, whether in the first inning or in the eighth? Or that teams with a low opponents' OPS tend to play in pitchers' parks, where runs are more valuable? There are dozens of ways in which the design of the study may have unintentionally skewed the results, giving us a conclusion which may not be warranted.
So we designed a control group to see if we could eliminate that bias as much as possible. The original study compared the three best and three worst bullpens in each league, from 1980 to 1998 (except 1981 and 1994), based on OPS by relievers in LICG. Here are those results again:
Record vs. Pythagorean Record
                  SO   WO   WU   SU   Avg.
Best Bullpens     34   32   25   11   +1.28 games
Worst Bullpens     8   32   27   35   -1.57 games
SO - Strong Overachiever (won at least 3 more games than expected)
WO - Weak Overachiever (won more games than expected, but by fewer than 3)
WU - Weak Underachiever (won fewer games than expected, but by fewer than 3)
SU - Strong Underachiever (won at least 3 fewer games than expected)
To briefly review the results: 66 of the 102 teams with the best bullpens (65%) won more games than projected. For teams with the worst bullpens, 62 out of 102 (61%) won fewer games than projected. On average, teams with excellent bullpens won about 1.3 games more than projected, while teams with poor bullpens won about 1.6 games fewer, a swing of almost three games.
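The four categories in the legend amount to a simple bucketing rule on actual minus expected wins. The sketch below is one way to express that rule in Python, assuming the 3-game cutoff is applied to the unrounded difference (the article doesn't say whether expected wins were rounded first); the function names are hypothetical. A team that lands exactly on its projection is left uncategorized.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

CUTOFF = 3  # games; the line between "strong" and "weak" in the legend


def classify(actual_wins: float, expected_wins: float) -> Optional[str]:
    """Return SO/WO/WU/SU for one team, or None if it hit its projection exactly."""
    diff = actual_wins - expected_wins
    if diff >= CUTOFF:
        return "SO"  # strong overachiever
    if diff > 0:
        return "WO"  # weak overachiever
    if diff <= -CUTOFF:
        return "SU"  # strong underachiever
    if diff < 0:
        return "WU"  # weak underachiever
    return None  # exactly on projection: fits no category


def tally(teams: Iterable[Tuple[float, float]]) -> Counter:
    """Count categories over (actual_wins, expected_wins) pairs."""
    counts: Counter = Counter()
    for actual, expected in teams:
        label = classify(actual, expected)
        if label is not None:
            counts[label] += 1
    return counts


def share_over(counts: Counter) -> float:
    """Fraction of categorized teams that beat their projection (SO + WO)."""
    total = sum(counts.values())
    return (counts["SO"] + counts["WO"]) / total
```

Feeding `share_over` the best-bullpen counts from the first table (34, 32, 25, 11) gives 66/102, about 0.65, matching the 65% quoted above.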
To construct a control group, we ranked the same teams by the effectiveness of their starting pitchers, based on the OPS allowed by all starting pitchers used by that team. Mimicking our original design as much as possible, we took the top three and bottom three rotations in each league from the same years, and compared their won-loss record against their expected performance.
Here are those results:
Record vs. Pythagorean Record
                  SO   WO   WU   SU   Avg.
Best Rotations    23   34   24   21   +0.04 games
Worst Rotations   24   35   22   20   +0.11 games
(One of the teams with a poor rotation, the 1983 Padres, went 81-81 while scoring and allowing exactly the same number of runs (653), so it matched its Pythagorean projection exactly and fits none of the four categories. That is why the second group contains only 101 teams.)
As you can see, while slightly more than half (56%) of the teams with good rotations overachieved, the overall effect was extremely small, just 0.04 games above expectation. And teams with bad rotations had an almost identical breakdown to the teams with the best rotations. In fact, on average the teams with bad starters did slightly better (0.11 games above expectation) than the first group, although the difference is so slight as to be statistically insignificant.
In summary, the control group suggests that the results of the original study are not an artifact of some hidden bias such as overall pitching quality. The relationship appears to be real: good bullpens do correlate with better-than-expected records.
Effect on One-Run Games
It stands to reason that a team with a great bullpen should be able to prevent runs at the most crucial times and thus win more games than their ratio of runs scored to runs allowed would predict. It would also make sense that those same teams would win more than their fair share of tight games, with a record in one-run ballgames better than would be expected from their overall record. Actually, it stands to reason that these two factors would be tightly correlated for all teams--that a team which wins more than its share of one-run games should end up with more wins overall than would be expected.
It stands to reason, and it stands up to the facts as well. Here is a chart comparing the two (for all teams from 1980 - Present, save 1981 & 1994):
Rows: One-Run Record vs. Overall Record
Columns: Record vs. Pythagorean Record
      SO   WO   WU   SU   Avg. vs. Pythagorean
SO    30   20    5    1   +3.09 games
WO    54   64   29   19   +1.35 games
WU    17   35   46   44   -1.17 games
SU     2   10   21   35   -3.53 games
Some explanation is needed of what we are calling a team's "expected" record. As before, for "Record vs. Pythagorean Record", we compare a team's actual won-loss record with the record predicted by the Pythagorean formula, which estimates winning percentage as (Runs Scored^2) / (Runs Scored^2 + Runs Allowed^2); multiplying that percentage by games played gives expected wins. Using 2 as the exponent is traditional, but it is not the most accurate value. Previous studies have shown that the most accurate exponent is about 1.87 (see a new study by Clay Davenport for more on this), but the difference is quite small, and for purposes of this study we'll use 2 as our exponent to keep it in line with the previous one at ESPN.com.
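The Pythagorean projection described above is easy to compute directly. Here is a minimal Python sketch, with the exponent left as a parameter so the traditional 2 and the suggested 1.87 can be compared; the function names are illustrative, not taken from the original study.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^k / (RS^k + RA^k)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float, games: int,
                  exponent: float = 2.0) -> float:
    """Scale the expected winning percentage up to a win total."""
    return games * pythagorean_pct(runs_scored, runs_allowed, exponent)


def pythagorean_surplus(wins: int, losses: int, runs_scored: float,
                        runs_allowed: float, exponent: float = 2.0) -> float:
    """Actual wins minus Pythagorean wins (positive means an overachiever)."""
    return wins - expected_wins(runs_scored, runs_allowed, wins + losses, exponent)
```

With the 1983 Padres' 653 runs scored and 653 allowed, `expected_wins(653, 653, 162)` returns 81.0 under either exponent, matching their 81-81 record.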
In comparing "One-Run Record vs. Overall Record", it is important to realize that a team which plays .600 ball overall is *not* expected to play .600 ball in one-run games. The common perception is that the best teams win the close games, and that a mark of great teams is the ability to pull out one-run games. It's a silly perception.
Here's an example of why: when the best team in the league takes on the worst, the better team is probably going to win around three-fourths of the games. If the Indians play the Twins 20 times, Cleveland will probably go 15-5 or so. Included in those 15 wins will be games with scores of 14-1, 12-2 and, in today's game, 18-8. How many of the Twins' victories are going to be blowouts? When the Twins do squeeze out a win, it's likely to be by a score of 7-5 or something similar.
In reality, one of the marks of a good team is the ability to blow out its opponents. And in fact, what we find is that all teams play towards the center in one-run games: a .600 team will play around .565 ball, while a .400 team will play around .435. So in comparing records, each team's record in one-run games was compared to their expected record in one-run games, based on their overall record.
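The article gives only example points for this pull toward .500 (.600 becomes about .565, .400 about .435). Assuming the relationship is linear, those points correspond to shrinking a team's distance from .500 by a factor of 0.65. The sketch below uses that assumed linear model, which is ours and not the study's stated method, to compute a team's surplus in close games; the function and constant names are hypothetical. The article's two-run figure (.600 becoming about .580) would correspond to a factor of 0.8 under the same model.

```python
ONE_RUN_FACTOR = 0.65  # assumed: fits .600 -> .565 and .400 -> .435
TWO_RUN_FACTOR = 0.80  # assumed: fits .600 -> .580


def expected_close_pct(overall_pct: float, factor: float = ONE_RUN_FACTOR) -> float:
    """Pull an overall winning percentage toward .500 (assumed linear model)."""
    return 0.5 + factor * (overall_pct - 0.5)


def close_game_surplus(close_wins: int, close_losses: int, overall_pct: float,
                       factor: float = ONE_RUN_FACTOR) -> float:
    """Wins above (+) or below (-) expectation in one-run (or two-run) games."""
    games = close_wins + close_losses
    return close_wins - games * expected_close_pct(overall_pct, factor)
```

Under this model, a .600 team that goes 25-15 in one-run games is about 2.4 wins above expectation, since it would be expected to win 22.6 of those 40 games.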
The results are striking. Among teams that did extremely well in one-run games, 50 out of 56 (89%) also did much better than expected compared to their Pythagorean projection. On the flip side, 56 out of 68 (82%) of teams that played poorly in one-run games had a similar profile in their Pythagorean record. The results among teams that over- or underachieved by lesser amounts follow the same overall trend.
To make a long story short: the correlation between a team's performance in one-run games and their performance compared to their Pythagorean record is +0.56, indicating a strong if not overwhelming correlation.
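The +0.56 figure is a correlation coefficient. The article doesn't spell out the exact computation; a reasonable assumption is a Pearson correlation between each team's wins above expectation in one-run games and its wins above its Pythagorean projection. A dependency-free Python sketch (on Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient):

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences.

    Assumed usage: xs = one-run surplus per team, ys = Pythagorean surplus per team.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

A value of +1 would mean the two surpluses move in perfect lockstep and 0 would mean no linear relationship, which is why +0.56 reads as strong but not overwhelming.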
So, since we found a high correlation between good bullpens and exceeding a team's Pythagorean win total, we should expect that good bullpens would also correlate with a better-than-expected record in one-run games, right? Here's the data:
One-Run Record vs. Expected One-Run Record
                  SO   WO   WU   SU   Avg.
Best Bullpens     12   45   31   14   +0.22 games
Worst Bullpens    11   39   34   18   -0.26 games
While there does appear to be a trend, it's a small one. Just 56% of the teams with great bullpens did better than expected in one-run games, and barely 51% of the teams with bad bullpens did worse than expected. On average, the teams with the best bullpens played just a half-game better in one-run contests than the teams with the worst pens (+0.22 versus -0.26 games). That's just one-sixth of the three-game disparity (+1.28 versus -1.57 games) we found in their performance against their Pythagorean record.
So if these teams aren't doing that much better or worse in one-run games, how are they winning more games than their Pythagorean projection? Let's look at their record in two-run games against their expected record. Keep in mind that just as in one-run games, teams play towards the center in two-run games, although the effect is not as strong: a .600 team should play around .580 ball in two-run games.
Two-Run Record vs. Expected Two-Run Record
                  SO   WO   WU   SU   Avg.
Best Bullpens     13   44   34   10   +0.21 games
Worst Bullpens     8   33   48   13   -0.54 games
The same general trend holds in two-run games as in one-run games, and in fact the correlation appears to be a little stronger. Combining the results of one- and two-run contests, teams with good bullpens win about 0.43 games more than expected, while teams with bad bullpens win about 0.80 games fewer than expected. Together, that explains less than half of the disparity between their actual records and their Pythagorean projections.
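The comparisons in the last few paragraphs are simple arithmetic on the average columns of the tables. This sketch reproduces them, assuming (as the text does) that the one-run and two-run averages can be added because they cover different games. Summing the two close-game averages as printed gives +0.43 for the best bullpens and -0.80 for the worst.

```python
# Average wins above (+) or below (-) expectation, copied from the tables above.
PYTHAG = {"best": 1.28, "worst": -1.57}
ONE_RUN = {"best": 0.22, "worst": -0.26}
TWO_RUN = {"best": 0.21, "worst": -0.54}


def gap(table: dict) -> float:
    """Best-bullpen average minus worst-bullpen average, in games."""
    return table["best"] - table["worst"]


pythag_gap = gap(PYTHAG)                   # about 2.85 games
one_run_share = gap(ONE_RUN) / pythag_gap  # about 0.17, roughly one-sixth

# One- and two-run games are disjoint, so their surpluses can be summed.
combined = {k: ONE_RUN[k] + TWO_RUN[k] for k in PYTHAG}
combined_share = gap(combined) / pythag_gap  # about 0.43, less than half
```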
One possible explanation for the remaining difference is that teams with strong bullpens (remember, in this case "bullpen" means only those relievers used in tight games) may have focused their resources on acquiring good late-inning relievers to the detriment of the rest of the team. That might cause such teams to get blown out of games more often than usual. If a team has two great relievers but a lousy starting rotation, it is going to suffer a lot of lopsided losses that the bullpen can't save. Those blowouts would damage the team's ratio of runs scored to runs allowed, and hence its Pythagorean record, but would do only minor damage to its actual won-loss record. However, it's just a theory, and more research may be needed to determine the true source of this discrepancy.
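The blowout theory can be illustrated with the Pythagorean formula. In the sketch below, two versions of a hypothetical team (the numbers are invented for illustration, not drawn from the study) have the same 88-74 record and score the same 750 runs, but one allows 60 extra runs in games it was losing anyway. Its actual record is unchanged, yet its Pythagorean projection drops, so it appears to overachieve by more.

```python
def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^k / (RS^k + RA^k)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def surplus(wins: int, losses: int, runs_scored: float, runs_allowed: float) -> float:
    """Actual wins minus Pythagorean wins."""
    return wins - (wins + losses) * pythagorean_pct(runs_scored, runs_allowed)


# Hypothetical numbers, for illustration only.
baseline = surplus(88, 74, 750, 700)       # runs allowed spread normally
with_blowouts = surplus(88, 74, 750, 760)  # 60 extra runs allowed in lost causes
```

The first version sits about 1.4 wins above its projection and the second about 8.1, even though both won exactly the same games. Extra runs allowed in games already lost cost nothing in the standings but inflate the apparent overachievement.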
So maybe this isn't the last word after all.
Rany Jazayerli is an author of Baseball Prospectus.