March 12, 1999

What the 'R' Column Doesn't Tell You
Evaluating Relievers by Prevention of Expected Runs
Back in my college days, when I was spending too much of my time at the Astrodome instead of studying for finals, my friends and I used to cringe whenever Astros reliever Dave Smith was brought into a game with runners on base. Our experience (admittedly, probably based on a small sample) was that Smith was liable to allow all the runners he inherited to score, racking up runs charged to his predecessors, but finish the inning before any of his own runners crossed the plate, keeping his own "R" and "ER" columns pristine. Our nickname for him at the time: Dave "Hey, it's not my ERA" Smith.

Whether or not we were right about Smith in particular, it's widely understood that the traditional way of assigning runs to pitchers -- if a runner scores, the pitcher who let him reach base is charged with the run (I know that's a simplification) -- doesn't always work well, especially with relievers. A reliever who comes in with the bases loaded and two out and gives up a triple before getting out of the inning has hurt his team plenty, but he has helped his ERA. On the other hand, a reliever in the same situation who strands all three runners by getting the first batter out has done far more to prevent runs than a pitcher who gets that same first batter out with two outs and the bases empty.

A statistic for evaluating relievers should do a better job than the traditional method of allocating credit for runs prevented and blame for runs scored. Gary Skoog came up with a technique for doing this, his Value Added approach, described in an article in the 1987 Bill James Baseball Abstract. His idea, which he applied to all players, not just relievers, was to measure the quality of each of a player's appearances by how it changed run expectation.
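Skoog's calculation, spelled out in the next paragraph, scores an appearance as the expected runs at entry, minus the expected runs at exit, minus the runs that scored in between. A minimal Python sketch follows. The (outs, bases) state encoding and the function names are our own hypothetical choices, and the lookup table holds only the two Hidden Game values quoted in the next paragraph rather than all 24 states:

```python
# Minimal sketch of Skoog-style Value Added for a single appearance.
# States are (outs, bases) tuples; "-23" means runners on second and third.
# ASSUMPTION: only the two Hidden Game values quoted in this article are
# filled in; the remaining 22 bases/outs states are deliberately left out.
RUN_EXPECTANCY = {
    (1, "-23"): 1.37,  # one out, runners on second and third
    (2, "-2-"): 0.35,  # two outs, runner on second
}

# Hypothetical sentinel for "inning over" (three outs): nothing more expected.
INNING_OVER = None


def expected_runs(state):
    """Runs expected to score in the rest of the inning from `state`."""
    if state is INNING_OVER:
        return 0.0
    return RUN_EXPECTANCY[state]


def value_added(start_state, end_state, runs_scored):
    """Expected runs at entry, minus expected runs at exit, minus runs scored."""
    return expected_runs(start_state) - expected_runs(end_state) - runs_scored


# Enter: one out, runners on 2nd and 3rd. Strikeout, then a two-run double.
# Leave: two outs, batter on second.
print(f"{value_added((1, '-23'), (2, '-2-'), runs_scored=2):.2f}")  # -0.98
```

Summing `value_added` over all of a player's appearances gives his season total. Covering every situation would require the full Hidden Game table, which this sketch does not attempt to reproduce.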
Skoog used a situational scoring table from The Hidden Game of Baseball to map each of the 24 possible bases/outs states to the number of runs expected to score in the remainder of the inning starting from that state. For example, according to Hidden Game's table, an average team with one out and runners at second and third would expect to score 1.37 runs in the remainder of the inning; with two outs and a runner on second, a team would expect to score 0.35 runs. He then calculated the value of an appearance as the expected runs when the appearance started, minus the expected runs when the appearance ended, minus the number of runs that scored during the appearance itself. For example, if a reliever entered a game with one out and runners on second and third, and was pulled after getting a strikeout and surrendering a double that scored both runners, his Value Added for that appearance would be 1.37 - 0.35 - 2 = -0.98 runs. A player's Value Added for the season is the sum of his Value Added results for each appearance in the season.

Steve Schulman has extended this system to remove expected runs due to errors. (Skoog had this basic error-removing idea as well, but he wasn't explicit about how to apply it to pitchers.) Schulman has applied his version under another name -- Runs Prevented -- to evaluate relievers in the past few editions of the STATS Baseball Scoreboard.

This system offers a good solution to the run assignment problem inherent in traditional pitching statistics. Unfortunately, Skoog's and Schulman's implementations of the system do not address some other well-known problems in baseball analysis. In particular, the ratings they produce are not adjusted for park, league, or era. For this article, we'll make a few changes to Skoog and Schulman's calculations, listed here in rough order of importance:
We'll call our rating Adjusted Runs Prevented (ARP) -- adopting Schulman's name for the method but adding the "Adjusted" in front of it to make it clear we're doing things differently. A reliever's ARP is the number of runs that he prevented over an average pitcher, given the bases/outs situation when he entered and left each game, adjusted for league and park. The exact formula for a reliever's ARP for a game is
where
ARP represents a reliever's cumulative run prevention above average over the course of a season. We would also like to have a measure of a reliever's rate of run prevention using the same basic approach. We'll do this by making this observation: like ARP, Pete Palmer's Adjusted Pitching Runs (APR) measures production in terms of runs prevented above average. APR is based on the pitcher's rate of runs allowed: Palmer's version is based on park-adjusted ERA, but for this article, we'll use adjusted RA, or ARA. It's easy to compute a pitcher's ARA from his APR by applying the APR formula "backwards":
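Written out, assuming Palmer's usual Pitching Runs form, where lgRA is the league runs-allowed average per nine innings and IP is innings pitched (the exact form is our assumption, though it reproduces the ARA and APR columns of the tables below to within rounding):

```latex
\[
\mathrm{APR} = \frac{\mathrm{IP}\,(\mathrm{lgRA} - \mathrm{ARA})}{9}
\qquad\Longrightarrow\qquad
\mathrm{ARA} = \mathrm{lgRA} - \frac{9\,\mathrm{APR}}{\mathrm{IP}}
\]
```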
If we apply this same "backwards" formula, substituting ARP for APR, we can get a rate stat based on ARP that is directly comparable to adjusted RA. This "Runs Allowed Average" derived from ARP, abbreviated RA(ARP), is calculated as:
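A minimal Python sketch of both conversions, assuming Palmer's form APR = IP * (lgRA - ARA) / 9, so that RA(ARP) = lgRA - 9 * ARP / IP. The league average lgRA is back-solved from a table row rather than taken from a published figure, and the function names are our own:

```python
# Sketch of the "backwards" conversion from runs above average to a rate.
# ASSUMPTION: Palmer's form APR = IP * (lgRA - ARA) / 9, so a rate is
# recovered as lgRA - 9 * runs / IP. lgRA is back-solved below, not published.


def implied_league_ra(ara: float, apr: float, ip: float) -> float:
    """Back-solve lgRA from a table row's ARA, APR, and IP."""
    return ara + 9.0 * apr / ip


def ra_from_runs_prevented(runs_prevented: float, ip: float, lg_ra: float) -> float:
    """Full-inning RA equivalent: APR in gives ARA; ARP in gives RA(ARP)."""
    return lg_ra - 9.0 * runs_prevented / ip


# Trevor Hoffman, 1998 leaders table: IP 73.0, ARA 1.71, APR 25.3, ARP 27.8.
lg_ra = implied_league_ra(ara=1.71, apr=25.3, ip=73.0)
ra_arp = ra_from_runs_prevented(27.8, ip=73.0, lg_ra=lg_ra)
print(f"lgRA ~ {lg_ra:.2f}, RA(ARP) ~ {ra_arp:.2f}")  # lgRA ~ 4.83, RA(ARP) ~ 1.40
```

Back-solving lgRA from other rows of the tables (Jackson, Urbina, Bennett, the Rockies' bullpen) gives roughly the same 4.83, which suggests a single reference value was used throughout. With that lgRA, a hypothetical appearance that prevents 2.0 runs in one inning, `ra_from_runs_prevented(2.0, 1.0, lg_ra)`, comes out far below zero, the negative RA(ARP) case discussed next.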
If a player has a RA(ARP) of 3.00, I read it as meaning that he prevented runs like a "full-inning pitcher" with a 3.00 RA, where a full-inning pitcher is one who never enters or leaves a game in the middle of an inning. Starters are usually pretty good approximations of full-inning pitchers, since the vast majority of their innings are full innings.

Note that in extreme cases, a reliever's RA(ARP) can be negative. For example, if a reliever enters a game with the bases loaded and nobody out, and he gets out of the inning without anyone scoring, he'll have a negative RA(ARP) for the game. That actually makes a certain amount of sense: that reliever has done even more to prevent runs than a pitcher who came in with the bases empty and nobody out and pitched a scoreless inning.

1998 Leaders

With that introduction to the methods, let's get to the results. Here are the top 20 relievers in the majors in 1998, ranked by Adjusted Runs Prevented:

Pitcher         Team    IP   R   ARA   APR RA(ARP)   ARP
--------------------------------------------------------
Hoffman,T       SDP   73.0  12  1.71  25.3    1.40  27.8
Jackson,M       CLE   64.0  11  1.43  24.1    1.01  27.1
Urbina,U        MON   69.3  11  1.64  24.6    1.60  24.9
Nen,R           SFG   88.7  21  2.28  25.1    2.40  23.9
Mecir,J         TAM   84.0  30  2.98  17.3    2.36  23.0
Gordon,T        BOS   79.3  24  2.62  19.5    2.26  22.6
Rivera,M        NYY   61.3  13  1.88  20.1    1.56  22.3
Shaw,J          C/L   85.0  22  2.50  22.0    2.71  20.0
Veres,D         COL   76.3  26  2.64  18.6    2.57  19.2
Ligtenberg,K    ATL   73.0  24  3.08  14.2    2.52  18.7
Wall,D          SDP   65.3  19  3.03  13.0    2.48  17.0
Timlin,M        SEA   79.3  26  2.83  17.6    2.95  16.6
Mills,A         BAL   77.0  32  3.70   9.6    2.92  16.3
Wetteland,J     TEX   62.0  17  2.25  17.7    2.52  15.9
Swindell,G      M/B   91.3  40  3.87   9.7    3.29  15.6
Brocail,D       DET   62.7  23  3.15  11.7    2.65  15.1
Darensbourg,V   FLA   71.0  29  3.99   6.6    2.91  15.1
Reed,S          S/C   80.3  29  3.17  14.8    3.13  15.1
Lowe,D          BOS   75.0  30  3.46  11.4    3.07  14.6
Howry,B         CHW   54.3  20  3.22   9.7    2.43  14.5

Trevor Hoffman finished as the top major league reliever by this measure, followed closely by Mike Jackson, who won the AL crown.
If you go by the traditional method of charging runs to pitchers (represented here by APR), Ugueth Urbina and Rob Nen would rank right up there with Hoffman and Jackson. However, a deeper look into their appearances shows that Nen and Urbina were not as good with the runners they inherited and/or the runners they turned over to others, so they finished a notch below the top two. Hoffman was joined in the top 20 by his teammate Donne Wall to form the best 1-2 combination among major league bullpens. Tom Gordon was joined by Derek Lowe and a half-season of Greg Swindell (as well as Jim Corsi, who just misses this list) to form the best core of a bullpen.

The real surprises on this list are Jim Mecir, Alan Mills, Greg Swindell, and Vic Darensbourg. None of those pitchers fare especially well if you look at how many runs they were charged with, but when you also take into account how well they handled others' runners and how well others handled their runners, they finished among the league's elite.

Looking at the league's best is fun, but, as Felipe Alou knows, you can't have fun all the time. Here are the worst 10 relievers from 1998, ranked by ARP:

Pitcher         Team    IP   R   ARA   APR RA(ARP)   ARP
--------------------------------------------------------
Bennett,S       MON   91.7  61  6.87 -20.8    6.95 -21.6
Ayala,B         SEA   75.3  66  7.55 -22.8    7.14 -19.3
Pittsley,J      KCR   60.0  50  6.91 -13.9    7.32 -16.6
Dehart,R        MON   28.0  22  8.12 -10.2   10.10 -16.4
Baldwin,J       CHW   21.7  26 10.49 -13.6   11.59 -16.3
Judd,M          LAD   11.3  19 17.13 -15.5   17.73 -16.2
Bailes,S        TEX   40.3  33  6.72  -8.5    8.10 -14.7
Rojas,M         NYM   58.0  39  6.39 -10.1    6.98 -13.9
Speier,J        C/F   20.7  20  9.39 -10.5   10.58 -13.2
Stanifer,R      FLA   48.0  33  6.72 -10.1    7.19 -12.6

I'm sure Alou would have loved to have a Ugueth Urbina clone he could run out for the 120 or so innings he used Shayne Bennett and Rick DeHart. Bennett had an especially noteworthy season; it's not often you see a guy who pitches that badly finish among the league leaders in relief innings pitched.
At least with Bennett, the traditional method of charging runs to pitchers did a fair job of evaluating him. Scott Bailes, on the other hand, had an already awful ARA of 6.72, which was actually deceptively low. If you look at how he prevented runs overall, he pitched more like an 8.10-RA pitcher.

Here are the team numbers from 1998, with bullpens ranked best to worst in ARP:
Team      IP    R   ARA    APR RA(ARP)    ARP
---------------------------------------------
COL    449.3  198  3.42   70.5    3.68   57.3
TAM    502.0  232  3.85   54.4    3.82   56.2
HOU    412.0  162  3.81   46.5    3.73   50.4
SFG    512.3  200  3.76   60.6    4.10   41.3
BOS    498.3  238  4.13   38.5    4.15   37.6
NYY    395.3  173  3.89   41.3    4.00   36.2
CAL    483.7  225  4.09   39.5    4.17   35.4
SDP    421.0  175  4.33   23.2    4.18   30.2
TEX    465.7  234  4.13   36.2    4.27   28.7
PIT    456.3  213  4.29   27.4    4.29   27.2
MIL    516.3  245  4.32   29.2    4.35   27.1
CIN    509.3  256  4.58   14.0    4.50   18.3
CLE    454.7  238  4.37   23.1    4.47   18.2
ATL    364.0  175  4.50   13.3    4.41   16.7
NYM    403.3  180  4.24   26.3    4.57   11.4
BAL    475.3  237  4.44   20.4    4.74    4.8
DET    493.7  255  4.44   21.4    4.76    3.5
MIN    505.0  263  4.65    9.8    4.87   -2.2
STL    538.0  274  4.87   -2.8    4.92   -5.5
TOR    402.0  225  4.89   -2.6    4.98   -6.8
PHI    459.0  243  4.83   -0.3    4.97   -7.2
OAK    447.7  265  4.99   -8.2    5.12  -14.3
KCR    475.3  292  5.10  -14.2    5.22  -20.8
CHC    470.3  257  4.95   -6.4    5.27  -23.2
LAD    435.7  214  5.02   -9.2    5.49  -32.1
ARI    433.3  255  5.39  -27.1    5.56  -35.4
MON    520.3  268  5.32  -28.5    5.44  -35.5
SEA    430.0  271  5.43  -29.0    5.58  -36.0
CHW    516.0  324  5.49  -38.0    5.48  -37.3
FLA    525.3  306  5.69  -50.4    5.82  -58.0
---------------------------------------------
ML   13970.6 7093  4.58  378.5    4.71  186.3

On a team level, ARP adds less to traditional run assignment methods (e.g., APR) than it does for individuals. The only major difference between ARP and APR for teams is that ARP takes into account how the bullpen handles the runners that the starters leave on base. As a result, you'd expect a team's ARP to be close to its APR, and in general that holds true in the table above. Nevertheless, there are a few teams that the two methods rate significantly differently. The Giants, Cubs, and Dodgers were examples of bullpens that were much worse than you'd expect given the number of runs charged to them. The Dodgers, in particular, had a surprisingly bad bullpen -- only the Marlins were significantly worse. On the other hand, the Padres were somewhat better than their APR and ARA indicated.
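To make the team comparison concrete, here is a short Python sketch that ranks the four bullpens just discussed by ARP minus APR, using figures copied from the team table (the function name is our own):

```python
# Compare the two team-level ratings: ARP (run expectation) vs APR
# (traditional run assignment). Values copied from the 1998 team table.
TEAMS = [
    # (team, APR, ARP)
    ("SFG", 60.6, 41.3),
    ("SDP", 23.2, 30.2),
    ("CHC", -6.4, -23.2),
    ("LAD", -9.2, -32.1),
]


def arp_minus_apr(rows):
    """(team, ARP - APR) pairs, from most underrated to most overrated by APR."""
    diffs = [(team, round(arp - apr, 1)) for team, apr, arp in rows]
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


for team, diff in arp_minus_apr(TEAMS):
    print(f"{team}: {diff:+.1f}")
```

Positive values mark bullpens that traditional run assignment undersells (the Padres, +7.0); negative values mark the ones it flatters (the Cubs, Giants, and Dodgers, with the Dodgers furthest off at -22.9).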
The best bullpen in the league last year belonged to the Rockies, who dealt with the loss of Steve Reed quite nicely.

Beyond the 'R' Column

There are two major reasons that a reliever's 'R' column in a box score does a poor job of measuring his performance in that game: (1) it doesn't reflect how well he dealt with the runners he inherited, and (2) it does reflect how well the reliever's successors dealt with the runners he turned over to them. It's interesting to break down these two aspects of performance individually, to see who gave and received the most and least help.

Performance in handling inherited runners can be measured analogously to ARP: for each appearance, compare the situation when the reliever entered the game with the situation when he left it. For this, we consider only the runners that the reliever inherited. We figure the number of those inherited runners that would be expected to score, given where they were on the bases and how many outs there were when the reliever entered. We compare that total to the number who actually did score while the reliever was in the game, plus, if any were still on base when the reliever left the game, how many would have been expected to score from that ending bases/outs state. The end result is a number called Expected Inherited Runs Prevented (EIRP). If this number is positive, the reliever is chopping runs off his teammates' totals; if it's negative, he is adding runs to them.

Below are the best and worst relievers of 1998 in this measure. "IRnr" is the number of runners the reliever inherited; "EIRs" is the expected number of those inherited runners to score, given the situation when he entered; "IR" is the number who actually did score while the reliever was in the game; and "EIRf" is the expected number of runs to score from the inherited runners still on base when the reliever left his games.
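The EIRP arithmetic is simple enough to sketch directly. This Python version works on a reliever's season totals (the measure is additive across appearances) and deliberately skips the park adjustment, so it lands close to, but not exactly on, the tabulated figures; the function name is our own:

```python
# Sketch of Expected Inherited Runs Prevented (EIRP), without adjustments.
# ASSUMPTION: no park adjustment is applied; the article's EIRP column is
# park-adjusted, so raw values here differ slightly from the table.


def eirp_unadjusted(eirs: float, ir: int, eirf: float) -> float:
    """EIRs minus (inherited runners who scored + EIRf).

    eirs: expected runs from inherited runners, given the entry state
    ir:   inherited runners who actually scored while the reliever pitched
    eirf: expected runs from inherited runners still on base at his exit
    """
    return eirs - (ir + eirf)


# Season totals from the 1998 EIRP tables.
print(f"Radinsky: {eirp_unadjusted(12.3, 20, 1.4):+.1f}")  # -9.1 (table: -10.0)
print(f"Swindell: {eirp_unadjusted(16.6, 6, 1.6):+.1f}")   # +9.0 (table: +8.6)
```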
Gave Most Help: Top 10 ML Relievers in EIRP

Pitcher         Team IRnr  EIRs  IR  EIRf   EIRP
------------------------------------------------
Swindell,G      M/B    50  16.6   6   1.6    8.6
Plesac,D        TOR    80  27.5  15   3.8    8.3
Mills,A         BAL    42  17.8   8   1.8    7.8
Delucia,R       CAL    46  18.1  10   0.3    7.6
Krivda,R        C/C    15   6.9   0   0.2    6.5
Wickman,B       MIL    31  10.7   5   0.0    6.3
Mecir,J         TAM    47  17.9  11   0.7    6.1
Howry,B         CHW    23   8.0   2   0.0    5.8
Myers,Ra        T/S    22   5.8   0   0.7    5.8
Darensbourg,V   FLA    33  12.5   4   2.7    5.6

Gave Least Help: Bottom 10 ML Relievers in EIRP

Pitcher         Team IRnr  EIRs  IR  EIRf   EIRP
------------------------------------------------
Radinsky,S      LAD    41  12.3  20   1.4  -10.0
Bruske,J        L/S    29   8.5  15   0.6   -8.6
Mcmichael,G     N/L    36  11.6  19   0.6   -8.4
Fetters,M       O/C    53  15.0  23   0.6   -8.0
Boehringer,B    SDP    37  11.0  15   1.0   -7.1
Runyan,S        DET    76  29.0  27   9.0   -7.1
Rodriguez,R     SFG    57  21.5  25   2.5   -7.0
Guardado,E      MIN    77  28.5  32   2.7   -6.2
Adams,T         CHC    45  16.1  21   0.8   -5.9
Holmes,D        NYY    22   8.3  12   0.9   -5.4

Let's look first at the "Gave Least Help" table. Well, Dodgers fans might want to avert their eyes. We mentioned above that the Dodgers had a surprisingly bad bullpen, and here's a big reason why. The three worst relievers in the majors at handling inherited runners all spent significant time in a Los Angeles uniform last year. Scott Radinsky inherited 41 runners, who would have been expected to score 12.3 runs. Radinsky watched from the mound as 20 of those runners crossed the plate, and he left another 1.4 expected runs' worth of those runners on base for others to deal with. The result is that Radinsky cost his teammates 10 park-adjusted runs on their runs allowed totals beyond what you'd expect from an average pitcher.

On the other end of the spectrum, Greg Swindell chopped 8.6 runs off his Twins and Red Sox teammates' ledgers. Greg Olson doesn't quite make the "Gave Most Help" list, but he deserves honorable mention there as one of the few relievers who completely erased all the inherited runners he saw in 1998.
Olson inherited 16 runners, but none of them scored, and none were still on base when he left his games.

The other effect we want to isolate is how much help the reliever got from his successors when he turned runners over to them. This is measured similarly to EIRP. For this measure, we consider only those runners who were on base when the reliever left his games; Mike Emeigh calls these "bequeathed runners". Actually, we'll be more restrictive than that: we'll consider only the bequeathed runners who are the reliever's responsibility (i.e., if they score, they'll be charged to him). We'll call these "Own Bequeathed Runners" (OBRnr). We figure the expected number of Own Bequeathed Runners to score, given where they were on the bases and how many outs there were when the reliever left the game, and subtract from that total the number who actually did score. The result is Expected Bequeathed Runs Saved (EBRS). If it's positive, the reliever's successors bailed him out by chopping runs off his R total; if it's negative, his successors added runs to his ledger compared to what he could have expected from average bullpen support.

Below are the top 10 and bottom 10 by this measure. OBRnr is the number of the reliever's Own Bequeathed Runners turned over to other relievers during the year; EBR is the expected number of those runners to score; and ABR is the number who actually did score.
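EBRS reduces to one subtraction on season totals. The Python sketch below again omits the park adjustment, so its results sit near, not on, the tabulated values; the function name is our own:

```python
# Sketch of Expected Bequeathed Runs Saved (EBRS), without adjustments.
# ASSUMPTION: no park adjustment; the article's EBRS column is park-adjusted.


def ebrs_unadjusted(ebr: float, abr: int) -> float:
    """Expected runs from own bequeathed runners minus those who actually scored.

    Positive: successors bailed the reliever out. Negative: they added runs.
    """
    return ebr - abr


print(f"Springer: {ebrs_unadjusted(6.6, 0):+.1f}")   # +6.6 (table: +6.7)
print(f"Stanton:  {ebrs_unadjusted(10.2, 16):+.1f}")  # -5.8 (table: -6.4)
```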
Received Most Help: Top 10 ML Relievers in EBRS

Pitcher         Team OBRnr   EBR ABR  EBRS
------------------------------------------
Springer,R      A/A     16   6.6   0   6.7
Quantrill,P     TOR     34   9.6   3   6.5
Adams,T         CHC     32  12.1   6   6.3
Gunderson,E     TEX     39  14.4   9   5.3
Alfonseca,A     FLA     21   7.2   2   5.1
Hudek,J         N/C     20   6.6   2   4.9
Mendoza,R       NYY     17   4.8   0   4.7
Mcelroy,C       COL     21   9.6   4   4.7
Plunk,E         C/M     30   9.5   5   4.3
Delucia,R       CAL     40  14.3  10   4.2

Received Least Help: Bottom 10 ML Relievers in EBRS

Pitcher         Team OBRnr   EBR ABR  EBRS
------------------------------------------
Stanton,M       NYY     27  10.2  16  -6.4
Nitkowski,C     HOU     17   5.9  11  -5.8
Mcmichael,G     N/L     24   6.9  11  -4.4
Embree,A        A/A     30   9.0  13  -4.4
Mathews,TJ      OAK     38  11.5  16  -4.3
Mohler,M        OAK     25   7.8  12  -4.0
Harris,P        CAL     29  11.1  15  -4.0
Sullivan,S      CIN     24   9.3  13  -3.9
Batista,M       MON     12   3.2   6  -3.7
Henriquez,O     FLA      7   2.7   6  -3.7

Russ Springer turned 16 of his runners over to his Diamondback and Brave bullpen-mates, and his mates did not let him down. Not a single one of those 16 runners scored, saving Springer an expected 6.7 park-adjusted runs off his totals. It's interesting that two guys who were traded for one another, Springer and Alan Embree, show up on opposite sides of the successor-support spectrum. These tables provide more evidence that the Braves got the short end of the stick in that deal: Embree's RA during 1998 was significantly inflated by his teammates' poor support, while Springer's was significantly deflated by his teammates' good support.

The Yankees' Mike Stanton was easily the reliever most victimized by his teammates, accumulating 6.4 undeserved runs on his park-adjusted totals. From glancing at the inherited-runner tables above, it's reasonable to suspect that Darren Holmes was one of the main culprits in allowing all those bequeathed runners of Stanton's to score.

The previous four tables isolated some interesting components of relief performance not measured by traditional run assignment (or, in the case of bequeathed runners, measured when it should not be).
But what about putting those components all together to find out how badly traditional run assignment can underrate or overrate a reliever's contribution? Measuring this is easy. As we mentioned above, ARP and Palmer's Adjusted Pitching Runs are each attempting to measure the same thing -- runs prevented above an average pitcher. APR represents the traditional run assignment approach, comparing the league average run scoring to the number of runs charged to the pitcher. ARP represents the run expectation approach, and, for the reasons we've argued in this article, should give a more accurate picture of a reliever's performance. The amount that traditional run assignment overrates or underrates a reliever, then, is the difference between those two measures. Here are the lists for 1998:
10 ML Relievers most underrated by conventional run assignment (ranked by ARP - APR)

Pitcher         Team  EIRP  EBRS  ARA RA(ARP)   APR   ARP Diff.
---------------------------------------------------------------
Stanton,M       NYY    4.3  -6.4 5.74    4.70  -8.0   1.2   9.1
Darensbourg,V   FLA    5.6  -1.1 3.99    2.91   6.6  15.1   8.5
Mulholland,T    CHC    4.7  -3.1 4.76    3.73   0.5   8.8   8.3
Harris,P        CAL    5.4  -4.0 4.69    3.51   0.9   8.8   7.9
Plesac,D        TOR    8.3   0.1 4.02    2.70   4.5  11.8   7.3
Nitkowski,C     HOU    1.8  -5.8 4.39    3.33   2.9   9.9   7.0
Weathers,D      C/M    4.4  -2.3 5.13    4.12  -2.0   4.7   6.8
Mills,A         BAL    7.8   0.4 3.70    2.92   9.6  16.3   6.7
Sullivan,S      CIN    1.9  -3.9 5.54    4.97  -8.1  -1.6   6.4
Swindell,G      M/B    8.6   2.8 3.87    3.29   9.7  15.6   5.9

10 ML Relievers most overrated by conventional run assignment (ranked by ARP - APR)

Pitcher         Team  EIRP  EBRS  ARA RA(ARP)   APR   ARP Diff.
---------------------------------------------------------------
Radinsky,S      LAD  -10.0   2.0 3.48    5.43   9.2  -4.1 -13.4
Adams,T         CHC   -5.9   6.3 4.86    6.35  -0.3 -12.3 -12.0
Alfonseca,A     FLA   -4.7   5.1 4.98    6.29  -1.2 -11.5 -10.3
Runyan,S        DET   -7.1   2.9 3.93    5.75   5.0  -5.1 -10.2
Bruske,J        L/S   -8.6   1.9 4.42    6.09   2.5  -7.7 -10.2
Quantrill,P     TOR   -1.8   6.5 2.84    3.95  17.7   7.8  -9.9
Fetters,M       O/C   -8.0  -0.5 4.94    6.25  -0.7  -9.3  -8.5
Springer,R      A/A   -1.6   6.7 4.56    5.98   1.6  -6.8  -8.3
Rodriguez,R     SFG   -7.0   0.7 4.11    5.23   5.2  -3.0  -8.2
Guardado,E      MIN   -6.2   0.1 4.63    5.70   1.5  -6.4  -7.9

There are lots of names here that we've already seen in the tables above dealing with inherited and bequeathed runners. What these tables give you is an idea of how badly traditional run assignment can distort the picture of a reliever's effectiveness. Since relievers pitch relatively few innings, and since a high percentage of those innings involve inherited and bequeathed runners, the distortion can be pretty bad.

Take Scott Radinsky (please). If you look only at the number of runs he was charged with, he appeared to be a solid contributor last year, even after taking the Dodger Stadium park effect into account.
A 3.48 park-adjusted RA is nothing to sneeze at these days. However, when you take into account his league-worst handling of inherited runners and the fact that he got better-than-average support from his successors, you find that his overall contribution to the Dodgers' run prevention effort was that of a 5.43-RA pitcher -- almost two full runs per 9 innings worse! The story with the Cubs' Terry Adams was similar. Cub fans could easily look at that 4.86 park-adjusted RA, which is around league average, and conclude that he was harmless. But if you look at how badly he handled inherited runners, combined with the tremendous help he got from his friends (paging Terry Mulholland), you could make a case that he was actually among the league's most harmful relievers.

One more point about these tables: if you're really paying attention, you might be wondering why you can't just combine the Expected Inherited Runs Prevented and Expected Bequeathed Runs Saved columns (EIRP minus EBRS, since help from successors flatters APR rather than ARP) to get the Diff figure. After all, I've been saying above that the two differences between traditional run assignment and ARP are inherited runners and bequeathed runners. Well, actually what I said is that those are the two major differences. There are also a number of less significant effects captured by ARP but not by other statistics. In no particular order:
In no way would I claim that the tools presented here are the final word in reliever evaluation. There are some shortcomings of these measures that are easy to see now, and perhaps others that will come to light as more results from them become available. What they do is add to the available information that baseball fans can use to form an overall picture of a reliever's performance. And more information should make everyone happy. Everyone, that is, except Scott "Hey, it's not my ERA" Radinsky.
Comments and questions welcome. Copyright © 1999 Michael Wolverton.