BTN Article

"Support-Neutral" Statistics -- A Method of Evaluating the True Quality of a Pitcher's Start

by Michael Wolverton

This article originally appeared in By The Numbers, the newsletter of SABR's Statistical Analysis subcommittee, Volume 5, Number 4, Dec. 1993. The only changes made to the original article are the regeneration of Figure 1 with new parks and values (the original numbers were lost) and the corresponding modification of the text describing that figure. One significant change has been made to the method for calculating the SN stats since this article was published; that change is documented in the summary.

Motivation

In recent years, we've seen the development and growing use of two measurements designed to evaluate starting pitchers on a game-by-game basis: Quality Starts and Game Score. Both measures attempt in some way to look at the quality of each outing the starter has, rather than looking at the average or cumulative performance over the course of a year, as ERA does. But both measures have their weaknesses as total measures of pitching performance.

The arguments against Quality Starts are well known. Detractors point out that the worst qualifying outing -- 6 innings and 3 earned runs -- is not "quality" at all. A related objection is that Quality Starts makes no attempt to quantify the degree of quality a start has -- 6 innings, 3 runs is the same as 8 innings, 2 runs, which is the same as a 9-inning shutout.

Partly in answer to these objections, Bill James developed the Game Score, which combines a starter's box score numbers (IP, H, ER, R, BB, K) using weights, where the weights are assigned such that the league average score is around 50, the best imaginable score is around 100, and the worst imaginable score (by someone outside the state of Colorado) is around 0. Game Score is acknowledged as an interesting measure of "game domination" by a starter, but it has weaknesses as a total measure of starter quality (i.e., his contribution to team victories): it's too dependent on strikeouts, possibly too dependent on hits and walks (after all, the number of runs given up is really the only thing that matters), and it isn't park-adjusted.

Despite the weaknesses of these two measures, looking at a pitcher's starts game-by-game is still a good idea. Looking at each start's contribution to winning, rather than cumulative run-prevention over the course of a year (ERA or Pitching Runs), can help us answer questions like: Given equal ERAs, do some pitchers pitch in a way that will tend to win more games than other pitchers? In particular, is it better for a starter to be flaky -- either very good or very bad on a given day -- or consistently average? Does the park have a smaller influence on the value of the start when the start is very good or very bad?

So here's what we'd like out of a stat measuring the quality of a start:

it should depend only on numbers appearing in a box score.
it should be independent of a pitcher's support, both from his team's offense and from his team's relievers.
it should be park-adjusted.
the resulting measurement should be in terms of some kind of meaningful unit, such as games or runs, rather than being a unitless index (and, ideally, it should be obvious to any baseball fan what a good or bad score in those units is).
most importantly, it should reflect the contribution that a start had toward winning the game.

I've developed a couple of measurements that meet these five requirements. (Actually, the ideal stat would also be very easy to compute, but hey, 5 out of 6 isn't bad, right?). Support-Neutral Wins and Support-Neutral Losses (SNW and SNL) measure the expected number of wins and losses a pitcher would have with his outings, if he got average support from his offense and his bullpen. Support-Neutral Value Added (SNVA) measures the total number of games that an average team would win given the pitcher's starts, over the number of games they'd win with a league average starter. All of these stats are computed using only the number of innings pitched, number of runs given up, and the park the game was pitched in. SNVA may be a slightly more accurate measure of a starter's actual value compared to league average, but the SNW/L record has the advantage of being flexible and more understandable. Both of them, in my opinion, constitute an improvement over Thorn and Palmer's Pitching Runs as a total measure of starter worth.

Support-Neutral Wins and Losses

Support-Neutral Wins is calculated by determining the probability that a pitcher would get the win for each start he has, and then summing up the individual probabilities over all of his starts. The sum gives you the number of wins a pitcher could expect to get for an average team, given his performances. A "performance" here consists only of the number of innings pitched, the number of runs (not earned runs) given up, the park in which the game was played, and whether the pitcher was at home or on the road -- SNW assumes that these are the only things which influence whether the pitcher wins or loses.

The rest of this section describes the formulas that are used to calculate SNW; readers who aren't interested in the specific methods of calculation are welcome to skim or skip to the next section.

To calculate the probability that a pitcher wins the game, we just need to look at the definition of a win: A starting pitcher wins the game if his team has the lead when he's taken out of the game, and they never relinquish that lead. So, for a given outing by the starter, the probability that he gets the win is just the probability that his team will take the lead (score more runs than the starter gives up) by the time he's removed times the probability that they'll hold that lead until the game is over.

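To put this into a formula, we just need to determine and add up the probabilities of all the different ways his team can take and hold a lead:

$$\mathrm{SNW}(i, r) = \sum_{s=r+1}^{\infty} \mathrm{PScore}(i, s)\,\mathrm{PHold}(s - r,\ 9 - i)$$

where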

SNW(i, r) is the probability a starter who goes i innings and gives up r runs will get the win, given an average team playing behind him.

PScore(i, r) is the probability that an average team will score r runs in i innings.

PHold(k, i) is the probability that an average team will hold a k-run lead (without ever relinquishing it) for the i remaining innings until the end of the game.
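The SNW formula is just a sum over the team's possible run totals, so it can be sketched in a few lines of Python. This is a hypothetical sketch of the simplified home-starter version only (full innings, park-neutral); `snw`, the `max_runs` truncation, and the toy inputs are assumptions, and PScore and PHold are passed in as functions so they can be computed separately.

```python
from typing import Callable


def snw(i: int, r: int,
        pscore: Callable[[int, int], float],
        phold: Callable[[int, int], float],
        max_runs: int = 50,
        game_innings: int = 9) -> float:
    """Simplified home-starter SNW: sum over s > r of PScore(i, s) * PHold(s - r, 9 - i)."""
    innings_left = game_innings - i
    # Each team total s above the r runs allowed leaves a lead of s - r,
    # which the bullpen must then protect for the remaining innings.
    return sum(pscore(i, s) * phold(s - r, innings_left)
               for s in range(r + 1, max_runs + 1))


# Toy check with made-up inputs: after 9 innings there is nothing left to
# hold, so SNW reduces to the chance the team scored more than r runs.
TOY_RUNS = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def toy_pscore(i: int, s: int) -> float:
    return TOY_RUNS.get(s, 0.0)


def game_over(k: int, innings_left: int) -> float:
    return 1.0


print(snw(9, 2, toy_pscore, game_over))  # 0.4
```

Truncating the sum at `max_runs` assumes the chance of a team scoring more than that many runs is negligible.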

The above formula is actually a simplification of the formula I use in my software to calculate SNW (I'll refer to the formula in my software as the "real" SNW formula). In order to make it easier to explain, I made a few assumptions to get the formula above. First, that formula assumes that the starter comes out of the game after pitching a full inning (i.e., he pitches no extra thirds of an inning). The formula is complicated somewhat when thirds of an inning are taken into account, but the same general idea applies: his team must be leading when he comes out, and his team must hold the lead for the extra thirds in the inning he leaves, plus all the rest of the remaining innings. The real SNW formula does take thirds of an inning into account.
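To handle thirds of an inning, an outing first has to be split into full innings plus the outs recorded in the inning the starter left. Box scores write innings pitched so that 6.2 means 6 2/3 innings; a minimal sketch of that bookkeeping (the helper names are hypothetical):

```python
def ip_to_outs(ip: str) -> int:
    """Convert box-score innings pitched ("6.2" = 6 2/3 IP) into outs recorded."""
    whole, _, thirds = ip.partition(".")
    extra = int(thirds) if thirds else 0
    if extra not in (0, 1, 2):
        raise ValueError(f"not a valid innings-pitched value: {ip!r}")
    return 3 * int(whole) + extra


def split_outing(ip: str) -> tuple[int, int]:
    """Return (full innings completed, extra outs in the inning he left)."""
    return divmod(ip_to_outs(ip), 3)


# 6 full innings plus 2 outs: the bullpen must hold the lead for 1 more out, then 2 innings.
print(split_outing("6.2"))  # (6, 2)
```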

Second, the above formula doesn't explicitly take the park into account. To do so, we need to make SNW, PScore, and PHold functions of the park in which the game is played. A hitter's park should inflate the probabilities that an average team will score a high number of runs, and a pitcher's park should do the opposite. The real SNW formula does take park into account. I talk a little more about my handling of park effects in the Appendix.

Third, the above formula doesn't take into account whether the starter is pitching at home or on the road. Maybe contrary to intuition, this does make a difference. Consider a starter who leaves after pitching the 7th inning: if he's at home, he's pitched the top of the 7th, so he gets credit for the runs his team scored in the first 6 innings, plus the runs they score in the bottom of the 7th; if he's on the road, however, he pitched the bottom of the 7th, so he gets credit for the runs his team scored in the first 7 innings, plus the runs they score in the top of the 8th. So, all other things being equal, it's easier for pitchers to get wins (and harder for them to get losses) when they pitch on the road. The formula above is for a pitcher pitching at home, and the road formula is slightly different. The real SNW formula does take home/road status into account.
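The home/road asymmetry is easiest to see by counting innings. This hypothetical helper assumes a regulation 9-inning game and a starter who leaves after a full inning; it returns how many of his team's batting innings count toward his lead and how many innings the bullpen still has to cover.

```python
def outing_split(innings_pitched: int, at_home: bool,
                 game_innings: int = 9) -> tuple[int, int]:
    """Return (team batting innings credited to the starter, innings left for the bullpen)."""
    if at_home:
        # He pitched the top of inning i: his team batted i-1 innings, plus the bottom of i.
        credited = innings_pitched
    else:
        # He pitched the bottom of inning i: his team batted i innings, plus the top of i+1.
        credited = min(innings_pitched + 1, game_innings)
    return credited, game_innings - innings_pitched


print(outing_split(7, at_home=True))   # (7, 2)
print(outing_split(7, at_home=False))  # (8, 2)
```

Both starters leave the same two innings to the bullpen, but the road starter gets credit for one more inning of his team's offense, which is why wins come a little more easily on the road.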

Finally, the above formula doesn't quite reflect the full definition of a pitcher's win -- a starter can't get the win unless he goes 5 innings or more. Presumably, this extra condition was put into the win rule to reduce the number of undeserving starters getting lucky wins. But when you're assigning fractions of a win, rather than 1 win or 0 wins, there's no possibility of getting lucky. So, the real SNW/L formula does not take the five-inning condition into account, although, for the purposes of comparison, I do calculate an expected win (E(W)) number which is equal to 0 if the pitcher goes less than 5 innings and equal to SNW otherwise.
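Season SNW is the sum of the per-start win probabilities, and E(W) applies the five-inning rule on top of that. A sketch, assuming each start is represented as a hypothetical (outs recorded, SNW for that start) pair:

```python
def season_wins(starts: list[tuple[int, float]]) -> tuple[float, float]:
    """Return (SNW, E(W)) for a list of (outs recorded, per-start SNW) pairs."""
    snw_total = sum(p for _, p in starts)
    # E(W) zeroes out any start shorter than 5 innings (15 outs).
    ew_total = sum(p for outs, p in starts if outs >= 15)
    return snw_total, ew_total


starts = [(27, 0.85), (13, 0.20), (18, 0.55)]  # made-up example starts
snw_total, ew_total = season_wins(starts)
print(round(snw_total, 2), round(ew_total, 2))  # 1.6 1.4
```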

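Let's finish off the formula above. PScore is easy to find recursively, provided you know an average team's single-inning scoring distribution, PInningScore:

$$\mathrm{PScore}(i, r) = \sum_{j=0}^{r} \mathrm{PScore}(i-1,\ r-j)\,\mathrm{PInningScore}(j), \qquad \mathrm{PScore}(0, r) = \begin{cases} 1 & r = 0 \\ 0 & r > 0 \end{cases}$$

where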

PInningScore(r) is the probability that an average team will score r runs in an inning.
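Computing PScore recursively amounts to repeatedly convolving the single-inning distribution with itself. The sketch below builds the whole table; the distribution `P_INNING` is a made-up placeholder, not the 1992 USA TODAY distribution, and the function name is hypothetical.

```python
# Hypothetical placeholder: P(0), P(1), ... runs in one inning.
# NOT the author's 1992 data; it only sums to 1 and looks plausible.
P_INNING = [0.72, 0.14, 0.07, 0.04, 0.02, 0.01]


def pscore_table(p_inning: list[float], max_innings: int = 9,
                 max_runs: int = 50) -> list[list[float]]:
    """table[i][r] = probability an average team scores exactly r runs in i innings."""
    table = [[0.0] * (max_runs + 1) for _ in range(max_innings + 1)]
    table[0][0] = 1.0  # zero innings: zero runs with certainty
    for i in range(1, max_innings + 1):
        for r in range(max_runs + 1):
            # r runs in i innings = (r - j) runs in i - 1 innings, then j runs in inning i.
            table[i][r] = sum(table[i - 1][r - j] * p_inning[j]
                              for j in range(min(r, len(p_inning) - 1) + 1))
    return table


pscore = pscore_table(P_INNING)
print(round(sum(pscore[9]), 6))  # 1.0: no probability lost to the max_runs cap here
```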

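PHold is a little more complicated, since you have to see to it that the pitcher's team never relinquishes the lead. Still, it's not too hard to reduce it to the following (below, "tr" stands for the number of runs the pitcher's team scores in an inning, and "or" stands for the number the opposing team scores in an inning):

$$\mathrm{PHold}(k, i) = \sum_{\mathit{or}=0}^{k-1} \sum_{\mathit{tr}=0}^{\infty} \mathrm{PInningScore}(\mathit{or})\,\mathrm{PInningScore}(\mathit{tr})\,\mathrm{PHold}(k - \mathit{or} + \mathit{tr},\ i-1), \qquad \mathrm{PHold}(k, 0) = 1 \text{ for } k > 0$$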

The only remaining unknown is the single-inning scoring distribution, PInningScore. But that's readily available from linescores of past games. The scoring distribution (separate distributions for each league) I'm using right now was taken from a few weeks of linescores in USA TODAY from late April and early May of 1992. I'll probably be able to get a more accurate distribution someday, but I'm sure that this one is close enough.
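Given a single-inning distribution, PHold can be computed with a memoized recursion. This sketch assumes a home starter, so the opponents bat first in each remaining inning and must score fewer runs than the current lead; `P_INNING` is again a hypothetical placeholder, and `phold` is an illustrative name.

```python
from functools import lru_cache

# Hypothetical placeholder single-inning distribution (not the author's data).
P_INNING = (0.72, 0.14, 0.07, 0.04, 0.02, 0.01)


@lru_cache(maxsize=None)
def phold(k: int, innings_left: int) -> float:
    """Probability an average team holds a k-run lead for the remaining innings (home sketch)."""
    if k <= 0:
        return 0.0          # no lead to hold
    if innings_left == 0:
        return 1.0          # game over with the lead intact
    total = 0.0
    for opp_runs, p_opp in enumerate(P_INNING):
        if opp_runs >= k:   # opponents tie or go ahead: lead relinquished
            break
        for team_runs, p_team in enumerate(P_INNING):
            total += p_opp * p_team * phold(k - opp_runs + team_runs, innings_left - 1)
    return total


print(round(phold(1, 1), 4))  # 0.72: a 1-run lead survives one inning only if the opponents score 0
```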

The SNL value for a single start is calculated analogously to SNW.

Support-Neutral Value Added

SNW and SNL give us a nice way of getting a "fair" W/L record for a starter, which can then be used to compare to his actual W/L record, or a replacement-level winning percentage, etc. (see the Results section). But these numbers calculate how likely it is that the pitcher will win or lose the game -- i.e., get the "W" or "L" next to his name in the box score. A related but slightly different notion is the likelihood that the team will win when a pitcher takes the mound. In measuring the starter's contribution to team victories, we'd like to evaluate how much the outing by the starter changes the team's chance of winning from what it was at the beginning of the game (which I'll assume to be 50%). This is what SNVA is designed to measure.

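Not surprisingly, the formula for SNVA looks pretty similar to the formula for SNW:

$$\mathrm{SNVA}(i, r) = \sum_{s=0}^{\infty} \mathrm{PScore}(i, s)\,\mathrm{PATWin}(s - r,\ 9 - i) - \tfrac{1}{2}$$

where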

SNVA(i, r) is the difference between an average team's chance of winning after the starter has left after pitching i innings and giving up r runs, and their chance of winning at the beginning of the game (50%).

PScore(i, r) is defined above.

PATWin(r, i) is the chance that an average team will eventually win the game given that there are i innings left and the difference between their score and their opponents' score is r.

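Also not surprisingly, PATWin looks a lot like PHold, except that the lead may be lost and regained, so every combination of runs counts:

$$\mathrm{PATWin}(r, i) = \sum_{\mathit{or}=0}^{\infty} \sum_{\mathit{tr}=0}^{\infty} \mathrm{PInningScore}(\mathit{or})\,\mathrm{PInningScore}(\mathit{tr})\,\mathrm{PATWin}(r - \mathit{or} + \mathit{tr},\ i-1)$$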
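Putting PATWin and PScore together gives a rough SNVA calculator. This is a sketch under several assumptions: a home starter, full innings, no park adjustment, a hypothetical `P_INNING`, and a game still tied after nine innings counted as a coin flip (the article does not say how ties are handled). All function names are illustrative.

```python
from functools import lru_cache

# Hypothetical placeholder single-inning distribution (not the author's data).
P_INNING = (0.72, 0.14, 0.07, 0.04, 0.02, 0.01)


@lru_cache(maxsize=None)
def pat_win(diff: int, innings_left: int) -> float:
    """Chance an average team wins when ahead by diff runs with innings_left innings to play."""
    if innings_left == 0:
        if diff > 0:
            return 1.0
        if diff < 0:
            return 0.0
        return 0.5  # ASSUMPTION: a tie after regulation is treated as a coin flip
    # Unlike PHold, the lead may be lost and regained, so every score pair counts.
    return sum(p_opp * p_team * pat_win(diff - opp_runs + team_runs, innings_left - 1)
               for opp_runs, p_opp in enumerate(P_INNING)
               for team_runs, p_team in enumerate(P_INNING))


def pscore_row(i: int, max_runs: int = 50) -> list[float]:
    """Distribution of an average team's runs over i innings (repeated convolution)."""
    row = [1.0] + [0.0] * max_runs
    for _ in range(i):
        nxt = [0.0] * (max_runs + 1)
        for r, p in enumerate(row):
            for j, q in enumerate(P_INNING):
                if r + j <= max_runs:
                    nxt[r + j] += p * q
        row = nxt
    return row


def snva(i: int, r: int, game_innings: int = 9) -> float:
    """Change in win probability from 0.5 after i innings with r runs allowed (home sketch)."""
    return sum(p * pat_win(s - r, game_innings - i)
               for s, p in enumerate(pscore_row(i))) - 0.5


for runs in (0, 4, 8):
    print(runs, round(snva(9, runs), 3))
```

Because SNVA starts from a 50% baseline, every value this produces lies between -0.5 and +0.5.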

What SNVA gives us (when summed over all a pitcher's starts) is the number of games in the standings he's worth to his team above the average starter. Of course, this is exactly the same unit (games above the average player) that all of Total Baseball's [1] measurements are in. So it'll be interesting to compare SNVA to Thorn and Palmer's Adjusted Pitching Runs¹ to see how well they correlate and also where the differences lie.

Results
Best, worst, luckiest, and unluckiest starters of 1992

That's enough of the gory details of the calculation of the stats. Let's look at the fun stuff -- what the stats tell us about real pitchers. I tracked all starting pitchers in the majors over the 1992 season, and Tables 1 and 2 show the top pitchers in both leagues for 1992. Each table shows the pitcher's Support-Neutral Wins (SNW), Losses (SNL), and Winning Percentage (SNPct), followed by his actual win-loss record (W, L), his runs allowed per 9 innings (RA), his Adjusted Pitching Runs (APR), and his Support-Neutral Value Added (SNVA). Interestingly, Greg Maddux, with the fabulous year he had pitching in Wrigley, was the only pitcher in either league who came close to "deserving" to win 20 games.

Pitcher	Team	SNW	SNL	SNPct	W	L	RA	APR	SNVA
Mussina	BAL	17.2	7.8	.688	18	5	2.61	47.0	4.60
Clemens	BOS	17.5	8.5	.674	18	11	2.92	43.8	4.39
Appier	KCR	15.2	6.6	.698	15	8	2.55	42.6	4.08
Guzman,Ju	TOR	13.4	6.4	.679	16	5	2.79	32.3	3.34
Nagy	CLE	16.3	9.9	.623	17	10	3.25	33.3	3.11
Eldred	MIL	8.2	2.4	.776	11	2	1.88	28.1	2.81
McDowell	CHI	16.3	10.7	.602	20	10	3.28	30.5	2.53
Smiley	MIN	16.0	10.5	.603	16	9	3.47	28.3	2.75
Navarro	MIL	15.8	10.8	.595	17	11	3.59	22.8	2.45
Abbott,J	CAL	13.6	8.6	.612	7	15	3.11	27.7	2.36
Viola	BOS	15.8	11.2	.586	13	12	3.74	21.4	2.35
Fleming	SEA	15.1	10.7	.586	17	10	3.73	19.7	2.10
Perez,M	NYY	14.9	10.5	.586	13	16	3.42	26.3	1.90
Wegman	MIL	15.6	11.5	.576	13	14	3.58	24.4	2.06
Erickson	MIN	13.3	9.8	.574	13	12	3.65	20.8	1.75
Bosio	MIL	14.3	11.1	.563	16	6	3.89	13.6	1.52
Key	TOR	13.3	10.4	.561	13	13	3.66	18.0	1.42
Brown,K	TEX	15.5	12.9	.545	21	11	3.96	14.5	1.18
Welch	OAK	8.0	5.8	.580	11	7	3.42	9.7	0.91
Rasmussen	KCR	3.0	0.8	.785	4	1	1.67	11.4	1.09

Table 1: Top 20 AL Starters in 1992, ranked by SNW-SNL

Pitcher	Team	SNW	SNL	SNPct	W	L	RA	APR	SNVA
Maddux,G	CHI	19.5	7.4	.724	20	11	2.28	53.9	5.75
Tewksbury	STL	16.1	7.3	.687	15	5	2.45	38.5	4.12
Schilling	PHI	13.9	6.8	.670	12	9	2.59	31.1	3.37
Morgan	CHI	16.3	9.5	.632	16	8	3.00	30.4	3.22
Rijo	CIN	13.9	8.1	.632	15	10	2.86	28.5	2.57
Smoltz	ATL	16.6	11.0	.601	15	12	3.28	25.1	2.67
Glavine	ATL	15.1	9.8	.608	20	8	3.24	23.9	2.71
Martinez,D	MON	14.5	9.1	.613	16	11	2.98	24.0	2.50
Swindell	CIN	13.8	8.5	.619	12	7	3.05	24.5	2.56
Swift	SFG	10.4	5.1	.670	9	3	2.36	23.6	2.51
Drabek	PIT	15.9	10.8	.595	15	11	2.95	26.7	2.32
Fernandez,S	NYM	13.6	8.8	.608	14	11	2.81	24.7	2.25
Hill	MON	14.0	10.0	.583	16	9	3.14	19.3	1.93
Leibrandt	ATL	13.5	9.5	.586	15	7	3.68	11.8	2.02
Smith,P	ATL	5.6	2.1	.724	7	0	2.22	16.2	1.69
Wakefield	PIT	6.4	3.3	.656	8	1	2.54	13.8	1.42
Rivera	PHI	6.5	3.7	.639	7	3	2.95	10.8	1.34
Benes	SDP	14.0	11.3	.553	13	14	3.50	10.7	1.22
Portugal	HOU	6.6	4.0	.621	5	3	2.69	12.5	1.18
Avery	ATL	14.0	11.5	.549	11	11	3.66	14.8	1.15

Table 2: Top 20 NL Starters in 1992, ranked by SNW-SNL

On the flip-side, Tables 3 and 4 show the worst 10 starting pitchers in 1992 for each league.² Not surprisingly, many of these guys showed up in different uniforms in 1993, several on expansion teams.

Pitcher	Team	SNW	SNL	SNPct	W	L	RA	APR	SNVA
Armstrong	CLE	5.2	11.5	.313	3	15	6.37	-28.4	-3.08
Milacki	BAL	4.5	9.5	.320	6	8	6.18	-21.8	-2.32
Terrell	DET	2.9	7.5	.280	3	6	6.98	-22.9	-2.26
Slusarski	OAK	2.5	6.9	.265	5	5	6.25	-18.7	-2.05
Sanderson	NYY	9.9	14.0	.414	12	11	5.40	-22.4	-2.02
Aldred	DET	2.4	6.5	.273	2	7	7.63	-21.7	-1.89
McCaskill	CHI	10.2	14.2	.417	12	13	5.00	-16.1	-1.92
Wells	TOR	3.4	6.9	.332	6	7	7.70	-27.7	-1.81
Stieb	TOR	3.4	6.7	.337	3	6	5.92	-13.3	-1.50
Otto	CLE	3.9	7.2	.354	5	9	6.75	-19.8	-1.57

Table 3: Bottom 10 AL Starters in 1992, ranked by SNW-SNL

Pitcher	Team	SNW	SNL	SNPct	W	L	RA	APR	SNVA
Bowen	HOU	0.6	6.1	.094	0	7	12.22	-31.3	-2.61
Wilson,T	SFG	7.0	11.3	.384	8	14	4.79	-18.5	-2.03
Abbott,K	PHI	4.5	8.3	.352	1	14	4.92	-11.4	-1.84
Martinez,R	LAD	7.3	11.1	.397	8	11	4.90	-19.1	-1.84
Henry,B	HOU	8.3	11.7	.414	6	9	4.40	-12.4	-1.57
Young,A	NYM	2.8	6.2	.313	1	7	5.79	-16.8	-1.63
Black	SFG	8.6	11.9	.420	10	12	4.47	-14.7	-1.54
Hershiser	LAD	10.2	13.3	.434	10	15	4.31	-12.5	-1.60
Hammond	CIN	7.0	10.0	.409	7	10	4.61	-7.4	-1.36
Blair	HOU	1.4	4.5	.241	1	5	7.51	-16.8	-1.52

Table 4: Bottom 10 NL Starters in 1992, ranked by SNW-SNL

This method also allows you to evaluate the level of luck a pitcher experienced in his W/L record -- i.e. it allows you to look at how much a pitcher's actual W/L record differs from his expected W/L record given the way he pitched. Tables 5 through 8 show the luckiest and unluckiest starters in each league in 1992. No one should be surprised that Jack Morris, who compiled a 21-6 record despite a 4+ ERA, was far and away the luckiest starter in either league last year. SNW/L evaluation shows that you'd expect his 1992 performance to produce a 13-13 mark if he had gotten average support. Equally unsurprising is the result that Jim Abbott was the unluckiest pitcher in either league. The Angels gave him enough support only for a miserable 7-15 record, while his pitching actually merited something closer to 13-9.
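The luck measure used to rank Tables 5 through 8, W - E(W) + E(L) - L, is simple arithmetic; a small sketch with a hypothetical function name, checked against two rows of the tables:

```python
def luck(wins: int, losses: int, expected_w: float, expected_l: float) -> float:
    """Wins above expectation plus losses avoided: (W - E(W)) + (E(L) - L)."""
    return (wins - expected_w) + (expected_l - losses)


print(round(luck(7, 15, 13.3, 8.6), 1))    # -12.7 (Abbott,J in Table 6)
print(round(luck(21, 11, 15.5, 12.9), 1))  # 7.4 (Brown,K in Table 5)
```

Recomputing from the rounded table entries can be off by 0.1 (Morris's row gives 14.8 rather than 14.7), presumably because E(W) and E(L) were rounded for display.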

Pitcher	Team	E(W)	E(L)	W	L	Diff
Morris	TOR	13.3	13.1	21	6	14.7
Brown,K	TEX	15.5	12.9	21	11	7.4
Moore	OAK	12.1	14.3	17	12	7.3
Bosio	MIL	14.2	11.1	16	6	6.9
Hibbard	CHI	8.1	11.3	10	7	6.2
Darling	OAK	11.5	12.5	15	10	6.0
Sanderson	NYY	9.7	14.0	12	11	5.3
Wickman	NYY	2.7	2.9	6	1	5.2
Slusarski	OAK	2.3	6.9	5	5	4.6
McDowell	CHI	16.2	10.7	20	10	4.5

Table 5: Luckiest 10 AL Starters in 1992, ranked by W-E(W) + E(L)-L

Pitcher	Team	E(W)	E(L)	W	L	Diff
Abbott,J	CAL	13.3	8.6	7	15	-12.7
Perez,M	NYY	14.6	10.5	13	16	-7.0
Hanson	SEA	9.6	12.8	7	17	-6.8
Armstrong	CLE	5.2	11.5	3	15	-5.8
Wegman	MIL	15.6	11.5	13	14	-5.1
Valera	CAL	10.4	9.6	7	11	-4.8
Kamieniecki	NYY	8.8	12.0	6	14	-4.7
Ryan	TEX	9.3	8.7	5	9	-4.6
Chiamparino	TEX	1.5	1.3	0	4	-4.2
Reed	KCR	5.1	6.0	2	7	-4.1

Table 6: Unluckiest 10 AL Starters in 1992, ranked by W-E(W) + E(L)-L

Pitcher	Team	E(W)	E(L)	W	L	Diff
Burkett	SFG	9.6	13.0	13	9	7.5
Glavine	ATL	15.0	9.8	20	8	6.8
Seminara	SDP	5.1	6.3	9	4	6.2
Lefferts	SDP	8.4	10.4	13	9	6.0
Tomlin	PIT	11.5	11.3	14	9	4.8
Hurst,B	SDP	12.4	12.1	14	9	4.7
Cone	NYM	11.3	9.9	13	7	4.6
Leibrandt	ATL	13.2	9.5	15	7	4.3
Osborne	STL	9.2	11.4	10	8	4.2
Wakefield	PIT	6.3	3.3	8	1	4.0

Table 7: Luckiest 10 NL Starters in 1992, ranked by W-E(W) + E(L)-L

Pitcher	Team	E(W)	E(L)	W	L	Diff.
Abbott,K	PHI	4.5	8.3	1	14	-9.2
Candiotti	LAD	11.8	10.5	10	15	-6.3
Gross,Ke	LAD	10.9	10.9	8	13	-5.0
Clark,M	STL	5.4	8.0	3	10	-4.4
Schilling	PHI	13.9	6.8	12	9	-4.0
Benes	SDP	13.8	11.3	13	14	-3.5
Carter	SFG	1.5	2.4	1	5	-3.1
Boskie	CHI	3.2	7.1	3	10	-3.1
Maddux,G	CHI	19.5	7.4	20	11	-3.1
Whitehurst	NYM	2.3	3.3	1	5	-3.0

Table 8: Unluckiest 10 NL Starters in 1992, ranked by W-E(W) + E(L)-L

League total numbers

In theory, the support-neutral record of the entire league should come close to the actual win-loss record of the league, and in fact, in 1992, SNW/L did appear to predict league W/L pretty well. Table 9 shows both the expected and actual W/L totals for each league in 1992. The National League's record corresponded very well to the record expected by the model, with no-decisions being underpredicted only slightly by SNW/L. The American League was predicted a little less successfully -- there were nearly 30 more wins in the league than expected, and nearly 10 fewer losses than expected. I believe that part of the discrepancy between expected record and actual record can be explained by the fact that relief pitchers prevented runs better than starters in 1992. Since starters are competing for the (actual) decision primarily with the other starter, it makes sense that starters would get a few more (actual) wins than predicted by a model which has them competing with league average pitching for the decision.

	E(W)	E(L)	E(Pct.)	W	L	Pct.
NL	660.9	690.3	.489	655	678	.491
AL	776.1	846.7	.478	805	837	.490

Table 9: Expected and Actual records of all starters in the leagues
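The percentages in Table 9 are E(W)/(E(W)+E(L)) and W/(W+L). A quick sketch that recomputes them, along with the gaps between actual and expected decisions (the dictionary layout is just for illustration):

```python
def pct(wins: float, losses: float) -> float:
    """Winning percentage: wins / decisions."""
    return wins / (wins + losses)


# (E(W), E(L), W, L) for all starters, from Table 9.
TABLE_9 = {"NL": (660.9, 690.3, 655, 678), "AL": (776.1, 846.7, 805, 837)}

for league, (exp_w, exp_l, wins, losses) in TABLE_9.items():
    print(f"{league}: E(Pct) {pct(exp_w, exp_l):.3f}, Pct {pct(wins, losses):.3f}, "
          f"W - E(W) {wins - exp_w:+.1f}, L - E(L) {losses - exp_l:+.1f}")
```

This reproduces the table's percentages and shows the AL pattern in numbers: about 28.9 more wins than expected and about 9.7 fewer losses.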

Value of "flaky" and "steady" pitchers

Do the Support-Neutral stats tell us anything that Thorn and Palmer's Adjusted Pitching Runs weren't already telling us? Since both APR and SNVA are trying to measure exactly the same thing (albeit by different methods), we'd expect there to be a pretty strong correlation between them. There is. For most pitchers, SNVA (whose unit is "games above average") is approximately equal to one-tenth of APR (whose unit is "runs above average"). This is what you'd expect given the well-known result that each 10 runs prevented (or gained) leads on average to about 1 extra win in the standings (see, e.g., [2]). However, there are plenty of cases where APR and SNVA give significantly different evaluations. Look at the 1992 records of Charlie Leibrandt and Melido Perez:

                      APR    SNVA
        Leibrandt    11.8    2.02
        Perez,M      26.3    1.90

APR evaluates Perez as being 14.5 runs -- about one-and-a-half games -- better than Leibrandt. However, SNVA shows that, when the pitchers' performance is evaluated game-by-game, Leibrandt was actually a little better than Perez.
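Under the 10-runs-per-win rule of thumb, the comparison can be made explicit by converting APR to games and subtracting it from SNVA. A minimal sketch (the function name is hypothetical; the inputs are the two rows above):

```python
def snva_minus_apr_games(apr: float, snva: float, runs_per_win: float = 10.0) -> float:
    """Positive: SNVA rates the pitcher higher than APR does; negative: lower."""
    return snva - apr / runs_per_win


for name, apr, snva in [("Leibrandt", 11.8, 2.02), ("Perez,M", 26.3, 1.90)]:
    print(f"{name}: APR/10 = {apr / 10:.2f}, SNVA = {snva:.2f}, "
          f"gap = {snva_minus_apr_games(apr, snva):+.2f}")
```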

The key to this discrepancy between the two measurements is found in the amount of consistency the two pitchers exhibited in their starts. Perez was a model of consistency last year; he rarely got bombed, but he also was rarely dominating. Leibrandt, on the other hand, was one of the least consistent pitchers in the majors. And that is the most surprising result I've seen so far from these SN stats: run-prevention stats such as ERA and APR tend to undervalue flaky pitchers, and overvalue consistent ones, at least when you consider them pitching for an average team. Tables 10 through 13 show the "flakiest" (most inconsistent) and "steadiest" (most consistent) pitchers in the leagues last year, as evaluated by the variance of the SNVA of their individual starts. You can see from those tables that APR pretty consistently underestimates a pitcher's value when the pitcher is flaky, and pretty consistently overestimates his value when he's steady. 9 of the 10 flakiest pitchers in both the NL and AL were underestimated by APR, and 8 of the 10 steadiest in the NL and 10 of the 10 steadiest in the AL were overestimated by APR. And the pitchers for whom there were really large discrepancies between APR and SNVA -- Leibrandt, Kyle Abbott, Gooden, Hammond, Sutcliffe, Perez, Kamieniecki, McDowell -- all showed up near the top of the list their discrepancy would predict: the flakiest lists for pitchers APR underrated, the steadiest lists for pitchers APR overrated.
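Flakiness in Tables 10 through 13 is the variance of a pitcher's per-start SNVA. The article does not say whether a population or sample variance was used, so this sketch assumes the population variance (`statistics.pvariance`); the per-start values are invented to show two pitchers with the same total but very different consistency.

```python
from statistics import pvariance
from typing import Optional


def snva_variance(per_start: list[float], min_starts: int = 15) -> Optional[float]:
    """Spread of a pitcher's per-start SNVA; None below the tables' 15-start minimum.

    ASSUMPTION: population variance; the article does not specify the estimator.
    """
    if len(per_start) < min_starts:
        return None
    return pvariance(per_start)


# Made-up pitchers with the same season SNVA total (0.9) but different shapes.
steady = [0.05, 0.10, 0.00, 0.05, 0.10, 0.00] * 3
flaky = [0.40, -0.30, 0.35, -0.20, 0.30, -0.25] * 3
print(round(sum(steady), 2), round(snva_variance(steady), 4))  # 0.9 0.0017
print(round(sum(flaky), 2), round(snva_variance(flaky), 4))    # 0.9 0.0917
```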

The reason for this undervaluing is that APR counts all runs as equal, while in fact all runs do not contribute an equal amount toward winning/losing a game. In particular, Bill James did a study that showed that runs scored by a team after they've already scored 5 in a game do not contribute the same amount toward the probability of winning as those first 5 runs did [3]. So, pitchers who give up more than 5 runs in a couple of games will be undervalued by ERA and APR, because those really crummy outings probably weren't quite as crummy as ERA and APR would have you believe.

Pitcher	Team	APR	SNVA	SNVA Var
Smith,Z	PIT	3.7	0.70	0.088
Smoltz	ATL	25.1	2.67	0.083
Saberhagen	NYM	4.5	0.73	0.082
Leibrandt	ATL	11.8	2.02	0.082
Osborne	STL	-12.6	-0.97	0.079
Glavine	ATL	23.9	2.71	0.076
Hurst,B	SDP	-1.5	0.12	0.075
Cone	NYM	8.5	0.67	0.074
Belcher	CIN	1.9	0.53	0.074
Benes	SDP	10.7	1.22	0.068

Table 10: Flakiest 10 NL Starters in 1992, ranked by variance of SNVA (15 starts minimum)

Pitcher	Team	APR	SNVA	SNVA Var
Abbott,K	PHI	-11.4	-1.84	0.022
Rijo	CIN	28.5	2.57	0.032
Browning	CIN	-8.7	-1.17	0.035
Gooden	NYM	-6.1	-1.33	0.036
Hammond	CIN	-7.4	-1.36	0.041
Tewksbury	STL	38.5	4.12	0.042
Maddux,G	CHI	53.9	5.75	0.042
Fernandez,S	NYM	24.7	2.25	0.043
Boskie	CHI	-12.5	-1.47	0.044
Gardner	MON	-9.5	-1.20	0.044

Table 11: Steadiest 10 NL Starters in 1992, ranked by variance of SNVA (15 starts minimum)

Pitcher	Team	APR	SNVA	SNVA Var
Sutcliffe	BAL	-8.3	-0.33	0.089
Smiley	MIN	28.3	2.75	0.078
Krueger	MIN	-0.1	0.14	0.078
Johnson,R	SEA	1.7	0.24	0.077
Gubicza	KCR	7.3	0.78	0.075
Langston	CAL	5.6	0.77	0.073
Fleming	SEA	19.7	2.10	0.073
Viola	BOS	21.4	2.35	0.073
Rhodes	BAL	6.7	0.77	0.071
Darling	OAK	-4.8	-0.31	0.070

Table 12: Flakiest 10 AL Starters in 1992, ranked by variance of SNVA (15 starts minimum)

Pitcher	Team	APR	SNVA	SNVA Var
Armstrong	CLE	-28.4	-3.08	0.034
Darwin	BOS	8.2	0.45	0.036
Milacki	BAL	-21.8	-2.32	0.037
Kamieniecki	NYY	-8.9	-1.57	0.038
Perez,M	NYY	26.3	1.90	0.039
Reed	KCR	-0.3	-0.20	0.040
Appier	KCR	42.6	4.08	0.040
Cook	CLE	-3.6	-0.56	0.042
Hibbard	CHI	-10.7	-1.28	0.045
McDowell	CHI	30.5	2.53	0.045

Table 13: Steadiest 10 AL Starters in 1992, ranked by variance of SNVA (15 starts minimum)

As an example of this, consider a David Wells outing from 1992: he gave up 13 runs in 4+ innings. APR just subtracts his 13 runs from the number of runs a league average pitcher would have given up in those same 4 innings (about 2), and concludes that Wells was worth about -11 runs, or -1.1 games, in that start. Did Wells really cost the Blue Jays more than a game in the standings with that awful start? Of course not. He guaranteed them a loss, of course, but they had some chance of losing the game to begin with anyway -- about a 50% chance if you make the simplifying assumption that they're an average team. SNVA gives a far more reasonable value for Wells's start: it was worth about -0.5 games. That's as much as a single start can cost you. Wells didn't have the requisite 15 starts to show up in Table 12, but you can see from his record in Table 3 how much he was underestimated by APR.
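The arithmetic behind the Wells example, spelled out. The 2-run league-average figure and the 10-runs-per-win conversion come from the text; the variable names are illustrative.

```python
RUNS_PER_WIN = 10.0

# APR-style accounting for the Wells start described above.
runs_allowed = 13
league_avg_runs_same_innings = 2.0                        # "about 2" in those 4 innings
apr_runs = league_avg_runs_same_innings - runs_allowed    # -11.0 runs
apr_games = apr_runs / RUNS_PER_WIN                       # -1.1 games

# SNVA is the change in win probability from 0.5, and a probability can only
# fall to 0 or rise to 1, so a single start is worth between -0.5 and +0.5.
SNVA_MIN, SNVA_MAX = 0.0 - 0.5, 1.0 - 0.5

print(apr_games, apr_games < SNVA_MIN)  # -1.1 True
```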

Effect of the park on win probability

One other question I've been looking at is how the value of starts is influenced by park effects. Figure 1 shows the SNVA for a 9-inning complete game in both Atlanta's Fulton County Stadium (the NL's most extreme hitters' park in 1992) and San Francisco's Candlestick Park (the NL's most extreme pitchers' park in 1992). We can see from the figure that the effect of the park on the value of the start is far less at the two extremes of start quality than it is for middle-of-the-road starts. The difference between Fulton County and Candlestick for the value of a 9-inning, 4-run start is almost four times as large as the difference between Fulton County and Candlestick for the value of a shutout.

Figure 1: SNVA for Fulton County Stadium (top line) and Candlestick Park (bottom line), given that the starter pitched 9 innings

This would imply that methods of park adjustments which simply multiply a pitcher's "raw" value by a park factor might be over- or underestimating the park's actual effect on his value. Since the park's effect on very good or very bad starts is much less than on average starts, a reasonable hypothesis would be that very good or very bad pitchers deserve less of a boost (or less diminishment) to their rating than current park adjustment methods give them.

However, the preliminary investigation of this hypothesis I have done on real starting pitchers (with 1992 data) has failed to find much support for it. I'd still like to do some more work on this issue.

Weaknesses of the Approach

Here are a few of the problems with these measurements:

They assume that scoring distributions of an inning are independent from the distributions of surrounding innings.
They (like most other measures of pitching) don't account for situational pitching. A pitcher who gets a big lead is likely to start throwing all fastballs, and he may give up a few meaningless runs that he wouldn't have given up without the big lead. I'm not too worried about this, because I don't think those big-lead situations are common enough to make much of a difference for any pitcher.
They don't account for differences in the ways pitchers are used by their managers. Some pitchers get left in the game to get pounded, some are routinely yanked early, etc. Note however that SN stats do a better job than other methods of mitigating the manager's effect. If Cito Gaston leaves David Wells in the game to give up 13 runs, SNVA produces a rating which is not much different than if Gaston had yanked Wells after he had given up "only" 7 or 8 runs.
They don't account for the defense playing behind the pitcher. Suffice it to say that this is a very hard problem.

Conclusion

I've presented Support-Neutral Wins, Losses, and Value Added, three park- and league-adjusted measurements of the value of individual starts, and of starting pitchers. I feel these are a valuable addition to existing measurement methods, both because they can provide a measurement of pitcher worth in units which are familiar to all baseball fans (pitcher wins and losses) and because they seem to be a slightly more accurate measure of the true value of a start than existing methods.

Special thanks to Greg Spira, whose discussion sparked many of the ideas presented here. Thanks to David Tate and others on the Internet newsgroup rec.sport.baseball, who provided valuable feedback on the method. And thanks to my wife, Cindy, for reading this paper and giving me many useful suggestions.

References

[1] Thorn, J. and Palmer, P. (eds.), Total Baseball, 3rd edition, Harper Collins, New York, 1993.

[2] Thorn, J. and Palmer, P., The Hidden Game of Baseball, Doubleday Books, New York, 1985.

[3] James, B., The 1986 Bill James Baseball Abstract, Ballantine Books, New York, 1986, pp. 172-175.

Appendix: Park Effects

One possible way of incorporating park effect numbers into these measurements would be to take whatever final value the above formulas produce (SNW, SNL, or SNVA) and multiply it by some park effect constant for the pitcher's home park. This is essentially the approach Thorn and Palmer use in Total Baseball. But the method of calculating the Support-Neutral stats allows a potentially more informative use of park effects. Since park effects (as printed in Elias, e.g.) reflect how a park inflates or deflates average scoring ability, it makes sense to have the "average team" playing behind the pitcher affected by the park, and then calculate the likelihood that the pitcher's outing plus this park-adjusted average team will lead to a win. So for any game, the PInningScore (league average scoring) distribution is adjusted to reflect the park's effect on run scoring. The resulting number then reflects the park's effect on winning rather than cumulative run scoring/prevention.

The question then becomes: how do you translate a single park effect percentage like the ones in Elias (the only source of park effects I have) into an adjusted PInningScore distribution? There are an infinite number of ways to do this. The way I'm doing it now is to change the probability of scoring 0 runs by one factor, and change the probability of scoring i runs for i>1 by another factor, such that the total number of expected runs scored in an inning is increased/reduced by the Elias number. For example, if the Astrodome decreases scoring by 10%, I increase PInningScore(0) for the Astrodome by one factor, and decrease PInningScore(i) for i>1 by another factor, such that the expected single-inning score reflected by PInningScore is reduced by 10% from the park-neutral scoring distribution. If that isn't clear (and I'm sure it isn't), I should add that I don't think the exact method used makes much difference.
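The two-factor rescaling can be solved in closed form: one equation keeps the probabilities summing to 1, the other sets the expected runs per inning to the park factor times the neutral value. A hypothetical sketch with a made-up neutral distribution; since "i>1" could also be read as "every nonzero score", the split point is a parameter.

```python
# Hypothetical placeholder park-neutral distribution (not the author's data).
P_INNING = [0.72, 0.14, 0.07, 0.04, 0.02, 0.01]


def park_adjust(p_inning: list[float], park_factor: float,
                first_scaled: int = 2) -> list[float]:
    """Rescale P(0) by one factor and P(i), i >= first_scaled, by another.

    first_scaled=2 follows "i>1" literally (P(1) is left untouched);
    first_scaled=1 is the other natural reading.
    """
    mean = sum(i * p for i, p in enumerate(p_inning))
    middle = range(1, first_scaled)                # left unscaled
    scaled = range(first_scaled, len(p_inning))    # scaled by b
    mass_mid = sum(p_inning[i] for i in middle)
    runs_mid = sum(i * p_inning[i] for i in middle)
    mass_scaled = sum(p_inning[i] for i in scaled)
    runs_scaled = sum(i * p_inning[i] for i in scaled)
    b = (park_factor * mean - runs_mid) / runs_scaled     # fixes expected runs
    a = (1.0 - mass_mid - b * mass_scaled) / p_inning[0]  # fixes total probability
    if a < 0 or b < 0:
        raise ValueError("park factor too extreme for this distribution")
    return ([a * p_inning[0]] + [p_inning[i] for i in middle]
            + [b * p_inning[i] for i in scaled])


dome = park_adjust(P_INNING, 0.90)  # a park that cuts scoring by 10%
print([round(p, 4) for p in dome])
```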

Footnotes

¹ Adjusted Pitching Runs is the basic metric which Thorn and Palmer (the authors of Total Baseball) use to evaluate pitchers. APR is the number of runs a pitcher prevented, relative to the number a league average pitcher would have given up in the same innings. The APR that I'm using in this paper differs from Thorn and Palmer's statistic in two ways: 1) I'm using runs where Thorn and Palmer use earned runs, and 2) the method of park adjustment I use is a simplification of the one used in Total Baseball. It is included here for comparison with SNVA.

² Actually, it's probably inaccurate to use the word "worst" here, since the method of ranking the pitchers -- ranking them according to SNW-SNL -- sets the baseline for comparison at league average (anyone below .500 gets a negative rating). Of course, it's quite possible for a below-average pitcher to still be valuable to his team. A better method of producing this list might have been to compare a pitcher's SN record to a lower baseline, e.g., a .450 pitcher. This would have left pitchers like Hershiser and McCaskill, who pitched a lot of innings at somewhat below-league-average performance, off of the lists in favor of other pitchers who pitched fewer innings but at further-below-average performance.
