September 24, 2010

Ahead in the Count

Predicting Strikeouts with Whiff and Swing Rates

by Matt Swartz

When I wrote about pitchers with major divides between their ERAs and SIERAs two weeks ago, a reader asked why Clay Buchholz had such a pedestrian strikeout rate while having an above-average swinging-strike rate. Buchholz has mustered just 6.2 K/9, nearly a full strikeout below the 7.1 league average, but has induced batters to swing and miss on 9.5 percent of his pitches according to FanGraphs, a full percentage point above the 8.5 percent league average. The question was apparent: Do pitchers who get a lot of whiffs increase their strikeout rates over time?

The question is a logical one to ask. Inducing a swing and a miss could be more indicative of skill than getting an umpire to call a strike on a pitch that a hitter opted to take. The most notorious strikeout kings of all time have always been those who can get hitters to swing and miss. The 10 pitchers with the highest swinging-strike rates in 2010, according to FanGraphs, are pitchers with high strikeout rates:

| Pitcher | Swinging Strike % | K/9 |
|---|---|---|
| Francisco Liriano | 12.5 | 9.54 |
| Cole Hamels | 11.9 | 9.29 |
| Josh Johnson | 11.8 | 9.11 |
| Jered Weaver | 11.1 | 9.62 |
| Tim Lincecum | 11.0 | 9.62 |
| Shaun Marcum | 10.9 | 7.72 |
| Mat Latos | 10.8 | 9.38 |
| Hiroki Kuroda | 10.8 | 7.37 |
| Ryan Dempster | 10.7 | 8.52 |
| Ricky Nolasco | 10.5 | 8.39 |

All are above average, and the top five strike out more than a hitter per inning. Clearly, swinging strikes are highly correlated with strikeouts—in fact, they have a correlation of .84 among starting pitchers.
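For readers who want to see how a correlation like this is computed, here is a minimal sketch in Python that applies the standard Pearson formula to just the ten pitchers in the table above. Because those ten all come from the top of the swinging-strike leaderboard, this restricted sample should not be expected to reproduce the .84 figure, which covers all starting pitchers. The `pearson` helper and the variable names are illustrative, not anything from FanGraphs.

```python
from math import sqrt

# (swinging-strike %, K/9) for the ten pitchers listed in the table
top_ten = {
    "Francisco Liriano": (12.5, 9.54),
    "Cole Hamels": (11.9, 9.29),
    "Josh Johnson": (11.8, 9.11),
    "Jered Weaver": (11.1, 9.62),
    "Tim Lincecum": (11.0, 9.62),
    "Shaun Marcum": (10.9, 7.72),
    "Mat Latos": (10.8, 9.38),
    "Hiroki Kuroda": (10.8, 7.37),
    "Ryan Dempster": (10.7, 8.52),
    "Ricky Nolasco": (10.5, 8.39),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


whiff = [w for w, _ in top_ten.values()]
k9 = [k for _, k in top_ten.values()]
r_top_ten = pearson(whiff, k9)
print(f"Correlation within the top ten only: {r_top_ten:.2f}")
```

The same function applied to every qualifying starter's swinging-strike rate and strikeout rate is the kind of calculation behind the league-wide number.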

Swinging strikes are also heavily correlated year-to-year. Among all pitchers with at least 80 innings as starting pitchers from 2002-09, there was a .79 correlation for swinging-strike rate, slightly above the .77 correlation for strikeout rate (SO/PA) itself.

Further evidence that this information could be useful is that swinging strikeouts per PA have a .77 correlation year-to-year, while called strikeouts per PA have a .59 correlation. 

I ran a regression of strikeout rate on the previous year’s swinging-strike rate, controlling for age and year, and found that a one percentage point increase in swinging-strike rate was associated with a 1.55 percentage point increase in SO/PA the following year, a highly significant result. Thus, in the absence of information about the previous year's strikeout rate, knowing that a pitcher had more swinging strikes implies he will likely have more strikeouts the following year.

Interestingly, the swinging SO/PA and called SO/PA do a fairly good job of predicting each other. The correlation between called SO/PA and swinging SO/PA the following year is .26, and the correlation between swinging SO/PA and called SO/PA the following year is .25. 

The question remains whether swinging strikes provide information beyond what strikeouts already do, so I ran a regression (again controlling for age and year) of strikeout rates on the previous year’s strikeout rate and the previous year’s swinging-strike rate, and found an interesting result.

| Variable | Coefficient | P-Stat |
|---|---|---|
| Constant | .0606 | .000 |
| SO/PA | .7294 | .000 |
| Year 2002-04 | -.0056 | .026 |
| Year 2008 | .0060 | .074 |
| Age | -.0010 | .000 |
| Swinging Strike% | .1236 | .251 |

This implies that the extra information that swinging-strike rate provides, once the previous year’s strikeout rate is already known, is not very useful. For every one percentage point above average in the previous year’s strikeout rate, the following year’s strikeout rate is likely to be about 0.73 percentage points above average. However, for pitchers with the same strikeout rate the previous year, a pitcher with a one percentage point higher swinging-strike rate will have only a 0.12 percentage point higher strikeout rate, a difference that is not statistically significant. The value added by this information is virtually nil.

(Note that 2005-07 is not included as a coefficient. Those familiar with regression analysis will recall the coefficients for 2002-04 and for 2008 are both measured relative to the 2005-07 effect.)
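To make the arithmetic concrete, the fitted equation in the table can be written out as a small Python function. This is only a sketch: the names `predict_next_so_pa` and `year_effect` are made up for illustration, and the two pitchers at the bottom are hypothetical. It also assumes that the rates enter as proportions (0.170 rather than 17.0) and that "Year" refers to the season in which the inputs were recorded, neither of which the table states outright.

```python
# Coefficients copied from the regression table above.
CONSTANT = 0.0606
COEF_SO_PA = 0.7294
COEF_SWSTR = 0.1236
COEF_AGE = -0.0010


def year_effect(year):
    """Year effect relative to 2005-07, the omitted baseline group."""
    if 2002 <= year <= 2004:
        return -0.0056
    if 2005 <= year <= 2007:
        return 0.0  # baseline: no coefficient appears in the table
    if year == 2008:
        return 0.0060
    raise ValueError("season is outside the groups shown in the table")


def predict_next_so_pa(so_pa, swstr, age, year):
    """Predicted SO/PA for the following season (rates as proportions)."""
    return (CONSTANT
            + COEF_SO_PA * so_pa
            + COEF_SWSTR * swstr
            + COEF_AGE * age
            + year_effect(year))


# Two hypothetical 28-year-olds with the same strikeout rate in 2006,
# one percentage point apart in swinging-strike rate.
low_whiff = predict_next_so_pa(so_pa=0.170, swstr=0.085, age=28, year=2006)
high_whiff = predict_next_so_pa(so_pa=0.170, swstr=0.095, age=28, year=2006)
gap_points = (high_whiff - low_whiff) * 100
print(f"{low_whiff:.4f} vs. {high_whiff:.4f}")
print(f"Difference: {gap_points:.2f} percentage points")
```

The gap between the two predictions is just the swinging-strike coefficient times 0.01, which is the roughly 0.12 percentage points described above.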

The R² statistic, which measures how much of the variation in the dependent variable (following year’s strikeout rate) can be explained by the variables used, tells the same story. The R² for the regression above is .6118, just a tiny fraction above the .6110 R² for the same regression run without swinging-strike rate.

In other words, the value added by knowing the swinging-strike rate when the strikeout rate is already known is less than a tenth of a percent of the differences in players’ strikeout rates the following year.
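The "less than a tenth of a percent" figure is simply the difference between the two R² values. A quick check in Python, with variable names chosen only for this illustration:

```python
r2_with_whiffs = 0.6118     # regression including swinging-strike rate
r2_without_whiffs = 0.6110  # same regression without it

# Share of the variation in next year's strikeout rate that only the
# swinging-strike term accounts for.
gain = r2_with_whiffs - r2_without_whiffs
gain_pct = gain * 100
print(f"Additional variation explained: {gain_pct:.2f} percent")
```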

Running the same regression on pitchers who were less than 28 years old in the first year actually reduced the coefficient to a statistically insignificant negative number (-.169), suggesting that swinging-strike rate for younger pitchers provides no additional information that the strikeout rate does not already provide.

I decided to check whether getting more of one’s strikeouts as swinging strikeouts was helpful in predicting which direction a pitcher’s strikeout rate was headed, and found that this was not useful either. I ran a regression of strikeout rate on the previous year’s strikeout rate, controls for year and age, and swinging-strikeout rate and got the following results:

| Variable | Coefficient | P-Stat |
|---|---|---|
| Constant | .0633 | .000 |
| SO/PA | .7551 | .000 |
| Year 2002-04 | -.0045 | .053 |
| Year 2008 | .0058 | .087 |
| Age | -.0010 | .000 |
| Swinging SO/PA | .0266 | .752 |


In other words, knowing how much of a pitcher’s SO/PA came on swings versus called strikes was not useful at all.

In fact, running an equivalent regression with called SO/PA and swinging SO/PA as separate variables makes this clearer, showing that the coefficients on called and swinging strikeouts are almost exactly the same:

| Variable | Coefficient | P-Stat |
|---|---|---|
| Constant | .0633 | .000 |
| Year 2002-04 | -.0045 | .053 |
| Year 2008 | .0058 | .087 |
| Age | -.0010 | .000 |
| Called SO/PA | .7551 | .000 |
| Swinging SO/PA | .7817 | .000 |

The information provided by knowing the form of those strikeouts is not all that useful.
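Because SO/PA is simply called SO/PA plus swinging SO/PA, the last two tables are the same fitted equation written two ways: the swinging coefficient in the second form is the SO/PA coefficient plus the extra swinging term from the first, .7551 + .0266 = .7817. The short Python check below shows the two forms giving the same prediction. It assumes every strikeout is counted as either called or swinging, and it uses a hypothetical pitcher and illustrative function names.

```python
from math import isclose

CONSTANT = 0.0633
COEF_AGE = -0.0010


def total_plus_swinging(called, swinging, age):
    """First form: total SO/PA plus an extra term for swinging SO/PA."""
    so_pa = called + swinging
    return CONSTANT + 0.7551 * so_pa + 0.0266 * swinging + COEF_AGE * age


def called_and_swinging(called, swinging, age):
    """Second form: separate coefficients for called and swinging SO/PA."""
    return CONSTANT + 0.7551 * called + 0.7817 * swinging + COEF_AGE * age


# Hypothetical 28-year-old in a 2005-07 season, so the year terms are zero.
first_form = total_plus_swinging(called=0.045, swinging=0.125, age=28)
second_form = called_and_swinging(called=0.045, swinging=0.125, age=28)
print(f"{first_form:.4f} {second_form:.4f}")
print("Same prediction:", isclose(first_form, second_form))
```

Seen this way, the insignificant .0266 in the first form and the nearly equal called and swinging coefficients in the second are the same result.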

Does this mean that none of the pitch information that we find in the “Plate Discipline” section on FanGraphs is useful when we have the regular box scores? Answering this requires using the same approach to look at each of the other variables. The following table shows the coefficients from a series of regressions on the previous year’s strikeout rate, age and year controls, and a different plate-discipline statistic in each column. The P-Stats are in parentheses alongside the coefficients in each cell. Statistically significant coefficients are bolded, while weakly statistically significant coefficients are bolded and italicized.

| Variable | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) | Coef (P-Stat) |
|---|---|---|---|---|---|---|---|---|---|
| Constant | .0606 (.000) | .0526 (.004) | .0808 (.000) | .1007 (.057) | .0748 (.000) | -.0037 (.946) | .0536 (.000) | .0470 (.122) | .0449 (.034) |
| SO/PA | .7294 (.000) | .7736 (.000) | .7754 (.000) | .7458 (.000) | .7586 (.000) | .8132 (.000) | .7563 (.000) | .7764 (.000) | .7704 (.000) |
| Year '02-'04 | -.0056 (.026) | -.0045 (.056) | -.0040 (.091) | -.0052 (.038) | -.0054 (.030) | -.0029 (.280) | -.0022 (.399) | -.0051 (.044) | -.0046 (.048) |
| Year '08 | .0060 (.074) | .0058 (.084) | .0054 (.110) | .0060 (.078) | .0067 (.055) | .0058 (.084) | .0041 (.246) | .0062 (.073) | .0058 (.087) |
| Age | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) | -.0010 (.000) |
| Swinging Strike% | .1236 (.251) | | | | | | | | |
| F-Strike% | | .0201 (.496) | | | | | | | |
| Zone% | | | -.0341 (.327) | | | | | | |
| Contact% | | | | -.0399 (.474) | | | | | |
| O-Contact% | | | | | -.0158 (.328) | | | | |
| Z-Contact% | | | | | | .0686 (.218) | | | |
| Swing% | | | | | | | .0620 (.060) | | |
| O-Swing% | | | | | | | | .0233 (.575) | |
| Z-Swing% | | | | | | | | | .0422 (.339) |

Definitions:

Swinging Strike% = Percent of pitches thrown that were swung at and missed
F-Strike% = Percent of hitters faced for which the first pitch of PA was a strike
Zone% = Percent of pitches thrown in the strike zone
Contact% = Percent of hitters’ swings that were fouls or hit into play
O-Contact% = Contact% on pitches out of the strike zone
Z-Contact% = Contact% on pitches in the strike zone
Swing% = Percent of pitches at which hitters swung
O-Swing% = Swing% on pitches out of the strike zone
Z-Swing% = Swing% on pitches in the strike zone

Nearly every bit of information that the pitch data gave us was useless. For each of the regressions above, we had the relevant information we needed by knowing the pitcher’s strikeout rate, age, and what year it was. The coefficients on eight of the nine variables are not even weakly statistically significant, but there is one variable that has a weakly significant (positive) effect on predicting the next year’s strikeout rate: O-Swing%, the rate at which pitchers get hitters to chase pitches (supposedly) out of the zone.

Part of the reason this statistic’s weak significance surprised me is that I expected measurement error to keep the information from being useful. The information, which FanGraphs obtains from Baseball Info Solutions, is determined by watching each pitch from the center-field camera. Given the issue of parallax, the center-field camera gives a distorted view and the observer can be fooled. This is an important point: the data appear rather questionable when looking at the yearly averages. League average O-Swing% has moved around over the years and mostly increased:

2002: 18.1%
2003: 22.2%
2004: 16.6%
2005: 20.3%
2006: 23.5%
2007: 25.0%
2008: 25.4%
2009: 25.1%
2010: 29.3%

Perhaps pitchers were gradually getting better at inducing batters to swing at the right pitches? It seems unlikely given the change in league average Zone%, the percentage of pitches in the strike zone:

2002: 54.6%
2003: 51.4%
2004: 55.1%
2005: 53.8%
2006: 52.6%
2007: 50.3%
2008: 51.1%
2009: 49.3%
2010: 46.6%

It seems far more likely that pitches were increasingly being recorded as out of the strike zone over the years: the recorded share of pitches in the zone fell just as the recorded swing rate at pitches supposedly out of the zone rose. The overall swing rate has only ranged from 45.3 to 46.5 percent over the years, so it is more plausible that swings were more often being classified as coming on pitches out of the zone later in the decade than that hitters were chasing more pitches while pitchers threw a roughly offsetting share of additional pitches out of the zone.
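The mirror-image movement of the two series can be checked directly. This sketch in Python takes the league averages listed above and computes the change in each from 2002 to 2010, along with the correlation between them; the `pearson` helper and variable names are illustrative. A strongly negative correlation is consistent with a drifting recorded strike zone, though nine data points cannot establish that on their own.

```python
from math import sqrt

# League averages by season, 2002 through 2010, as listed above.
o_swing = [18.1, 22.2, 16.6, 20.3, 23.5, 25.0, 25.4, 25.1, 29.3]
zone = [54.6, 51.4, 55.1, 53.8, 52.6, 50.3, 51.1, 49.3, 46.6]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Change from the first season to the last, in percentage points.
o_swing_change = round(o_swing[-1] - o_swing[0], 1)
zone_change = round(zone[-1] - zone[0], 1)
r_league = pearson(o_swing, zone)

print(f"O-Swing% change, 2002 to 2010: {o_swing_change:+.1f} points")
print(f"Zone% change, 2002 to 2010: {zone_change:+.1f} points")
print(f"Correlation between the two series: {r_league:.2f}")
```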

I thought that normalizing the data might lead to stronger results, by measuring the O-Swing% relative to league average. This did not work:

| Variable | Coefficient | P-Stat |
|---|---|---|
| Constant | .0659 | .000 |
| SO/PA | .7654 | .000 |
| Year 2002-04 | -.0046 | .049 |
| Year 2008 | .0057 | .090 |
| Age | -.0010 | .000 |
| Net O-Swing% | .0340 | .386 |

Measured relative to the league average, O-Swing% is now useless. Chances are this is for the reason Baseball Prospectus' Colin Wyers has raised in expressing concerns about the data: the year-to-year fluctuations in league average O-Swing% are probably a result of moving center-field cameras. The average for each park is probably vastly different, so a league-average adjustment is probably not very useful.
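A sketch of this kind of adjustment in Python, assuming Net O-Swing% is a pitcher's O-Swing% minus that season's league average (a ratio to the league average would be another reasonable reading of the label). The function name and the pitcher are hypothetical.

```python
# League average O-Swing% by season, from the list earlier in the article.
LEAGUE_O_SWING = {
    2002: 18.1, 2003: 22.2, 2004: 16.6,
    2005: 20.3, 2006: 23.5, 2007: 25.0,
    2008: 25.4, 2009: 25.1, 2010: 29.3,
}


def net_o_swing(o_swing_pct, season):
    """O-Swing% minus the league average for that season, in points."""
    return round(o_swing_pct - LEAGUE_O_SWING[season], 1)


# The same raw rate reads very differently depending on the season.
print(net_o_swing(27.0, 2002))
print(net_o_swing(27.0, 2009))
```

A raw 27.0 percent sits far above the league in 2002 and only slightly above it in 2009, which is the kind of drift a league adjustment can absorb. It does nothing about differences from park to park.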

None of the other statistics as measured relative to the league average yielded remotely significant coefficients when included in regressions, either.

The most promising pitch data is the rate at which pitchers can get hitters to chase pitches out of the strike zone. The pitchers that tend to do so are more likely to see their strikeout rates increase the following year. However, the measurement error in these statistics is currently so large that it is difficult to glean any major insight from them. Chances are that this information could be more useful if measured more scientifically, and this could be one of the areas where pitch data could move our understanding of baseball forward.

However, the most important information to take away from this article is that even more objective statistics like swinging-strike rate, swing rate, and contact rate, as well as called versus swinging strikeout rates, add very little value beyond what the pitcher's strikeout rate will tell you.

 Of course, strikeout rate for pitchers is one of the quickest to stabilize among all baseball statistics, and so the added value of information beyond knowing historical strikeout rate is least likely to be significant for strikeout rate as compared with any other statistic. Thus, next week I will look at walk rates and attempt to determine whether this type of information can inform our knowledge about walk rates any more than it could have informed us about strikeout rates.  

Matt Swartz is an author of Baseball Prospectus. 