June 10, 2005

Crooked Numbers

Protons:Electrons::Swinging:Looking

by James Click

One of the aspects of baseball that lends itself so easily to statistics is that most of the outcomes are very clean and usually binary. A batter either reaches base or he doesn't; a ball is a hit or not; even if it is a hit, there are only four possible degrees of hit. Even more important, the data are--with certain occasional exceptions--perfectly gathered: An official scorer sits at each game, carefully noting the outcome of each event in the commonly accepted manner. This runs counter to the application of statistics in the real world, where data can be not only incorrect but also exceedingly complex.

With play-by-play data, however, the events we've taken for granted as simple and binary can suddenly become more complex and, if properly applied, a more accurate reflection of the action on the field. Not entirely unlike J.J. Thomson and others discovering an internal structure to the atom (but without the massive scientific and physical impacts on our understanding of the universe), breaking large, binary blocks of baseball stats into smaller, more descriptive pieces can yield more information. While this method has largely been applied to more advanced defensive metrics such as UZR, it can also be applied to events such as singles, doubles, triples, and, in particular, strikeouts.

On the surface, strikeouts seem to be a very clean statistic, much like walks and home runs, their cousins in the Three True Outcomes tree of knowledge and certainty. On the other hand, each of those three stats can be broken up into smaller pieces: Walks can be intentional, unintentional, or semi-intentional; home runs can be inside-the-park, opposite field, towering, or line drive; and strikeouts can be swinging, looking, bunting, or even dropped. In the box score, they all look the same--a line-drive home run counts the same as a blast of Ruthian proportions. But being able to break strikeouts into separate categories may yield additional insight into player approaches both on the mound and at the plate, as well as predictive value about players who may be under- or over-performing reasonable expectations.

With that in mind, let's take a closer look at every pitcher's friend--the whiff. In 2004, 73.1% of strikeouts were swinging (either complete whiffs or foul tips), 26.3% were looking, and the remaining 0.6% were either missed bunt attempts or foul bunts. These data give us a handy baseline for seeing who's above and below average when it comes to types of strikeouts. Let's check out the leaders in 2004 (minimum 50 strikeouts) to get an idea of what kind of pitchers inhabit both ends of the spectrum:


Pitcher         Year  Swinging  Looking  Total  Swing_Perc  SO/PA
-------         ----  --------  -------  -----  ----------  -----
Dave Burba      2004     47        2      50      95.9%     15.3%
Mike Wood       2004     50        4      54      92.6%     12.5%
Esteban Yan     2004     63        6      69      91.3%     18.2%
Guillermo Mota  2004     75        8      85      90.4%     21.6%
Brad Lidge      2004    140       16     157      89.7%     42.5%
Luis Vizcaino   2004     56        7      63      88.9%     21.1%
Salomon Torres  2004     54        7      62      88.5%     16.3%
Danny Baez      2004     46        6      52      88.5%     17.6%
Jon Lieber      2004     89       12     102      88.1%     13.6%
Brad Radke      2004    125       18     143      87.4%     15.9%
-----
Ismael Valdez   2004     39       28      67      58.2%      8.9%
Carlos Silva    2004     44       32      76      57.9%      8.7%
Darrell May     2004     69       51     120      57.5%     14.4%
Woody Williams  2004     73       55     131      57.0%     16.0%
Scot Shields    2004     62       47     109      56.9%     24.0%
Jeff Weaver     2004     84       67     153      55.6%     16.4%
Esteban Loaiza  2004     63       54     117      53.8%     14.3%
Chad Cordero    2004     44       39      83      53.0%     23.2%
Dave Weathers   2004     31       30      61      50.8%     17.1%
Jaret Wright    2004     70       87     159      44.6%     20.4%
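
For readers who want to reproduce the Swing_Perc column, here is a minimal Python sketch. It assumes the column is swinging strikeouts divided by swinging plus looking strikeouts, with bunt strikeouts left out of the denominator; that reading matches the rows above but is an inference, not a stated definition. The function name `swing_pct` is made up for illustration.

```python
def swing_pct(swinging, looking):
    """Percentage of swinging-or-looking strikeouts that were swinging.

    Assumption: bunt strikeouts are excluded from the denominator.
    """
    decided = swinging + looking
    if decided == 0:
        raise ValueError("no swinging or looking strikeouts to compare")
    return 100.0 * swinging / decided


# Two rows copied from the table above: (pitcher, swinging, looking).
rows = [("Mike Wood", 50, 4), ("Jaret Wright", 70, 87)]

for name, swinging, looking in rows:
    print(f"{name:<14} {swing_pct(swinging, looking):5.1f}%")
```

Because bunt strikeouts are dropped here, the result is not directly comparable to the 73.1% league figure, which is a share of all strikeouts.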

The top group--those who cause the most swings and misses--looks mostly like a pretty hard-throwing, walk-stingy group with a couple of exceptions. The bottom group is a slightly different brand of pitcher--not as many closers and not as many players with a reputation for missing bats. Interestingly for Yankee fans, the top group includes discarded rotation member Jon Lieber; newly acquired Jaret Wright leads the bottom group by a wide margin. These two players frame the next natural question stemming from breaking strikeouts into sub-categories: By looking at one type of strikeout or another, could the Yanks have seen Wright's disappointing (and brief) performance coming?

To check it out, let's first see how consistent something like the percentage of strikeouts swinging (S% for short) is from year to year. Unfortunately, reliable play-by-play data doesn't always include accurate pitch-by-pitch information, so we'll only have 2003-2005 data to use. Obviously, 2005 is far too young to use when determining the consistency of a stat from year to year, so we'll have to settle for two consecutive seasons of data as a first pass. As more accurate data from earlier seasons becomes available, we'll be able to add more confidence to these findings, but with limited data, the r-squared of S% from 2003 to 2004 is .3022. That's not entirely insignificant, falling just below stats like BB/9 and OBP in terms of statistical consistency.
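
As a rough illustration of the year-to-year check, the sketch below computes r-squared as the squared Pearson correlation between each pitcher's S% in one season and the next. The function name `r_squared` and the sample values are invented for illustration; they are not the data behind the .3022 figure.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical S% values for the same five pitchers in consecutive seasons.
s_pct_2003 = [0.88, 0.81, 0.74, 0.62, 0.55]
s_pct_2004 = [0.79, 0.85, 0.66, 0.70, 0.58]

print(f"r-squared: {r_squared(s_pct_2003, s_pct_2004):.4f}")
```

In practice each pair would be one pitcher who met a playing-time minimum in both seasons.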

Given that S% is somewhat consistent from year to year, perhaps it could help us predict an imminent change in K/PA. It's certainly possible that some pitchers appear to keep up their K/PA rate--a critical stat for predicting pitcher success--with a few more favorable umpire calls on third strikes rather than by missing bats. To test that, we can run a multivariable regression of each player's K/PA and S% in 2003 against his K/PA in 2004.
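
One way such a regression could be set up is sketched below in plain Python: an ordinary least-squares fit with an intercept, run once with K/PA alone and once with K/PA plus S%, so the gain in r-squared shows what S% adds. This is an assumption about the method, not necessarily how the figures in this column were produced. The helper names (`solve`, `fit_ols`) and the sample data are invented for illustration.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(columns, y):
    """Fit y on an intercept plus predictor columns.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("response is constant")
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical per-pitcher values, for illustration only.
k_pa_2003 = [0.15, 0.22, 0.18, 0.09, 0.24, 0.16]
s_pct_2003 = [0.88, 0.90, 0.57, 0.58, 0.57, 0.74]
k_pa_2004 = [0.14, 0.21, 0.17, 0.10, 0.23, 0.15]

_, r2_k_only = fit_ols([k_pa_2003], k_pa_2004)
_, r2_both = fit_ols([k_pa_2003, s_pct_2003], k_pa_2004)
print(f"K/PA alone: {r2_k_only:.4f}")
print(f"added by S%: {r2_both - r2_k_only:.4f}")
```

Entering K/PA first gives it credit for any variation the two predictors share, so the S% figure is only what it explains beyond the previous year's strikeout rate.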

Unfortunately, the previous year's K/PA dominates S% in the regression analysis, accounting for 59.67% of the variation while S% manages only a meager 2.49%. It's not quite Royals-Yankees or Koror-Ulong, but it's close. Because the previous year's K/PA rate is so dominant in predicting K/PA, S% doesn't yield any significant predictive value when looking for an edge in predicting pitcher breakout or decline in terms of K/PA. It's certainly possible that with more years of data available, a more discernible trend could be found, but with regard to predicting K/PA changes the following season, a whiff is a whiff no matter how you get it.
