Baseball Prospectus

November 8, 2011

Baseball ProGUESTus

Getting Explicit with Sample Sizes

by Matt Lentzner

Believe it or not, most of our writers didn't enter the world sporting an @baseballprospectus.com address; with a few exceptions, they started out somewhere else. In an effort to up your reading pleasure while tipping our caps to some of the most illuminating work being done elsewhere on the internet, we'll be yielding the stage once a week to the best and brightest baseball writers, researchers and thinkers from outside of the BP umbrella. If you'd like to nominate a guest contributor (including yourself), please drop us a line.

Matt Lentzner has carved out a (very) small niche in the baseball analysis world by examining the intersection of physics and biomechanics. He has presented at the PITCHf/x conference in each of the last two years and has written articles for The Hardball Times, as well as previous articles for Baseball Prospectus. When he’s not writing, Matt works on his physics-based baseball simulator, which is so awesome and all-encompassing that it will likely never actually be finished, though it does provide the inspiration for most of his articles and presentations. In real life, he’s an IT Director at a small financial consulting company in Silicon Valley, and he also runs a physical training gym in his backyard on weekends.

Yes, the title sounds vaguely pornographic, but what I’m getting at is pretty serious. Well, as serious as one can get when talking about baseball stats, anyway. Let me regale you with a story that gets rehashed almost daily in baseball articles. It should sound familiar.

I was reading an article by Dan Lependorf, a sharp, sabermetrically inclined writer for Athletics Nation. It was about Michael Choice, a highly touted prospect in the Oakland A’s system. Choice turned in a solid performance in High-A this season and has looked like a monster in the Arizona Fall League.

A+ Stockton (2011): .285/.376/.542
AFL Phoenix (2011): .333/.424/.745

I haven’t been following the AFL that closely, so, being the stats-oriented guy that I am, when I saw those numbers my first thought was, “Yeah, that’s pretty awesome, but how many PAs (plate appearances) is that?”

And right at that moment it hit me.

If you call yourself a sabermetrician, you know that a statistic doesn’t mean anything without a sufficient sample size. Batting .400 over 10 plate appearances doesn’t mean anything. Batting .400 over a full season is something amazing. We all intuitively know that.

But here’s what I think needs to happen: every printed stat should include its sample size in PAs. Always. That number is what communicates the credibility of the stat. Under that rule, Michael Choice’s lines look like this:

A+ Stockton (2011): .285/.376/.542 [542]
AFL Phoenix (2011): .333/.424/.745 [62]

We know from Russell Carleton’s (AKA Pizza Cutter) seminal work on sample-size stability that OBP and SLG stabilize at 500 PAs. In other words, the performance Choice logged in Stockton is worth considering, while the fireworks he’s been setting off in Phoenix are not very meaningful at all (62 PAs is about 12 percent of the required 500). In fact, 11 other hitters in the AFL are currently bopping along with an OPS over 1.000, and Choice is in the middle of the pack. It’s not even a special performance.
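The 12 percent figure is just the Phoenix sample divided by the 500 PA threshold. As a minimal sketch (the function names are hypothetical, and 500 PA is simply the OBP/SLG figure quoted above), here is the same arithmetic for both of Choice’s samples in Python:

```python
# Minimal sketch: how far a sample has progressed toward a stat's stabilization point.
# Assumption: 500 PA is the default, since that is the OBP/SLG threshold cited above.
# share_of_threshold() and is_stable() are hypothetical helper names.

def share_of_threshold(pa: int, threshold: int = 500) -> float:
    """Return pa / threshold, e.g. 0.124 for 62 PA against a 500 PA threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return pa / threshold


def is_stable(pa: int, threshold: int = 500) -> bool:
    """True once the sample has reached the stabilization threshold."""
    return pa >= threshold


if __name__ == "__main__":
    for label, pa in [("A+ Stockton (2011)", 542), ("AFL Phoenix (2011)", 62)]:
        share = share_of_threshold(pa)
        print(f"{label}: {pa} PA = {share:.0%} of 500, stable: {is_stable(pa)}")
```

Run as a script, this reports Stockton at 108 percent of the threshold and Phoenix at 12 percent.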

But there’s more to this story.

One of the knocks on Choice is his high strikeout totals; the worry is that he will strike out too often to be an effective major-league hitter. He already strikes out a lot in High-A, and his strikeout rate is expected to rise as he faces better pitching in the upper levels. Here are his strikeout rates in Stockton and Phoenix:

A+ Stockton (2011): 24.7 K% [542]
AFL Phoenix (2011): 15.5 K% [62]

Much, much better in Phoenix, but with a sample that small, is it meaningful? As it turns out, strikeout rate stabilizes a lot faster than OBP and SLG: in only 150 PAs. Check this out:

AFL Phoenix (2011): 15.5 K% [62/150]

This is much more meaningful: he’s over 40 percent of the way to a “real” number. Assuming Choice plays in most of the games remaining on the schedule, he should pass 100 PAs. He won’t reach 150, but whatever his K rate ends up being will be a heck of a lot more reliable than anything his OBP and SLG have to say. Even better would be a measurement of his contact rate, which stabilizes in only 100 PAs. That’s a stat that can actually stabilize within the AFL’s short schedule.
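Here are the fractions behind that paragraph, written out. They use the 62 PAs Choice has logged so far, the roughly 100 PAs he is projected to pass, and the 150 PA (strikeout rate) and 100 PA (contact rate) thresholds:

```latex
\text{K\% now: } \frac{62}{150} \approx 0.41
\qquad
\text{K\% at 100 PA: } \frac{100}{150} \approx 0.67
\qquad
\text{contact rate at 100 PA: } \frac{100}{100} = 1
```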

Taking the above into account, I’d like to amend my original statement. The sample size in PAs should always be printed. When it appears alone, as in [542], the assumed stability threshold is 500 PAs. For stats that stabilize at a different point, the sample should appear as [sample/stability number], as in [62/150].
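The amended rule can be sketched as a small formatting helper. This is a hypothetical illustration, not a tool from the article: `annotate` and `DEFAULT_THRESHOLD` are invented names, and the only rule encoded is the one stated above (print [sample] when the threshold is the assumed 500 PAs, otherwise [sample/threshold]).

```python
# Minimal sketch of the proposed "[sample/stability number]" notation.
# Assumption: 500 PA is the default threshold and is therefore left implicit.
# annotate() and DEFAULT_THRESHOLD are hypothetical names for illustration.

DEFAULT_THRESHOLD = 500


def annotate(label: str, value: str, pa: int, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Append the sample size, plus the threshold when it differs from the 500 PA default."""
    if threshold == DEFAULT_THRESHOLD:
        tag = f"[{pa}]"
    else:
        tag = f"[{pa}/{threshold}]"
    return f"{label}: {value} {tag}"


if __name__ == "__main__":
    print(annotate("A+ Stockton (2011)", ".285/.376/.542", 542))
    print(annotate("AFL Phoenix (2011)", "15.5 K%", 62, threshold=150))
```

Run as a script, it reproduces two of the lines used earlier: `.285/.376/.542 [542]` for Stockton and `15.5 K% [62/150]` for Phoenix.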

Here are the stable sample sizes per Mr. Carleton’s original work on the subject:

 50 PA: Swing %
100 PA: Contact Rate
150 PA: Strikeout Rate, Line Drive Rate, Pitches/PA
200 PA: Walk Rate, Groundball Rate, GB/FB
250 PA: Flyball Rate
300 PA: Home Run Rate, HR/FB
500 PA: OBP, SLG, OPS, 1B Rate, Popup Rate
550 PA: ISO
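To apply the notation, a writer needs the threshold for each stat. One possible encoding of the list above is a lookup table. The dictionary layout and the `stable_stats` helper are hypothetical, and the numbers are simply copied from Mr. Carleton’s list as given here.

```python
# Stabilization thresholds (in PA), copied from the list above.
# The dictionary layout and stable_stats() are hypothetical illustrations.
from typing import List

STABILITY_PA = {
    "Swing %": 50,
    "Contact Rate": 100,
    "Strikeout Rate": 150,
    "Line Drive Rate": 150,
    "Pitches/PA": 150,
    "Walk Rate": 200,
    "Groundball Rate": 200,
    "GB/FB": 200,
    "Flyball Rate": 250,
    "Home Run Rate": 300,
    "HR/FB": 300,
    "OBP": 500,
    "SLG": 500,
    "OPS": 500,
    "1B Rate": 500,
    "Popup Rate": 500,
    "ISO": 550,
}


def stable_stats(pa: int) -> List[str]:
    """Stats whose listed threshold has been reached at this many PA, lowest threshold first."""
    reached = [(threshold, name) for name, threshold in STABILITY_PA.items() if pa >= threshold]
    # Sorting the (threshold, name) pairs orders by threshold, then alphabetically on ties.
    return [name for _, name in sorted(reached)]


if __name__ == "__main__":
    print(stable_stats(62))
    print(stable_stats(100))
```

For example, `stable_stats(62)` returns only `['Swing %']`. At Stockton’s 542 PAs, OBP, SLG, and OPS have cleared their 500 PA mark, but ISO (550 PAs) has not.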

The warm summer months seem far away now, but before long we will have bid goodbye to the Hot Stove League and moved on to spring training and the regular season schedule. Hopefully, we won’t see too many “triple-S” or “SSS” (Small Sample Size) warnings when next season rolls around. They won’t be necessary, since the samples will be explicitly stated. And maybe we can discourage people from using stats incorrectly to try to find meaning where very little exists.
