February 24, 2011

Spitballing

Playing with Playing Time

by Jeremy Greenhouse

Projecting playing time is hard. When news hit that Adam Wainwright was almost certainly out for the year, forecasters went into fits. There’s no way to predict Tommy John surgery* for a pitcher coming off back-to-back 230-inning seasons. So you can say there’s a five percent chance he gets hurt and dismiss that possibility as too unlikely to weigh into your projection, or you can drag your forecast down by five percent. Regardless of which course of action is better, the only recourse after the fact is to “cheat” by manually updating the number of projected innings when such news comes out.

*Wait, he had an inverted W? Well, in that case...

PECOTA, Dan Szymborski’s ZiPS, and Brian Cartwright’s Oliver all use manually adjusted depth charts. PECOTA also uses a simple average of past seasons for individual projections. Victor Wang tried his hand at projecting playing time using some more advanced techniques. But for now, I’d like to keep it simple.

Marcel the Monkey, the world’s most baseliniest projection system, projects Wainwright at a 2.98 ERA in 198 innings. Marcel was developed by Tom Tango to serve as a replacement-level forecasting system, as its methodology is entirely open source. Marcel is quite sophisticated in some ways, as it understands regression to the mean, handles the weighting of prior seasons, and includes an aging curve. While I wouldn’t trust myself to guess a random pitcher’s ERA over a projection system without using a computer of my own, I’m positive that I can do better in terms of innings pitched. I’m also positive that I can come up with an algorithm just as basic as the one that Marcel uses to better predict playing time on average.

Marcel starts by giving all batters 200 plate appearances. It then adds half of a batter’s plate appearances from the previous year and 0.1 times his total from two years prior. For pitchers, it uses the same weights, but with 60 innings for starters and 25 for relievers as the constants. I decided to use a similar dataset and see how easily I could beat the monkey with a couple of simple rules of my own.
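As a minimal sketch, the Marcel playing-time rule described above can be written in a few lines. The function and constant names are my own for illustration, not anything from Tango's published code, and the example batter is hypothetical.

```python
# Baseline constants quoted in the text above.
BATTER_PA = 200     # plate appearances handed to every batter
STARTER_IP = 60     # innings handed to every starter
RELIEVER_IP = 25    # innings handed to every reliever


def marcel_playing_time(prev_year, two_years_ago, baseline):
    """Baseline plus 0.5 * year N-1 plus 0.1 * year N-2.

    Works for plate appearances or innings; the caller picks the baseline.
    """
    return baseline + 0.5 * prev_year + 0.1 * two_years_ago


# Hypothetical batter: 600 PA last year, 500 PA the year before.
# 200 + 0.5 * 600 + 0.1 * 500 = 550
print(marcel_playing_time(600, 500, BATTER_PA))

# A player with no recent playing time still gets the full baseline.
print(marcel_playing_time(0, 0, BATTER_PA))
```

Note that the baseline is added no matter what, so the projection can never fall below 200 PA for a batter.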

My sample consisted of all regular-season plate appearances in year N since 1980 for everyone who played in the Majors in either year N-1 or year N-2 and played pro ball in some year thereafter.

Marcel’s equation explains 63 percent of the variance in a batter’s plate appearances in a given year. When I fit the data myself, the best-fit regression equation took 75 percent of the PAs in year N-1 plus 10 percent of the PAs in year N-2. But we can do better than the best-fit line with one simple separation of the data. When projecting playing time in season N for players who played more in season N-1 than in season N-2, the playing time in season N-2 becomes irrelevant. For batters, that means that you project with 80 percent of the previous year’s plate appearances, and for pitchers it’s closer to 75 percent of the previous year’s innings. Otherwise, the equation is 60 percent of year N-1 plus 20 percent of year N-2 for batters, and 60 percent of year N-1 plus 15 percent of year N-2 for pitchers.
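The two-branch rule in the paragraph above can be sketched as follows. The function name and the `is_pitcher` flag are assumptions for illustration. The pitcher weights are the approximate figures quoted in the text, and I treat equal playing time in both years as falling into the "otherwise" branch, which the text does not spell out.

```python
def split_rule_projection(prev_year, two_years_ago, is_pitcher=False):
    """Project year N playing time from years N-1 and N-2.

    If the player played more in N-1 than in N-2, ignore N-2 entirely.
    Otherwise blend the two years. Weights are those quoted in the text.
    """
    if prev_year > two_years_ago:
        weight = 0.75 if is_pitcher else 0.80
        return weight * prev_year
    weight_n1 = 0.60
    weight_n2 = 0.15 if is_pitcher else 0.20
    return weight_n1 * prev_year + weight_n2 * two_years_ago


# Hypothetical batter on the rise: 600 PA after 500 PA.
# Only year N-1 counts: 0.80 * 600 = 480
rising = split_rule_projection(600, 500)

# Hypothetical batter on the decline: 400 PA after 600 PA.
# Both years count: 0.60 * 400 + 0.20 * 600 = 360
declining = split_rule_projection(400, 600)

print(round(rising), round(declining))
```

There is no constant term in either branch, so a player with no playing time in either year projects to zero.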

The r-squared, however, was identical at 0.64. What this means is that there are multiple ways to skin this cat. (Are there actually multiple ways to skin a cat? Cat-skinners, get at me.)

So which way is more correct? Below I plot the best-fit lines for batters and pitchers and the games played per games projected indexes.

I submit that it is more sensible to force the intercept to zero, since if someone hasn’t played in the Majors in either of the past two years, he probably won’t play the following year. Indeed, for players projected at fewer than 300 PAs by Marcel, Marcel generally overshoots by 180 PAs, while the just-as-simple best-fit (hereinafter referred to as Marcello, Marcel’s Italian twin*) is on average within 10 PAs. A quarter of these players don’t play at all in the projected season, yet Marcel is projecting 95 percent of them for a career high in plate appearances. When Marcel projects for over 300 PAs, it misses by an average of 100 PAs, compared to an average of two PAs for Marcello. Marcello projects 20 fewer innings than Marcel does for a workhorse like Wainwright. There are only two hitters and one pitcher whom Marcello projects for more playing time than Marcel: Rickie Weeks, Austin Jackson, and C.J. Wilson.
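For readers who want to try a zero-intercept fit themselves, here is a rough sketch using ordinary least squares with no constant term, solved through the two-variable normal equations. This is my own illustration, not the code behind the article. The four data rows are made up, and they are constructed so that the fit recovers exactly the 75 percent and 10 percent weights quoted earlier.

```python
def fit_through_origin(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 with the intercept forced to zero."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; the fit is not unique")
    # Cramer's rule on the 2x2 normal equations.
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


# Made-up plate appearance totals, NOT real player data.
pa_prev = [600, 200, 0, 400]        # year N-1
pa_two_ago = [500, 400, 100, 0]     # year N-2
pa_actual = [500, 190, 10, 300]     # year N, built as 0.75*N-1 + 0.10*N-2

a, b = fit_through_origin(pa_prev, pa_two_ago, pa_actual)
print(round(a, 2), round(b, 2))
```

With real data the points would not fall exactly on a plane, and the recovered weights would depend on the sample.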

*I’m not sure why their parents named them that way, or how they can be of different nationalities.

This works back to the original uncertainty surrounding the question that a projection system is attempting to answer. Yes, it’s more likely that someone like Wainwright will pitch 200 innings than 180. For pitchers who throw 200 innings in back-to-back years, there’s a better than 50-50 chance that they have a third season at 200 innings-plus. However, the mean innings pitched for those players turns out to be way less than 200.
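A toy example shows how both statements can hold at once. The ten innings totals below are invented purely for illustration and are not drawn from the sample: most of the group clears 200 innings, but a few lost seasons pull the mean well below that mark.

```python
from statistics import mean, median

# Invented follow-up seasons for ten hypothetical workhorses (NOT real data).
innings = [215, 210, 205, 203, 201, 200, 150, 90, 40, 0]

# Fraction of the group that reaches 200 innings again.
share_at_200 = sum(ip >= 200 for ip in innings) / len(innings)

print(share_at_200)      # more than half reach 200
print(median(innings))   # the typical outcome sits at about 200
print(mean(innings))     # the average is dragged down by the short seasons
```

A system that minimizes average error has to project near the mean, even though the single most likely outcome is a full season.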

The Bill James forecasts are notoriously optimistic because they shoot for the mode as opposed to the mean. At the other end is Marcel, which regresses everything toward league average to such an extent that it is resistant to outliers. Marcel is trying to provide a true talent estimate, and therefore trying to minimize the error around projected production level. I think that similarly minimizing the error around projected playing time makes more sense for a system like Marcel than projecting playing time based on a different set of circumstances, one in which the player has likely outperformed his projected production. Of course, when it comes to projecting playing time, it’s unlikely that even the best algorithm supplied with the best data could outperform old-fashioned flesh, blood, and brainpower.

Jeremy Greenhouse is an author of Baseball Prospectus.

11 comments have been left for this article.

levidavis

(35042)

I've never understood why an inverted W is not just called an M.

Feb 24, 2011 07:27 AM

rating: 1

adamsternum

(53972)

I've never understood why the inverted W is not just called the W. How do you know where I'm standing?

Feb 24, 2011 12:55 PM

rating: 2

Jeremy Greenhouse

BP staff

(59911)

Clearly you both have a lot to learn about pitching mechanics.

Feb 24, 2011 15:24 PM

Mike Fast

BP staff

(4387)

Good stuff, Jeremy.

Feb 24, 2011 07:39 AM

TangoTiger

(57181)

Nice work Jeremy. I posted a reply on my blog. Thanks for giving me more to think about.

Feb 24, 2011 13:12 PM

rating: 0

Mike Fast

BP staff

(4387)

Jeremy, you addressed the median and the mode in the text, but as I stare at the graphs, I find myself wondering if there is a simple way to draw a line that goes through both the dark cluster around the origin and the dark cluster of full playing time in the upper right. A mathematical technique, I mean. The "best fit" line is having to account for all the points along the x-axis.

Feb 24, 2011 13:27 PM

Jeremy Greenhouse

BP staff

(59911)

Mike, right, the mode method would go through both of those clusters. I believe that would entail simply using the exact same number of opportunities from one year to the next.

Feb 24, 2011 15:27 PM

Ben Murphy

BP staff

(4214)

Wouldn't the cluster in the lower left (for pitchers) be the relievers (along with the injuries and such)? And so if you separated by pitcher role, would that help? You could obviously approach the model itself from multiple angles (logit, piece-wise, two separate models completely) if you had some type of logit flag for GS > G*0.8 or something. Or is that more complicated than what you wanted to do?

Anyway it would be interesting to see how the plot looks with pitchers separated into two groups with some simple rule of thumb.

Interesting stuff, though. Thanks Jeremy.

Feb 26, 2011 09:45 AM

Jeremy Greenhouse

BP staff

(59911)

Yes, I think games per game started would be a good variable in a more sophisticated model. I might revisit projected playing time more toward opening day.

Feb 26, 2011 16:53 PM

Jeremy Greenhouse

BP staff

(59911)

I should have tested using simply year n-1 as a predictor of year N. I also should have used the absolute error or RMSE or something to have tested. I will try to get to those later.

Feb 24, 2011 15:56 PM

Jeremy Greenhouse

BP staff

(59911)

OK, I checked it out, and I think I've made it clear I don't feel Marcel is competent in projecting playing time. Using simply year n-1 to predict year N obviously comes in with a higher average error than Marcello, as Marcello is simply a best fit and year n-1 is unregressed. But year n-1 and Marcello have practically the same average absolute errors. Marcel has a much larger average absolute error.

Feb 26, 2011 17:07 PM
