March 8, 2012

Reintroducing PECOTA

House of Cards

by Colin Wyers

Without further ado, we present to you the PECOTA cards. Debuting on the cards are the 10-year forecasts and the percentile forecasts.

The 10-year projection process breaks down like so:

  • First, a player’s performance is broken down into a series of components. These are more detailed than the official stats breakdown, featuring everything required to build a full batting (or batting against) line, including reach on error, infield single rates, etc.
  • Single-season aging curves for all of the components are built by comparing adjacent seasons. Because adjacent seasons carry a selection bias, we use a back-weighted sample of the player’s past performance (using the same process we use in the normal PECOTA forecasts), regressed to the mean, to build the single-season aging curves.
  • We use a curve-fitting process to “smooth” the aging curve, to make sure the progression is orderly and to diminish the effect of random variance in the aging curves (especially for very young or very old players, where the pools are much smaller).
  • To come up with multi-year forecasts, we “chain” the smoothed single-year age adjustments.
  • Each player has a custom aging curve built using the comparables, and that curve is tested for reliability and then regressed to a generic aging curve based on that reliability assessment.
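The chaining and regression steps in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not PECOTA's actual implementation: the function names are hypothetical, the age adjustments are assumed to be additive, and the regression toward the generic curve is assumed to be a simple linear blend weighted by a reliability score between 0 and 1.

```python
def regress_curve(custom, generic, reliability):
    """Blend a comparables-based aging curve toward a generic one.

    Assumption: a linear blend, where reliability=1.0 keeps the custom
    curve as-is and reliability=0.0 falls back entirely on the generic curve.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if len(custom) != len(generic):
        raise ValueError("curves must cover the same number of seasons")
    return [reliability * c + (1.0 - reliability) * g
            for c, g in zip(custom, generic)]


def chain_adjustments(baseline, yearly_deltas):
    """Chain single-year age adjustments into a multi-year forecast.

    Assumption: each adjustment is an additive change applied to the
    previous season's projected value.
    """
    forecast = []
    value = baseline
    for delta in yearly_deltas:
        value += delta
        forecast.append(value)
    return forecast


# Illustrative numbers only: three seasons of made-up adjustments for one component.
custom_curve = [0.004, 0.002, -0.002]
generic_curve = [0.002, 0.000, -0.004]
blended = regress_curve(custom_curve, generic_curve, reliability=0.5)
multi_year = chain_adjustments(0.260, blended)
```

In this sketch each component would get its own curve and its own chain, which is why peak ages can differ from player to player.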

Peak ages will vary based upon a player’s comparables and his skill set (as each component gets its own aging curve). But typically for hitters, we see peak ages in the 10-year forecasts at around age 28, a year later than the conventional wisdom. This is somewhat offset by a decline in a player’s defensive value, which doesn’t really peak at all but starts to decline almost immediately upon his debut in the majors. (You can see the effects in this aggregation of various 10-year forecast components by age, weighted by DC playing time, here.)

Pitchers are more interesting—there seems to be an earlier peak, at age 26, for pitchers, in terms of ERA. But for pitchers who manage to survive beyond age 26, there seems to be a much later peak. Pitchers who pitch past ages 27–29 seem to peak around age 30, with some pitchers peaking even later. So pitching seems to be essentially bimodal in aging, where some pitchers peak early and others peak late. This is even true if we restrict our analysis only to pitchers who work primarily as starters their entire career. (Again, a breakdown is available here.)

Playing time in the 10-year forecasts is a reflection of what we expect a player’s playing time to be at his peak production, rather than starting with his expected 2012 playing time; this results in more sensible long-term forecasts for young prospects who aren’t quite ready for MLB yet but are expected to have productive careers when they do make it to MLB. Playing forecasts for off-peak years are then adjusted from the peak-year playing time forecast.

Now, projecting the future is more difficult the further out it goes. So how reliable are the 10-year forecasts? Here is the root mean square error (RMSE) of projected True Averages, by forecast year, for “backcasts” of historic players:

Year    RMSE (TAv)
1       0.031
2       0.032
3       0.033
4       0.035
5       0.037
6       0.039
7       0.041
8       0.043
9       0.046
10      0.050

This is about what we would expect; a player’s performance 10 years down the road is substantially more difficult to project than his performance one year down the road. But especially through the first several seasons, the reliability of the forecasts is not substantially different.

Similarly, for pitchers:

Year    RMSE
1       1.18
2       1.21
3       1.23
4       1.26
5       1.29
6       1.30
7       1.35
8       1.41
9       1.49
10      1.56

Again, results in the first several seasons are very close, with results becoming harder to project the further out you get. (Bear in mind that these are forecasts for a neutral park and league context and thus will exhibit higher RMSEs than regular PECOTA forecasts.)
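For readers who want to run a similar check on their own projections, here is a minimal sketch of how RMSE by forecast year could be computed. The record layout (forecast year, projected value, observed value) and the function name are assumptions made for illustration; the numbers in the example are made up.

```python
import math
from collections import defaultdict


def rmse_by_horizon(records):
    """Return {forecast_year: RMSE} from (forecast_year, projected, observed) tuples."""
    squared_errors = defaultdict(list)
    for horizon, projected, observed in records:
        squared_errors[horizon].append((projected - observed) ** 2)
    # RMSE is the square root of the mean squared error within each forecast year.
    return {
        horizon: math.sqrt(sum(errors) / len(errors))
        for horizon, errors in sorted(squared_errors.items())
    }


# Illustrative records only: (years ahead, projected TAv, observed TAv).
sample = [
    (1, 0.270, 0.300),
    (1, 0.250, 0.220),
    (2, 0.260, 0.300),
]
errors_by_year = rmse_by_horizon(sample)
```

Because errors are squared before averaging, a few large misses weigh more heavily than many small ones, which is worth remembering when comparing forecast years.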

The percentiles are based on three primary variables:

  • The reliability of a player’s forecast, estimated based on the playing time (weighted) that went into the forecast,
  • The projected playing time, which affects the amount of random variance expected from the forecast, and
  • The population tendencies (a player is more likely to underperform his forecast than overperform it if he is projected to be above the league average, for instance).

Again, playing time is based upon a player’s expected performance: the better the performance, the more playing time we expect for that player.

Keep in mind that the percentiles key off of the primary value component: TAv for batters and ERA for pitchers (although the ERA is a component ERA with less variance than actual ERA, as it does not account for random variance in sequencing around the component lines). Component stats are meant to illustrate the key value stats only; a hitter’s 90th-percentile home run forecast, for instance, is not his maximum home run potential but the most likely home run total to accompany his 90th-percentile TAv.

How well do the percentiles do on historic data? Looking at back-forecasts from 1950 on, we see that 79 percent of observed TAvs fall between the 10th and 90th percentiles, and 60 percent fall between the 20th and 80th percentiles, almost exactly the 80 and 60 percent we should expect.
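The calibration check described above amounts to counting how often the observed value lands inside a forecast band. A rough sketch, with a hypothetical function name and made-up inputs, might look like this; a well-calibrated 10th-to-90th band should cover roughly 80 percent of observations, and a 20th-to-80th band roughly 60 percent.

```python
def coverage(observed, lower, upper):
    """Share of observations falling inside their forecast band (inclusive).

    Each position lines up one player-season: its observed value and the
    lower and upper percentile forecasts for that same player-season.
    """
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must have the same length")
    if not observed:
        raise ValueError("need at least one observation")
    hits = sum(lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Illustrative values only: four player-seasons sharing the same band.
observed_tav = [0.270, 0.240, 0.310, 0.255]
p10 = [0.250, 0.250, 0.250, 0.250]
p90 = [0.300, 0.300, 0.300, 0.300]
share_inside = coverage(observed_tav, p10, p90)
```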

As a reminder, PECOTA on the cards is restricted to subscribers only. To those of you who already subscribe, thank you for your patronage, and I hope you find the PECOTA cards useful, informative, and (employers of America, forgive me) a veritable time sink. Enjoy.


Additionally, you can look at progressions for different age groups over time, weighted by projected 2012 playing time according to the depth charts. Pitchers are located here. Hitters can be found here.

UPDATE: The 10-year forecast shows seasons two through 10 of a player expected to play all 10 seasons. For players whose forecast falls below the attrition rate, no forecast is displayed.

Colin Wyers is an author of Baseball Prospectus. 