Glossary: PECOTA
The 10-Year Forecast is a player's weighted mean PECOTA forecast, taken over his next 10 seasons.
The process for generating a player's weighted mean line for a season some number of years into the future (e.g. 2014) is fundamentally identical to generating his forecast for the season immediately upcoming (e.g. 2012). The exception is that some players may have dropped out of the comparables database, in which case their performance cannot be considered. (See also Jeremy Giambi Effect).
Note that the 10-Year Forecast assumes that a player's team context remains the same for all years of the forecast.
The percent chance that a player's playing opportunities will decrease by at least 50 percent relative to his baseline playing time forecast. For hitters, playing opportunities are measured by plate appearances and for pitchers, they are measured by opposing batters faced.
Although it is generally a good indicator of the risk of injury, Attrition Rate will also capture seasons in which playing time decreases due to poor performance or managerial decisions.
The percent chance that a player’s production (measured by RA for pitchers, and True Average for hitters) will improve by at least 20 percent relative to the weighted average of his performance over his most recent seasons.
Every player of the same age in our database (historically) is assigned a similarity score to the player in question. For each such similar player, a "baseline" value is projected for runs allowed based on past performance and standard aging curves. The weighted average of "1" or "0" for each of these similar players is taken, where "1" is used each time a comp player accomplished the goal in question—improving his run prevention or True Average by 20 percent.
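The weighted average of "1" and "0" values described above can be sketched as follows. This is only an illustration, not PECOTA's actual code: the function names, the use of similarity scores directly as weights, and the `lower_is_better` flag (to handle runs allowed versus True Average) are all assumptions.

```python
def met_improvement_goal(baseline, actual, threshold=0.20, lower_is_better=False):
    """Return True if `actual` beats `baseline` by at least `threshold`.

    For run prevention (lower is better) that means allowing at least
    20 percent fewer runs; for True Average it means a mark at least
    20 percent higher. Both readings are assumptions for illustration.
    """
    if lower_is_better:
        return actual <= baseline * (1.0 - threshold)
    return actual >= baseline * (1.0 + threshold)


def weighted_rate(comps):
    """Similarity-weighted share of comps that met the goal.

    `comps` is an iterable of (similarity_weight, met_goal) pairs.
    Each comp counts as 1 if he met the goal and 0 otherwise.
    """
    comps = list(comps)
    total_weight = sum(weight for weight, _ in comps)
    if total_weight <= 0:
        raise ValueError("comps must have positive total weight")
    hits = sum(weight for weight, met in comps if met)
    return hits / total_weight
```

For example, `weighted_rate([(3.0, True), (1.0, False)])` gives 0.75: the closer comp met the goal and carries three times the weight of the one who did not.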
Collapse Rate can sometimes be counterintuitive for players who have already experienced a radical change in their performance levels.
It's also important to note that established major leaguers are compared to other major leaguers only, while minor-league players may be compared to major-league or minor-league players, with PECOTA strongly preferring the latter. All comparables represent a snapshot of how the listed player was performing at the same age as the current player, so if a 23-year-old hitter is compared to Miguel Tejada, he's actually being compared to a 23-year-old Tejada, not the decrepit Giants version of Tejada, nor to Tejada's career as a whole.
Diagnostics are a series of metrics designed to estimate the probability of certain types of changes in production and playing time; see the individual entries for additional detail.
Fair Run Average differs from FIP in a few ways. While FIP is concerned only with what a pitcher is believed to control (typically strikeouts, walks, and home runs, though Prospectus includes hit batsmen in our FIP calculation), Fair Run Average takes things a step further. Pitchers receive credit for good sequencing, thus rewarding pitchers who seem to work out of jams more often than usual. Fair Run Average also considers batted ball distribution, base-out state, and team defensive quality (as measured by Fielding Runs Above Average).
Here is an example of the Fair Run Average spectrum based on the 2011 season:
Excellent - Clayton Kershaw 2.90
Great - Brandon McCarthy 3.42
Average - Ivan Nova 4.36
Poor - Brett Cecil 5.14
Horrendous - Jake Arrieta 5.88
A player who is expected to perform just the same as he has in the recent past will have an Improve Rate of 50 percent.
The Jeremy Giambi Effect is a name given to the correlation between playing time and quality of performance. The Jeremy Giambi Effect has important implications for understanding a player's PECOTA forecast.
Following are Giambi's plate appearances and OPS for each year of his major-league career:
Year PA OPS
1998 70 .739
1999 336 .741
2000 302 .761
2001 443 .841
2002 397 .919
2003 156 .696
Note that the correlation between Giambi's PA and OPS is very strong (r=.72). He played more often when he played more effectively, and less so when he played less effectively. Eventually, his performance became so poor that he could no longer secure any major league playing time at all.
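The correlation quoted above can be checked directly from the table. The sketch below computes the ordinary Pearson correlation coefficient between Giambi's PA and OPS; the function name is ours, and it assumes the quoted r is a plain (unweighted) Pearson correlation across the six seasons.

```python
from math import sqrt

# Jeremy Giambi's plate appearances and OPS, 1998-2003 (from the table above).
pa = [70, 336, 302, 443, 397, 156]
ops = [0.739, 0.741, 0.761, 0.841, 0.919, 0.696]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(pa, ops)
print(f"r = {r:.2f}")  # prints "r = 0.72"
```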
MLB% is the percentage of the comps who played in MLB the following season.
PEAK refers to a series of metrics designed to evaluate a player's value in some statistic - most often WARP or non-negative WARP (used by UPSIDE calculations) - over a series of consecutive seasons. It has had two variations. The one currently in use for UPSIDE on the player cards is the five-year variant referenced by Nate Silver:
The version of Upside that we’re using here is the peak-adjusted variant, which measures a player’s most valuable five-year window up through and including his age 28 season (or simply his next five years of performance if he’s already age 25 or older).
-- Nate Silver, 2007
The other variant, used in some writings, simply uses the next six seasons of a player's career.
In both cases, seasons which have yet to be played are projected using PECOTA instead of ignored, so young players will have the full complement of five (or six) seasons of data. See also: UPSIDE.
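As a rough illustration of the five-year variant Silver describes, the sketch below picks the most valuable five-season window ending no later than age 28, or simply the next five seasons for a player already 25 or older. The function name, the `warp_by_age` mapping (actual or PECOTA-projected WARP keyed by age), treating missing seasons as zero, and the `floor_at_zero` switch for the non-negative WARP case are all assumptions made for the example.

```python
def peak_value(current_age, warp_by_age, window=5, last_age=28, floor_at_zero=False):
    """Best `window`-season total through age `last_age`.

    If no full window starting at or after `current_age` can end by
    `last_age`, the next `window` seasons are used instead.
    """
    def season_value(age):
        value = warp_by_age.get(age, 0.0)  # assumption: missing season counts as 0
        return max(value, 0.0) if floor_at_zero else value

    # Latest starting age whose window still ends by `last_age` (24 by default).
    latest_start = last_age - window + 1
    # Older players get exactly one candidate window: their next `window` seasons.
    starts = range(current_age, max(current_age, latest_start) + 1)
    return max(
        sum(season_value(age) for age in range(start, start + window))
        for start in starts
    )
```

With hypothetical WARP values `{20: 0, 21: 1, 22: 2, 23: 3, 24: 4, 25: 5, 26: 4, 27: 3, 28: 2, 29: 1}`, a 20-year-old's best window is ages 23-27 (total 19), while a 26-year-old is simply credited with ages 26-30 (total 10).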
Stands for Player Empirical Comparison and Optimization Test Algorithm. PECOTA is BP's proprietary system that projects player performance based on comparison with historical player-seasons. There are three elements to PECOTA:
1) Major-league equivalencies, to allow us to use minor-league stats to project how a player will perform in the majors;
2) Baseline forecasts, which use weighted averages and regression to the mean to produce an estimate of a player's true talent level;
3) A career-path adjustment, which incorporates information about how comparable players' stats changed over time.
Check out the PECOTA section of the glossary for more on the system's intricacies.
The Percentile Forecast is a representation of the player's expected performance in the upcoming season at various levels of probability.
For example, if a pitcher's 75th percentile ERA forecast is 3.50, this indicates that he has a 75 percent chance to post an ERA of 3.50 or higher, and a 25 percent chance to post an ERA lower than 3.50. Higher percentiles indicate more favorable outcomes.
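One way to read this convention is sketched below using a simple nearest-rank percentile over a set of possible outcomes, ordered from worst to best. This illustrates only how to interpret the percentiles, not how PECOTA derives them; the function name and the nearest-rank method are assumptions.

```python
import math


def percentile_forecast(outcomes, percentile, lower_is_better=False):
    """Nearest-rank percentile where higher percentiles are more favorable.

    Outcomes are ordered from worst to best, so for ERA (lower is better)
    the highest ERA comes first.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    ordered = sorted(outcomes, reverse=lower_is_better)  # worst -> best
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]
```

Given hypothetical ERA outcomes of 3.00, 3.50, 4.00, and 4.50, the 75th percentile forecast is 3.50: three of the four outcomes are 3.50 or higher, and only one is lower.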
The Percentile Forecast is calibrated off two key statistics: TAv for hitters, and ERA for pitchers (although the ERA is a component ERA, and thus will not reflect the variance of sequencing in a player's performance).
PECOTA runs a series of regressions within the set of comparable data in order to estimate how changes in peripheral statistics are related to changes in equivalent runs. For example, if it first estimates that Carl Crawford will produce a .290 TAv next year, it then tries to determine what home run total, walk total, and so on are most likely to be associated with a .290 TAv season.
PECOTA then iterates this result to ensure that the peripheral statistics 'add up' to the right calibrating statistic (TAv or ERA). It is important to note that the Percentile Forecast is designed to work around the calibrating statistic only.
A player's forecast is adjusted to the park and league context associated with the team listed at the top of the forecast page. Team-dependent stats like Wins, RBIs, and BABIP account for the projected performance level of a player's teammates.
PECOTA forecasts playing time (plate appearances) in addition to a player's rate statistics. These forecasts are based on a player's previous record of performance, and the comparable player data, and do not incorporate any additional information about managerial decisions.
True Average incorporates aspects that other linear weights-based metrics ignore. Reaching base on an error and situational hitting are included; meanwhile, strikeouts are treated as slightly more damaging than a normal out, and bunts as slightly less damaging. The baseline for an average player is not meant to portray what a typical player has done, but rather what a typical player would do if given similar opportunities. That means adjustments are made for parks and league quality. True Average's adjustments go beyond applying a blanket modifier: players who play more home games than road games will see that reflected in their adjustments. Unlike its predecessor, Equivalent Average, True Average does not consider baserunning or basestealing.
Here is an example of the True Average spectrum based upon the 2009-2011 seasons:
Excellent - Miguel Cabrera .342
Great - Alex Rodriguez .300
Average - Austin Jackson .260
Poor - Ronny Cedeno .228
Horrendous - Brandon Wood .192
-------------------------------------------
See: http://www.baseballprospectus.com/article.php?articleid=11717
The 0.9 multiplier (from the article) is no longer a fixed number; it now scales with current-season run scoring. It's all the way up to almost 1.07 now, because run scoring is so much lower than when Colin wrote this (from the link above):
From 1993 to 2009, you can figure TAv simply as:
0.260 + (RAA/PA)*.9
Now, we will be tuning those values slightly to match the batting average for that season, but other than that, that’s the formula for TAv we will be using once the new stat reports are rolled out.
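The 1993-2009 formula quoted here is easy to express in code. In the sketch below the function and parameter names are ours; `scale` defaults to the 0.9 from the article and can be swapped for a season-specific value. The per-season tuning to batting average mentioned in the quote is not modeled.

```python
def true_average(raa, pa, scale=0.9, baseline=0.260):
    """TAv = baseline + (RAA / PA) * scale, per the quoted formula."""
    if pa <= 0:
        raise ValueError("pa must be positive")
    return baseline + (raa / pa) * scale
```

For a hypothetical hitter 30 runs above average over 600 plate appearances, `true_average(30, 600)` works out to .305; with a scale of 1.07 the same line would come to about .314.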
[...]
All that matters essentially is the computation of the initial R/PA values. When people ask about wOBA, most of the time what they really care about is the values presented on Fangraphs, derived from this set of linear weights developed by Tom Tango.
UPSIDE is determined by evaluating the performance of a player's top-20 PECOTA comparables. If a comparable player turned in a performance better than league average, including both his batting and fielding performance, then his wins above average (WARP minus replacement value) are counted toward his UPSIDE. A base of two times wins above average is used for position players, and an adjustment is made to pitcher values such that they are comparable. If the player was worse than average in a given season, or he dropped out of the database, the performance is counted as zero.
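The counting rule above can be sketched as follows for position players. This is a simplified illustration rather than the actual UPSIDE calculation: the function names are ours, `None` stands in for a comp who dropped out of the database, the pitcher adjustment is omitted, and summing the contributions across comps is an assumption.

```python
def upside_contribution(wins_above_average, multiplier=2.0):
    """Credit for one comparable's season.

    Above-average seasons count as `multiplier` times wins above average;
    below-average seasons and dropouts (None) count as zero.
    """
    if wins_above_average is None or wins_above_average <= 0:
        return 0.0
    return multiplier * wins_above_average


def upside(comp_wins_above_average):
    """Total credit across a list of comparables (assumed to be a sum)."""
    return sum(upside_contribution(waa) for waa in comp_wins_above_average)
```

For hypothetical comps with wins above average of 1.5, -0.5, a dropout, and 2.0, `upside([1.5, -0.5, None, 2.0])` returns 7.0: only the two above-average seasons contribute.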
It is far easier for a player like Ugueto to improve upon his production by 20 percent than it is for Alex Rodriguez; as a result, Ugueto's Breakout score is likely to be higher. This does not mean that Ugueto is a player you'd want anywhere near your roster.
Value Over Replacement Player. The number of runs contributed beyond what a replacement-level player at the same position would contribute if given the same percentage of team plate appearances. VORP scores do not consider the quality of a player's defense.
Here is an example of the Value Over Replacement Player spectrum based on the 2011 season:
Excellent - Matt Kemp 95.2
Great - Robinson Cano 51.4
Average - Eric Hosmer 19.9
Poor - Derrek Lee 3.2
Horrendous - Adam Dunn -22.6
VORP for position players consists of batting runs above average (BRAA), a position adjustment (POS_ADJ), baserunning runs above average (BRR, which includes but is not limited to stolen bases and times caught stealing), and an adjustment for replacement level (REP_ADJ).
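Assuming the four components simply add together, which the text implies but does not state outright, a position player's VORP could be assembled as in the sketch below. The function name and the sample values are illustrative only.

```python
def position_player_vorp(braa, pos_adj, brr, rep_adj):
    """Sum of the four run components named in the glossary entry.

    Assumption: the components are all expressed in runs and combine
    by straightforward addition.
    """
    return braa + pos_adj + brr + rep_adj


# Hypothetical component values, in runs.
example_vorp = position_player_vorp(braa=20.0, pos_adj=2.5, brr=1.5, rep_adj=18.0)
```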
Perhaps no sabermetric theory is more abstract than that of the replacement-level player. Essentially, replacement-level players are of a caliber so low that they are always available in the minor leagues because the players are well below major-league average. Prospectus' definition of replacement level contends that a team full of such players would win a little over 50 games. This is a notable increase in replacement level from previous editions of Wins Above Replacement Player.
Here is an example of the Wins Above Replacement Player spectrum based on the 2011 season:
Excellent - Jose Bautista 10.3
Great - Hunter Pence 5.2
Average - Gaby Sanchez 2.0
Poor - Adam Lind 0.5
Horrendous - Adam Dunn -1.7
WARP components can be found in this article, which also describes 2015 changes to FRAA: http://www.baseballprospectus.com/article.php?articleid=27944
The Weighted Mean forecast incorporates all of the player's potential outcomes into a single average, weighted based on projected playing time. In almost all cases, poor performances are associated with a reduced number of plate appearances. For that reason, they don't hurt a player's team quite as much as good performances help it; the weighting is designed to compensate for this effect (see also Jeremy Giambi Effect).
EXCEPTION: a player's projected PLAYING TIME (and therefore, his counting statistics that depend on his playing time) is taken based on the median of his comparables' performance, rather than the weighted mean. This is designed to mitigate the influence of catastrophic injuries, which are better represented by Attrition Rate.
This exception does NOT affect a player's WARP and VORP forecasts, which are calculated per the weighted mean method, treating players who dropped out of the database as having zero WARP/VORP.
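The distinction between the playing-time-weighted rate, the median playing time, and the zero-filled value forecast can be sketched as below. The function names and input shapes are assumptions, comparables are weighted only by playing time here (similarity weights are left out), and the plain average in the second function stands in for whatever weighting PECOTA actually applies.

```python
from statistics import median


def weighted_mean_forecast(outcomes):
    """Summarize (plate_appearances, rate_stat) outcomes.

    Returns the rate stat weighted by playing time, plus the MEDIAN
    playing time (not the mean), per the exception described above.
    """
    total_pa = sum(pa for pa, _ in outcomes)
    if total_pa <= 0:
        raise ValueError("outcomes must include some playing time")
    rate = sum(pa * value for pa, value in outcomes) / total_pa
    playing_time = median([pa for pa, _ in outcomes])
    return rate, playing_time


def mean_value_with_dropouts(values):
    """Average WARP/VORP, counting dropouts (None) as zero."""
    filled = [0.0 if value is None else value for value in values]
    return sum(filled) / len(filled)
```

With hypothetical outcomes of 600 PA at .300, 400 PA at .260, and 200 PA at .220, the weighted rate is about .273 (above the unweighted .260, because the good season carries more plate appearances), and the playing-time forecast is the median, 400 PA.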