Lies, Damned Lies: PECOTA Takes on Prospects, Introduction

Last year, we ran our first-ever series of PECOTA-based prospect rankings. This wasn’t necessarily intended to be an annual feature, but it generated a lot of good discussion, so here we are again.

Are the PECOTA rankings intended to be a substitute for scouting-based prospect lists? Emphatically not. As much fun as it is to play up the scouts-versus-stats angle, I don’t expect the PECOTA rankings to be as accurate as Kevin Goldstein’s Top 100, which you will see in Baseball Prospectus 2007, or the rankings you might get from Baseball America or John Sickels. From what I can gather about how the PECOTA list performed in 2006, it had its share of successes–PECOTA had Ryan Zimmerman, Philip Hughes, and Ian Kinsler rated higher than almost anyone else, for example–but the scouting-based lists did a little bit better overall, particularly when it came to pitching prospects. As I said when this question came up in a recent chat:

Remember, Kevin can take statistical information into account as he pleases, but PECOTA can't take scouting information into account. PECOTA thinks that Kevin Slowey is a hair better than Homer Bailey. I don't agree with that, and if you could feed PECOTA information about their respective fastball velocities and so forth, I don't think PECOTA would agree with that, either.

The fuel of any ranking system is information–and being able to look at both scouting and statistical information means that you have more fuel. Really, the only way that a purely stat-based prospect list should be able to beat a hybrid list is if the biases introduced by the process are so strong that they overwhelm the benefit. There are a couple kinds of biases to worry about.

The first comes from overweighting the importance of certain kinds of scouting information. It’s clear that some kinds of scouting information do have a substantial predictive benefit; some of these (body type, speed) we are able to capture reasonably well in PECOTA, while others (fastball velocity) we are not. But there are still a few categories of players that are routinely overrated by scouts: pitchers who throw hard but aren’t able to miss bats, or ex-football players who have yet to leverage their tools into results. For what it’s worth, I suspect that these kinds of biases are much less prevalent than they used to be; everyone involved in the prospect game has gotten much smarter about developing their information filters. Being able to account for scouting information is worth the occasional misplaced infatuation with a Mark Rogers or a George Lombard several times over.

The other kind of bias is the failure to synthesize statistical information in an appropriate fashion. Perhaps this shouldn’t really be called ‘bias,’ since it isn’t introduced intentionally. But think about all the different sorts of things that you need to account for in order to translate a prospect’s statistical record into some projection of his future value:

His age, relative to his leagues;
How different types of statistical attributes develop with age;
League difficulty effects;
Park effects;
The comparative values of different positions, accounting for positional defense;
The predictive value of different types of batting and pitching statistics (e.g. filtering out noise in the data, like extreme BABIPs for pitchers);
The relationship of component statistics and overall value, as measured by wins or runs;
The predictive value of the most recent season’s worth of statistics versus previous seasons;
The predictive value of major league versus minor league statistics;
How similar players have developed historically;
How valuable a given performance would be at the major league level, involving some knowledge of major league economics;
Evaluating risk versus ‘upside’;
Injury and attrition risk.

Keeping all of this in your head at once is not easy to do. That’s where PECOTA is handy. Take something simple like the relative value of players at different positions, for example. PECOTA expects Delmon Young to turn in a .308/.354/.513 equivalent batting line at age 25, and Reid Brignac a .271/.321/.479 batting line. Each of those projections looks reasonable, and it turns out that those performances have roughly equal value after accounting for the players’ respective positions. But somehow, the Young projection feels bigger–there’s the .300+ BA and the nice slugging average–and Young rates ahead of Brignac on virtually every prospect list.

This is not to say that there aren’t reasons to prefer Young to Brignac; PECOTA itself concludes that Young is the slightly better prospect. But eyeballing minor league statistics can be dangerous. We might know that the California League is an easier hitting environment than the Florida State League, but we’re probably not going to account for enough of the difference when we’re doing the math in our heads. We might know that a 19-year-old has more development time left than a 22-year-old, but we probably underestimate just how sharp the aging curve is. PECOTA is able to account for all this information objectively.

So the goal here is to provide some perspective, rather than for PECOTA to be the end-all be-all answer on prospects. With Kevin on our team this year, our prospect lists have turned a little bit more ‘scouty’–Kevin does look at the PECOTAs, but not to the same degree that we did in past years. This is perfect from my point of view, because it means that we have two relatively unadulterated perspectives to provide you with. At the end of this series of articles, I’ll present a combined Top 50 list based on synthesizing the PECOTA and Goldstein lists. I happen to think that this combined list represents the best prospect rankings that you’ll find anywhere. (Kevin probably disagrees, so we might have to put some money up at longbets.org, or just put a good, ol’-fashioned steak dinner at risk at one of Chicago’s finer establishments.)

Methodology

Ch-ch-ch-ch-CHAN-ges

Prior to last season, we made a large overhaul of PECOTA, introducing things like the detailed five-year forecasts, the “level” adjustment for minor league players (PECOTA tries to compare Double-A hitters against other Double-A hitters), enhanced component park factors, and the starter/reliever adjustment. This year’s changes are comparatively minor, so minor that most are barely worth discussing. The level adjustment, for example, now recognizes that the gap between different levels isn’t always equal; there’s a larger difference in competition between Triple-A and the majors than between Double-A and Triple-A. Accounting for this makes PECOTA more accurate–by 0.001% or something. We’re sticklers for detail here, and don’t take a lot of shortcuts.

One more significant change is that we’re now using age-specific baseline weightings for pitchers. This means that if a pitcher is very young or very old, we tend to weight his most recent season especially heavily; if he’s in the middle of his career, we use a flatter distribution. This helps a pitcher like Yovani Gallardo, whose numbers improved a lot from 2005 to 2006, while it hurts someone like Yusmeiro Petit, who regressed significantly as he advanced levels. This change also affects pitchers on the other side of the age spectrum. It hurts Randy Johnson, for example, but helps Tom Glavine. We introduced this change for position players last year. It took a little bit longer to find the empirical basis for doing it for pitchers, since pitcher data is so noisy.
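To make the idea of age-specific weighting concrete, here is a minimal sketch. It is not PECOTA's actual weighting scheme: the mid-career age band (26 to 32), the decay rates, and the function names are all assumptions chosen purely for illustration. The only thing taken from the description above is the shape of the rule, with a steep weighting toward the latest season for very young or very old pitchers and a flatter one in mid-career.

```python
from typing import List, Sequence


def season_weights(age: int, n_seasons: int = 3) -> List[float]:
    """Return normalized weights, most recent season first.

    Hypothetical rule: pitchers aged 26-32 get a flat-ish decay (0.8 per
    season back); everyone else gets a steep decay (0.5 per season back).
    These numbers are illustrative assumptions, not PECOTA's values.
    """
    mid_career = 26 <= age <= 32
    decay = 0.8 if mid_career else 0.5
    raw = [decay ** k for k in range(n_seasons)]  # k = 0 is the latest season
    total = sum(raw)
    return [w / total for w in raw]


def weighted_baseline(age: int, eras_recent_first: Sequence[float]) -> float:
    """Blend several seasons of ERA into one baseline using the age-based weights."""
    weights = season_weights(age, len(eras_recent_first))
    return sum(w * era for w, era in zip(weights, eras_recent_first))


if __name__ == "__main__":
    improving = [3.00, 4.50, 5.00]  # made-up ERAs, most recent first
    print(weighted_baseline(21, improving))  # leans on the latest season
    print(weighted_baseline(29, improving))  # flatter blend of all three
```

Under these assumed weights, the same improving three-year record produces a lower (better) baseline for a 21-year-old than for a 29-year-old, which is the direction of the Gallardo effect described above.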

There are also some improvements made possible by Clay’s hard work on the DTs. In addition to rolling 2006 into our dataset, we also retrofitted the 1997 season, which expanded the minor league database by 33%. And for the first time, we ran translations for the Gulf Coast and Arizona Leagues, the lowest rungs of the minor league ladder, which should allow us to rate almost every player who has hit with a wood bat.

All About Upside

While most of these changes tend to make PECOTA more nuanced and complicated, we’ve decided to simplify the rating criteria for these prospect rankings themselves. Rather than combine two separate versions of a player’s Peak rating, as we did for last year’s lists, we’re simply going to focus on his Upside score.

Upside gives credit only for performance above league average at the player’s position, and zero credit for everything else. If a player winds up being a bench guy in the majors, or gets stuck at Double-A, or quits baseball to work in a lumber yard–none of these outcomes is desirable. On the other hand, the cost of employing a prospect is relatively low, both in terms of financial outlay and opportunity cost (a player can simply be left in the minors if he’s not good enough for MLB), so assigning negative points for a below-average or below-replacement level performance isn’t quite fair. Upside works around this negative value problem by giving credit for the good, while treating all different types of bad performance as having zero (but not negative) value. The version of Upside that we’re using here is the peak-adjusted variant, which measures a player’s most valuable five-year window up through and including his age 28 season (or simply his next five years of performance if he’s already age 25 or older).
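The two rules in that paragraph (zero credit for anything below average, and a best five-year window ending no later than age 28) can be sketched in a few lines. This is an illustration of the logic as described, not PECOTA's code. The function names, the units, and the convention that the first projected season is the one played at the player's current age are assumptions.

```python
from typing import Sequence


def season_upside(value_above_average: float) -> float:
    """Credit above-average performance; treat every kind of bad outcome as zero."""
    return max(0.0, value_above_average)


def peak_adjusted_upside(
    current_age: int,
    projected: Sequence[float],
    window: int = 5,
    peak_age: int = 28,
) -> float:
    """Best five-year total of above-average value.

    projected[i] is the projected value above positional average at age
    current_age + i (an assumed convention, in arbitrary units).
    """
    credits = [season_upside(v) for v in projected]

    # Players already 25 or older: simply take the next five seasons.
    if current_age >= 25:
        return sum(credits[:window])

    # Younger players: try every window that ends at or before the age-28 season.
    last_start = peak_age - window + 1 - current_age
    best = 0.0
    for start in range(last_start + 1):
        best = max(best, sum(credits[start:start + window]))
    return best


if __name__ == "__main__":
    # Made-up projection for a 20-year-old, ages 20 through 28.
    print(peak_adjusted_upside(20, [-5, -2, 0, 3, 6, 8, 9, 9, 8]))  # 40.0
    # Made-up projection for a 26-year-old; only the next five seasons count.
    print(peak_adjusted_upside(26, [-3, 4, 5, -1, 2, 10]))  # 11.0
```

Note how the negative seasons simply drop out rather than dragging the total down. That is the "zero, but not negative" treatment in action.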

I realize that all of this is a bit complicated, and I encourage you to explore the PECOTA glossary if you’re the type that likes the dirty details. But the intuition behind our methodology is fairly simple: we’re attempting to measure the degree and probability of above-average performance while the player is under the control of his parent club. This is the real fruit of the unforgiving labor of scouting and development: getting impact performances from players who are still cheap under the reserve clause, or in arbitration.

This definition is very important to keep in mind when we tell you, for example, that Dustin Pedroia is at least as valuable as Delmon Young. Young is two years younger than Pedroia, and has a more athletic profile that will likely age better into his thirties. He is probably at least a 2:1 favorite to produce more value over the course of his entire major league career. But he is not such a favorite to produce more value in the years during which he’s under club control. You can only get six years and change out of a prospect before his arbitration clock runs out, and they won’t necessarily be his best years, especially if he reaches the major leagues very young. That Delmon Young is likely to be more valuable than Dustin Pedroia when he’s 32 doesn’t matter one whit to the Red Sox or Devil Rays. Both players will be long gone from their parent systems by then, or will have had the chance to negotiate deals at market price. This is very important to understand, and it’s a point that I didn’t emphasize enough last year. I do happen to think that it’s the “right” way to value prospects, but we’ll save that discussion for another time.

As we’re using peak-adjusted Upside as our sole ranking criterion, we’ve made a couple of tweaks to increase its viability. Firstly, for position players, Upside now accounts for defensive as well as offensive performance. I realize that some of you won’t be thrilled with using minor league fielding numbers, but there are a couple of things to keep in mind:

The fielding projections jibe with scouting reports far more often than not. Cameron Maybin is projected to be an average-or-better major league center fielder, while Brignac is projected to have some trouble handling shortstop.
Although fielding ratings can be hard to pin down, PECOTA accounts for this by regressing them to the mean fairly heavily.

Most importantly, if we don’t account for fielding, then we run into problems with players whose defense is grossly inadequate for their positions, and will almost certainly wind up playing elsewhere if and when they hit the majors. Preston Mattingly, for example, produced -7 fielding runs in just 30 games at shortstop in his debut; it’s obvious from both a scouting and statistical perspective that he’s going to end up at an infield corner. If we treated Mattingly as a shortstop for Upside purposes without accounting for his defense, we’d wind up vastly overrating him versus players that can legitimately handle the position.
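For readers who want to see what "regressing to the mean fairly heavily" looks like in practice, here is a generic shrinkage sketch. It is a textbook approach, not PECOTA's actual formula. The stabilizing constant of 100 games and the function name are assumptions made up for this example. The -7 runs in 30 games is the Mattingly figure quoted above, used only as sample input.

```python
def regress_to_mean(
    observed_rate: float,
    games: float,
    league_mean: float = 0.0,
    stabilizer_games: float = 100.0,
) -> float:
    """Shrink an observed per-game fielding rate toward the league mean.

    The smaller the sample, the more weight goes to the league mean.
    stabilizer_games is a hypothetical constant chosen for illustration.
    """
    weight = games / (games + stabilizer_games)
    return weight * observed_rate + (1.0 - weight) * league_mean


if __name__ == "__main__":
    raw_rate = -7 / 30  # fielding runs per game from a 30-game sample
    shrunk = regress_to_mean(raw_rate, games=30)
    print(round(raw_rate, 3), round(shrunk, 3))
```

With these assumed settings the estimate keeps its sign but loses most of its size. A terrible 30-game showing still counts against a player, just not at face value.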

The second tweak is that we fudge a bit on the definition of ‘average’ for pitchers, giving starting pitchers credit toward their Upside score beginning with ERAs slightly worse than league average, while requiring somewhat above-average performance from relief pitchers. The rationale for this stems from the research I did for the starter-reliever adjustment, which reveals that a typical pitcher can expect to have an ERA about 25% higher if he pitches as a starter instead of out of the bullpen. The difference between leaguewide ERAs for starters and relievers is somewhat less than 25%, but this is because of selection effects: since starters pitch more innings than relievers, they tend to be a team’s better pitchers. This adjustment gives back some of the credit to starting pitchers, which in turn produces rankings that are a bit more intuitive. Indeed, you’ll find a handful of pitchers rated by PECOTA as among the very best prospects in the game, which wasn’t the case last year.
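The 25% figure is easy to turn into a quick conversion. The sketch below uses only that one number from the paragraph above. The function names are made up, and the exact Upside thresholds for starters and relievers are not given here, so the code does not attempt to reproduce them.

```python
STARTER_PENALTY = 1.25  # starters run ERAs roughly 25% higher, per the text


def as_starter_era(relief_era: float) -> float:
    """Rough ERA a pitcher might post as a starter, given his bullpen ERA."""
    return relief_era * STARTER_PENALTY


def as_reliever_era(starter_era: float) -> float:
    """Rough ERA a pitcher might post in relief, given his ERA as a starter."""
    return starter_era / STARTER_PENALTY


if __name__ == "__main__":
    print(round(as_starter_era(3.60), 2))   # 4.5
    print(round(as_reliever_era(4.50), 2))  # 3.6
```

In other words, a 3.60 ERA out of the bullpen and a 4.50 ERA in the rotation reflect about the same underlying pitcher, which is why holding both roles to the same "average" line would shortchange starters.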

Finally, if you’re having trouble interpreting the Upside scores, you can use the following as a guide, adapted from Kevin’s categories in his team-by-team rankings.

Upside Score	Definition
100+	Excellent Prospect. One of the better prospects in baseball. Strong chance of long major league career, perhaps with several All-Star appearances. May have Hall of Fame potential, especially if prospect is young or has a rating of 150 or higher.
50-100	Very Good Prospect. Strong chance of a meaningful major league career, with some legitimate chance at stardom. Best-case outcomes may involve some Hall of Fame potential.
25-50	Good Prospect. Reasonable chance of a meaningful major league career, but only an outside chance at stardom.
10-25	Average Prospect. Some chance of a meaningful major league career, but more likely to end up on the major league fringe. Highly unlikely to make two or more All-Star appearances.
0-10	Marginal Prospect. Very little chance of becoming a major league regular, excluding extreme mitigating circumstances affecting the player’s statistical record.

Eligibility

Players are eligible for the PECOTA prospect lists provided that they are considered rookies by major league standards (no more than 130 career MLB at-bats, 50 career IP, or 45 days on an active 25-man roster), with two exceptions:

We will take a mulligan on all players with fewer than 100 professional plate appearances (position players), or 100 opposing batters faced (pitchers). PECOTA is pretty smart about adjusting for limited playing time, boiling down to the essentials of a player’s skill set, and regressing to the mean as appropriate. But some players with very limited sample sizes (Jeff Bianchi, for example) simply trip up the system, and their ratings cannot be taken at face value. There are relatively few players affected by this requirement this year. Andrew Miller and Luke Hochevar are probably the most notable, and we’ll simply default to Kevin’s rankings for them when we provide the combined scores.
Japanese imports are not eligible. I’m in 100% agreement with Kevin on this issue. Japanese players might be rookies, but they aren’t prospects in the traditional sense, in that their scouting and development has occurred overseas. Moreover, the reason they haven’t hit the major leagues sooner isn’t because of their ability levels, but because Japan (understandably) imposes artificial barriers to major league entry. I’ll let you know where the three significant Japanese imports of this winter (Daisuke Matsuzaka, Kei Igawa, and Akinori Iwamura) would rank, but they’ll remain as footnotes to this process.
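Taken together, the rookie thresholds and the two exceptions amount to a simple filter. The sketch below encodes them as stated. The `Player` record, its field names, and the sample players are hypothetical and exist only so the example runs.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Hypothetical record holding just the fields the eligibility rules need."""
    name: str
    is_pitcher: bool
    mlb_at_bats: int = 0
    mlb_innings: float = 0.0
    mlb_roster_days: int = 0
    pro_plate_appearances: int = 0
    pro_batters_faced: int = 0
    japanese_import: bool = False


def is_rookie(p: Player) -> bool:
    """Rookie limits quoted above: 130 AB, 50 IP, 45 days on an active roster."""
    return p.mlb_at_bats <= 130 and p.mlb_innings <= 50 and p.mlb_roster_days <= 45


def is_eligible(p: Player) -> bool:
    """Apply the rookie test plus the two exceptions."""
    if not is_rookie(p) or p.japanese_import:
        return False
    # Mulligan: fewer than 100 professional PA (hitters) or batters faced (pitchers).
    sample = p.pro_batters_faced if p.is_pitcher else p.pro_plate_appearances
    return sample >= 100


if __name__ == "__main__":
    # Made-up players, not real statistical records.
    hitter = Player("Sample Hitter", is_pitcher=False, pro_plate_appearances=850)
    draftee = Player("Sample Draftee", is_pitcher=True, pro_batters_faced=40)
    veteran = Player("Sample Veteran", is_pitcher=False, mlb_at_bats=400,
                     pro_plate_appearances=3000)
    print([p.name for p in (hitter, draftee, veteran) if is_eligible(p)])
```

Note that there is deliberately no age check in the filter. The next paragraph explains why older rookies stay on the list.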

I also considered capping the age for prospects at 25, in order to weed out the Brooks Conrads of the world. But remember what these lists are evaluating: a player’s potential for above-average performance during his club-controlled seasons. While players like Conrad have essentially no chance of reaching the Hall of Fame or having their number retired, their arbitration clocks have yet to start ticking, and they can still produce a high ROI for their clubs as late-debut players, as Matt Stairs did for the A’s or Craig Monroe is doing for the Tigers. Instead, what we’ll do is run separate lists of the top 25-and-under players (regardless of rookie status), as we did in this series last year and as Kevin has done in his team-by-team rankings.

Players are listed at their position as determined by PECOTA, which is in turn based on the position at which they’ve accumulated the most playing time over the course of the past season or two. We will not try to guess which prospects will switch positions, though of course the DT fielding ratings will provide some hints. Note that we are able to account for players who have split time between different positions. Dustin Pedroia is listed as a second baseman for the purposes of these rankings, since he’s accumulated a little more playing time at the keystone, but his valuation scores consider him more of a 2B/SS hybrid.

I hate to have all this buildup without any payoff, but it’s important to explain what we’re doing here. Besides, you won’t have to wait long. We'll start with the catcher rankings tomorrow.


Nate Silver
