February 1, 2007
Lies, Damned Lies
PECOTA Takes on Prospects, Introduction
Last year, we ran our first-ever series of PECOTA-based prospect rankings. This wasn’t necessarily intended to be an annual feature, but it proved to generate a lot of good discussion, so here we are again.
Are the PECOTA rankings intended to be a substitute for scouting-based prospect lists? Emphatically not. As much fun as it is to play up the scouts-versus-stats angle, I don’t expect the PECOTA rankings to be as accurate as Kevin Goldstein’s Top 100, which you will see in Baseball Prospectus 2007, or the rankings you might get from Baseball America or John Sickels. From what I can gather about how the PECOTA list performed in 2006, it had its share of successes--PECOTA had
Remember, Kevin can take statistical information into account as he pleases, but PECOTA can't take scouting information into account. PECOTA thinks that Kevin Slowey is a hair better than Homer Bailey. I don't agree with that, and if you could feed PECOTA information about their respective fastball velocities and so forth, I don't think PECOTA would agree with that, either.
The fuel of any ranking system is information--and being able to look at both scouting and statistical information means that you have more fuel. Really, the only way that a purely stat-based prospect list should be able to beat a hybrid list is if the biases introduced by the process are so strong that they overwhelm the benefit. There are a couple of kinds of bias to worry about.
The first comes from overweighting the importance of certain kinds of scouting information. It’s clear that some kinds of scouting information do have a substantial predictive benefit; some of these (body type, speed) we are able to capture reasonably well in PECOTA, while others (fastball velocity) we are not. But there are still a few categories of players that are routinely overrated by scouts: pitchers that throw hard but aren’t able to miss bats, or ex-football players who have yet to leverage their tools into results. For what it’s worth, I suspect that these kinds of biases are much less prevalent than they used to be; everyone involved in the prospect game has gotten much smarter about developing their information filters. Being able to account for scouting information is worth the occasional misplaced infatuation with a
The other kind of bias is the failure to synthesize statistical information in an appropriate fashion. Perhaps this shouldn’t really be called ‘bias,’ since it isn’t introduced intentionally. But think about all the different sorts of things that you need to account for in order to translate a prospect’s statistical record into some projection of his future value:
Keeping all of this in your head at once is not easy to do. That’s where PECOTA is handy. Take something simple like the relative value of players at different positions, for example. PECOTA expects
This is not to say that there aren’t reasons to prefer Young to Brignac; PECOTA itself concludes that Young is the slightly better prospect. But eyeballing minor league statistics can be dangerous. We might know that the California League is an easier hitting environment than the Florida State League, but we’re probably not going to account for enough of the difference when we’re doing the math in our heads. We might know that a 19-year-old has more development time left than a 22-year-old, but we probably underestimate just how sharp the aging curve is. PECOTA is able to account for all this information objectively.
So the goal here is to provide some perspective, rather than for PECOTA to be the end-all be-all answer on prospects. With Kevin on our team this year, our prospect lists have turned a little bit more ‘scouty’--Kevin does look at the PECOTAs, but not to the same degree that we did in past years. This is perfect from my point of view, because it means that we have two relatively unadulterated perspectives to provide you with. At the end of this series of articles, I’ll present a combined Top 50 list based on synthesizing the PECOTA and Goldstein lists. I happen to think that this combined list represents the best prospect rankings that you’ll find anywhere. (Kevin probably disagrees, so we might have to put some money up at longbets.org, or just put a good, ol’-fashioned steak dinner at risk at one of Chicago’s finer establishments.)
Prior to last season, we made a large overhaul of PECOTA, introducing things like the detailed five-year forecasts, the “level” adjustment for minor league players (PECOTA tries to compare Double-A hitters against other Double-A hitters), enhanced component park factors, and the starter/reliever adjustment. This year’s changes are comparatively minor, so minor that most are barely worth discussing. The level adjustment, for example, now recognizes that the gap between different levels isn’t always equal; there’s a larger difference in competition between Triple-A and the majors than between Double-A and Triple-A. Accounting for this makes PECOTA more accurate--by 0.001% or something. We’re sticklers for detail here, and don’t take a lot of shortcuts.
One more significant change is that we’re now using age-specific baseline weightings for pitchers. This means that if a pitcher is very young or very old, we tend to weight his most recent season especially heavily; if he’s in the middle of his career, we use a flatter distribution. This helps a pitcher like
There are also some improvements made possible by Clay’s hard work on the DTs. In addition to rolling 2006 into our dataset, we also retrofitted the 1997 season, which expanded the minor league database by 33%. And for the first time, we ran translations for the Gulf Coast and Arizona Leagues, the lowest rungs of the minor league ladder, which should allow us to rate almost every player who has hit with a wood bat.
All About Upside
While most of these changes tend to make PECOTA more nuanced and complicated, we’ve decided to simplify the rating criteria for the prospect rankings themselves. Rather than combine two separate versions of a player’s Peak rating, as we did for last year’s lists, we’re simply going to focus on his Upside score.
Upside gives credit only for performance above league average at the player’s position, and zero credit for everything else. If a player winds up being a bench guy in the majors, or gets stuck at Double-A, or quits baseball to work in a lumber yard--none of these outcomes is desirable. On the other hand, the cost of employing a prospect is relatively low, both in terms of financial outlay and opportunity cost (a player can simply be left in the minors if he’s not good enough for MLB), so assigning negative points for a below-average or below-replacement level performance isn’t quite fair. Upside works around this negative value problem by giving credit for the good, while treating all different types of bad performance as having zero (but not negative) value. The version of Upside that we’re using here is the peak-adjusted variant, which measures a player’s most valuable five-year window up through and including his age 28 season (or simply his next five years of performance if he’s already age 25 or older).
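To make the definition concrete, here is a minimal sketch of the peak-adjusted calculation as described above. It is an illustration, not PECOTA's actual code: the function name, the input format (one projected value per season, expressed relative to the league average at the player's position), and the sample numbers are all assumptions made for the example.

```python
from typing import Sequence


def peak_adjusted_upside(current_age: int,
                         value_vs_average: Sequence[float],
                         window: int = 5,
                         last_age: int = 28) -> float:
    """Hypothetical sketch of peak-adjusted Upside.

    value_vs_average[i] is the projected value relative to positional
    average for the season at age current_age + i (units are arbitrary
    here; this input format is an assumption, not PECOTA's).
    """
    # Credit only above-average seasons; every kind of "bad" counts as zero.
    credits = [max(0.0, v) for v in value_vs_average]

    # Players already 25 or older: simply the next five seasons.
    if current_age >= 25:
        return sum(credits[:window])

    # Younger players: best five-year window ending no later than age 28.
    last_index = last_age - current_age          # index of the age-28 season
    best = 0.0
    for start in range(0, last_index - window + 2):
        best = max(best, sum(credits[start:start + window]))
    return best


# Made-up projection for a 19-year-old, covering ages 19 through 29.
teenager = [-5, -2, 0, 3, 6, 10, 12, 15, 14, 13, 11]
print(peak_adjusted_upside(19, teenager))   # best window is ages 24-28

# Made-up projection for a 26-year-old; negative seasons add nothing.
veteran = [4, -3, 5, 2, -1, 9]
print(peak_adjusted_upside(26, veteran))
```

Note that the below-average seasons in either projection neither help nor hurt: they are floored at zero before any window is summed, which is the whole point of the measure.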
I realize that all of this is a bit complicated, and I encourage you to explore the
This definition is very important to keep in mind when we tell you, for example, that Dustin Pedroia is at least as valuable as Delmon Young. Young is two years younger than Pedroia, and has a more athletic profile that will likely age better into his thirties. He is probably at least a 2:1 favorite to produce more value over the course of his entire major league career. But he is not such a favorite to produce more value in the years during which he’s under club control. You can only get six years and change out of a prospect before his arbitration clock runs out, and they won’t necessarily be his best years, especially if he reaches the major leagues very young. That Delmon Young is likely to be more valuable than Dustin Pedroia when he’s 32 doesn’t matter one whit to the Red Sox or Devil Rays. Both players will be long gone from their parent systems by then, or will have had the chance to negotiate deals at market price. This is very important to understand, and it’s a point that I didn’t emphasize enough last year. I do happen to think that it’s the “right” way to value prospects, but we’ll save that discussion for another time.
As we’re using peak-adjusted Upside as our sole ranking criterion, we’ve made a couple of tweaks to increase its viability. First, for position players, Upside now accounts for defensive as well as offensive performance. I realize that some of you won’t be thrilled with using minor league fielding numbers, but there are a couple of things to keep in mind:
Most importantly, if we don’t account for fielding, then we run into problems with players whose defense is grossly inadequate for their positions, and will almost certainly wind up playing elsewhere if and when they hit the majors.
The second tweak is that we fudge a bit on the definition of ‘average’ for pitchers, giving starting pitchers credit toward their Upside score beginning with ERAs slightly below league average, while requiring somewhat above-average performance from relief pitchers. The rationale for this stems from the research I did for the starter-reliever adjustment, which reveals that a typical pitcher can expect to have an ERA about 25% higher if he pitches as a starter instead of out of the bullpen. The difference between leaguewide ERAs for starters and relievers is somewhat less than 25%, but this is because of selection effects: since starters pitch more innings than relievers, they tend to be a team’s better pitchers. This adjustment gives back some of the credit to starting pitchers, which in turn produces rankings that are a bit more intuitive. Indeed, you’ll find a handful of pitchers rated by PECOTA as among the very best prospects in the game, which wasn’t the case last year.
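The arithmetic behind the 25% figure can be sketched as follows. This is only a back-of-the-envelope illustration of the ratio quoted above, not the starter-reliever adjustment itself; the constant and function names are hypothetical, and the exact thresholds PECOTA uses for Upside credit are not reproduced here.

```python
# Approximate ratio quoted in the text: the same pitcher's ERA is about
# 25% higher as a starter than as a reliever. (Illustrative constant.)
STARTER_TO_RELIEVER_RATIO = 1.25


def era_as_starter(reliever_era: float) -> float:
    """Rough ERA the same pitcher might post if moved to the rotation."""
    return reliever_era * STARTER_TO_RELIEVER_RATIO


def era_as_reliever(starter_era: float) -> float:
    """Rough ERA the same pitcher might post if moved to the bullpen."""
    return starter_era / STARTER_TO_RELIEVER_RATIO


# A reliever with a 3.60 ERA is roughly the equal of a 4.50 ERA starter.
print(round(era_as_starter(3.60), 2))
print(round(era_as_reliever(4.50), 2))
```

Seen this way, holding starters and relievers to the same "average" ERA would shortchange the starters, which is why the bar is set a little lower for one group and a little higher for the other.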
Finally, if you’re having trouble interpreting the Upside scores, you can use the following as a guide, adapted from Kevin’s categories in his team-by-team rankings.
Players are eligible for the PECOTA prospect lists provided that they are considered rookies by major league standards (no more than 130 career MLB at-bats, 50 career IP, or 45 days on an active 25-man roster), with two exceptions:
I also considered capping the age for prospects at 25, in order to weed out the
Players are listed at their position as determined by PECOTA, which is in turn based on the position at which they’ve accumulated the most playing time over the course of the past season or two. We will not try to guess which prospects will switch positions, though of course the DT fielding ratings will provide some hints. Note that we are able to account for players who have split time between different positions. Dustin Pedroia is listed as a second baseman for the purposes of these rankings, since he’s accumulated a little more playing time at the keystone, but his valuation scores consider him more of a 2B/SS hybrid.
I hate to have all this buildup without any payoff, but it’s important to explain what we’re doing here. Besides, you won’t have to wait long. We'll start with the catcher rankings tomorrow.