February 7, 2011

Reintroducing PECOTA
They're Here!

by Colin Wyers

And here we are–the release of the 2011 PECOTAs.

While I have your attention, I’d like to say a few words about the production of the PECOTAs this year. I guess I don’t have to tell you that I’m filling some pretty big shoes here–Nate Silver is probably the most famous sabermetrician not named Bill James, and PECOTA is where Nate made his biggest mark in our community. And so I’m building on his work–and work by people like Clay Davenport and Gary Huckabay, too. I am, as they say, standing on the shoulders of giants. (In big shoes, apparently–this is what I get for mixing my metaphors.) So I owe them, and others I’ve probably neglected to mention, my deepest thanks.

But even with all that, I wouldn’t have gotten this far without a lot of help. I couldn’t have accomplished what I’ve done without the help of everyone here at Baseball Prospectus, who have really given me all the support I could ask for. But I want to thank a few people especially–it’s a fine line to walk, as if I list too few I risk upsetting someone unfairly left off, and if I list too many, you won’t finish reading the list. So with that in mind–very special thanks to Rob McQuown, Mike Fast, Ben Lindbergh, and Steven Goldman. Gentlemen, take a bow, and everyone, please, give them a round of applause.

Now, then, on to business. This is the first release of this year's PECOTA, and as such it will continue to undergo revisions through the remainder of the offseason. The program we use to generate the PECOTAs is continually evolving, and when we discover new ways to improve the forecasts, we'll make those changes and pass the updated forecasts on to you. We’ll also be updating periodically to keep up with players who switch teams.

In addition, we have several PECOTA features yet to roll out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time.
Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well). After that, we’ll be publishing the PECOTA cards, featuring perks like the percentiles and the ten-year forecasts. We’ll update you more as we get closer to that point.

We’re also revamping the cards to use the new Wins Above Replacement Player model we’ve developed. PECOTA has already been adapted to use the new WARP, so WARP baselines have shifted a bit from what you’re used to seeing. The biggest change is among relief pitchers, who take a major hit. Please keep this in mind as you review these forecasts. We know that many of you are relying on these forecasts for your fantasy teams, and we thought that it was better to get the forecasts out now rather than wait for when the entire site was ready to transition to new WARP.

Now, some of you may be asking, “How good are the PECOTAs this year?” Of course, we won’t know the answer for another eight months or so. But we can come up with an educated guess, if we make the assumption that there’s nothing special about predicting the 2011 season, and that a system that works over previous seasons will work in succeeding seasons.

In the course of producing the PECOTAs, we generate forecasts for every player who played from 1950 through 2010. These aren’t quite the same as the full PECOTAs–they are park-neutral, rather than being adjusted for the home park a player plays in. They are not age-adjusted. And for the most part, they do not reflect minor-league performance–we have major-league data for all of MLB history, but very little minor-league data. Still, they do represent a substantial portion of the PECOTA process.

It is time-prohibitive for us to generate full age curves for all of these historic forecasts, but we did adapt a simplified set of age adjustments for past purposes.
These simplified PECOTA forecasts aren’t as accurate as the full PECOTAs, but they give us a chance to view how well PECOTA fares over a large swath of history.

Only one other projection system is available for such a large part of baseball history, and can therefore be pitted against PECOTA across it: the Marcels, originally developed by Tom Tango. (The version we’re using in these tests was published by Jeff Sackmann.)

I “re-baselined” each forecast for each season in the test by subtracting the average forecast and adding in the average performance of the players forecasted (weighted by playing time, in both instances). Then I took a look at two tests–one is root mean square error, which, if the errors are roughly normally distributed, tells us that about 68% of forecasts were within that margin of error. The other is simply counting which forecast was closer to a player’s actual performance.

Looking at offensive stats first:
In terms of RMSE, our simplified PECOTA is in a dead heat with the Marcels. (And again, PECOTA is giving Marcels an edge, as it is forecasting everyone for a neutral park; the Marcels make no park adjustments, but most players do not switch teams or parks between seasons.) In terms of “success rate” (in other words, the percentage of head-to-head projection matchups "won"), PECOTA has a slight edge. Now, for pitching:
Again, RMSE shows a dead heat. In terms of success rate, the Marcels have a slight edge over the simplified PECOTAs.

Like I said, this sort of testing emphasizes breadth rather than depth–PECOTA has forgone several of its advantages, like park adjustments and minor-league data. And yet it’s still producing accurate forecasts.

Now, just as a reminder–PECOTAs are available only to our subscribers (Premium and Fantasy) who are signed up for a whole year. If you haven’t already subscribed, you can do so here. Doing so doesn’t just get you access to PECOTA, but also to our fantasy tools like the Player Forecast Manager and Team Tracker, as well as exclusive access to some of the best baseball writing out there.

If you’re already a subscriber, thank you for your support, and I truly hope that you all enjoy what we do as much as we enjoy doing it. I am continually amazed at how intelligent and knowledgeable our readers are, and I really do think that the people who read BP are among the best baseball fans I’ve ever known.

That’s all I've got–have fun, folks.
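The re-baselining step and the two tests described in the article (RMSE and head-to-head "success rate") can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to produce the results above: the function names are hypothetical, the RMSE here is unweighted (the article does not say whether it was weighted), ties are left out of the success rate, and the sample numbers are invented.

```python
import math


def rebaseline(forecasts, actuals, weights):
    """Subtract the weighted average forecast and add the weighted
    average actual performance, weighting by playing time."""
    total = sum(weights)
    mean_forecast = sum(f * w for f, w in zip(forecasts, weights)) / total
    mean_actual = sum(a * w for a, w in zip(actuals, weights)) / total
    return [f - mean_forecast + mean_actual for f in forecasts]


def rmse(forecasts, actuals):
    """Root mean square error (unweighted; an assumption of this sketch)."""
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def success_rate(system_a, system_b, actuals):
    """Share of decided head-to-head matchups in which system_a was
    closer to the actual result; ties are ignored (an assumption)."""
    wins_a = wins_b = 0
    for a, b, actual in zip(system_a, system_b, actuals):
        err_a, err_b = abs(a - actual), abs(b - actual)
        if err_a < err_b:
            wins_a += 1
        elif err_b < err_a:
            wins_b += 1
    decided = wins_a + wins_b
    return wins_a / decided if decided else 0.5


# Invented one-season sample: a rate stat for four hitters, with plate
# appearances as the playing-time weights.
actual = [0.300, 0.260, 0.280, 0.250]
pa = [600, 500, 400, 300]
system_a = rebaseline([0.290, 0.270, 0.275, 0.265], actual, pa)
system_b = rebaseline([0.280, 0.265, 0.290, 0.240], actual, pa)

print("RMSE A:", rmse(system_a, actual))
print("RMSE B:", rmse(system_b, actual))
print("Success rate of A over B:", success_rate(system_a, system_b, actual))
```

Re-baselining removes any league-wide miss (for example, forecasting too much offense across the board), so the comparison measures how well each system orders and spaces the players rather than how well it guessed the run environment.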
Colin Wyers is an author of Baseball Prospectus.
Chomsky (103) I see the PFM isn't quite ready yet - ETA?

Benjamin Harris (44774) "In addition, we have several PECOTA features yet to roll out–first will be the Depth Charts, which combine the PECOTA forecasts with estimates of a player’s role and playing time. Those should be ready a week from now, and will be available in two forms: the team Depth Chart pages, and the Player Forecast Manager (which is receiving some upgrades as well)."

NYYanks826 (37443) Looks like, according to PECOTA:

Keep in mind that these are weighted means - in real life, there are inevitably going to be outliers who do better because they exceed their forecasts. So saying "PECOTA thinks only one player is going to hit 40 homers" is a bit like saying "PECOTA thinks everyone is only going to roll sevens at the craps table." Feb 07, 2011 06:06 AM

deacon14 (26198) Could someone better explain Weighted Means to me? What exactly is the spreadsheet we are looking at? One, will playing time affect Jason Heyward's home run number or only a variance away from the mean? Two, what are Mike Trout's projections supposed to represent, if he played in the majors this year?

Tony (9580) From my PECOTA experience:

We just put this up! The world is asleep and you're here! I love you guys! Feb 07, 2011 02:15 AM

Tipman (33229) Just looked at the Mets team quickly, and Kelvim Escobar with a 3 WARP, which is 2nd highest on Mets pitching staff? I know the Mets pitching staff is bad, but didn't think a guy who is going to pitch in the majors would be 2nd best!

R.A.Wagman (32721) Question about playing time estimates. As this release is not adjusted for playing time, I'll have to take the over on Brandon Morrow, among others. Will the post-depth chart release include such adjustments?

smallflowers (38782) YEAHHHHH!!! This is awesome.
jpjazzman (32699) Also please add positions for sorting, really a pain without that.

Here's how you can do this yourself in Excel. Feb 07, 2011 06:24 AM

norraist (49922) Minor quibble -- seems that the find string for "C" will return a hit for both C and CF. This is for the same reason that using "F" can return any OF.

billminick (40945) How exactly do I access the PECOTA projections? This is not user-friendly at all. Please advise. Thanks.

NYYanks826 (37443) Once you open the Excel file, you should see tabs at the bottom for batters and pitchers.

jpjazzman (32699) I think he means where the physical file is (which is a bit hidden)

Hoff (37596) My goodness, that's quite the gap between Pujols and the rest of the hitters. Maybe he should offer the rest of them lessons.

Bob (24776) Awesome. But where is "upside?" Believe it or not, THIS is the reason I subscribe to Baseball Prospectus.

mwright (17346) Echo this sentiment. I'm sure you guys are swamped today but at some point would like to know if "upside" will be coming back or if there will be any other long-term value metric introduced in its place.

pakdawgie (27451) I'm assuming that it will take a couple of weeks - they need the long term projections (i.e. the info on the player cards) to calculate this.

Tony (9580) this was the first thing that I looked for, too!

Upside requires the percentile forecasts, which typically aren't ready for the initial PECOTA release. As we get closer to the cards (and I'm not saying we'll wait until the cards are released) we'll get that added to the spreadsheet. Feb 07, 2011 15:13 PM

jlefty (39531) Oh this is torture. The URL for the spreadsheet includes the words "baseball" and "fantasy" which means it's blocked here at the office--criminal, I know!

leites (17240) The comps for Carlos Santana are David Wright, Alvin Davis and Carl Yastrzemski (!). Does this mean PECOTA does not think Santana will remain a catcher?
leites (17240) Also amusing to see that one of the comps for Mitch Moreland is Adrian Gonzalez . . .

leites (17240) And Gordon Beckham's #1 comp is Gary Sheffield. (Why does PECOTA seem to think Sheffield started out as a second baseman? A few years ago he was the top comp for Dustin Pedroia.)

GoTribe06 (25189) He is also on Mike Trout's comp list. Anybody who is anybody has a Gary Sheffield comp.

leites (17240) Hah! Other players this year who have Sheffield as a comp include Bobby Abreu, Kosuke Fukudome, Shin-Soo Choo, Magglio Ordonez (who also has Stan Musial as a comp), Jacob Smolinski (who also has Ron Santo), and Carlos Beltran.

zstine1 (32591) Also, an AL or NL column, which has been in past spreadsheets, is very useful for AL- or NL-only league fantasy owners.

Brian DewBerry-Jones (244) I'm getting the error "The requested URL /fantasy/files/PECOTA_20110207.xls was not found on this server."

almartin (34738) I'm also getting the 'requested URL' error. Please advise!

almartin (34738) I'm a premium subscriber but am getting a 403 Forbidden error. Is this on my end or yours?

doog7642 (3522) No breakout scores above 10% for hitters? Seems odd.

acmcdowell (25609) Yes, it does seem odd, especially in light of the extremely high breakout values for some pitchers (20+ over 20%, with a mix of injury recoveries, job shifts, and young players).

tbwhite (361) Pitchers are more unpredictable, so it makes sense that in general they would be more likely to break out or crash and burn. However, the lack of any hitter over 10% does feel low. I suppose the logical next question is what exactly does "breakout" mean, and has the definition changed at all from previous years?

doog7642 (3522) I have no math mind whatsoever, but this seems like a big deal. "Breakout" used to mean (if I'm not mistaken) the percentage chance that the player put up a statline that was 20% (I think) more than the baseline projection.
If across the board, no hitter is having that happen more than 10% of the time, what does that mean? Does it mean the comps are consistently conservative? Does it mean that regression to the mean is eliminating potential outlier performance projections? I would like to understand this better.

John Carter (22689) The link "have fun folks" isn't working on my computer. My server can't find that web page.

slepican (36665) Anyone having luck accessing in the last hour? I am getting the 403 Forbidden error.

geefsu (27636) I can't access the page, either. Is there still a server issue or do I need to look into this further?

Everyone, the server should be fixed now. Sorry for the delay - let us know if you have any more problems. Feb 07, 2011 07:42 AM

doog7642 (3522) Am I correctly understanding that once fielding is taken into account, PECOTA thinks Jim Edmonds is an outright improvement over Colby Rasmus in CF for the Cards?

Cromulent (32088) Thanks, guys. Looking forward to diving in. One quick request: always enjoyed the companion article Nate wrote about what surprised him, what projections looked high or low, etc. I don't know if that's planned, but it would be welcomed, at least by me.

ATLExile (57568) Not to pick individual projections apart, but Freddie Freeman as -17 FRAA at 1B, and below replacement level? Really? All the scouting I've heard rates his glove as above average, but PECOTA sees him as Adam Dunn.

Mooser (26842) So 381.1 WARP for hitters and 620 WARP for pitchers. I know that playing time is not adjusted yet, but seems too heavily weighted to pitchers. A leaderboard for WARP would show Pujols at #1, then 14 starting pitchers, and then Ryan Braun as the second best position player.

doog7642 (3522) Perhaps there is a relationship between the low WARPs and the low breakout scores for hitters. Could it be that this PECOTA is particularly gunshy about suggesting significant steps forward for younger players?
ATLExile (57568) It sees Jason Heyward as taking a significant step backwards. Buster Posey seems to be treading water in rate terms, but taking an obvious step forward in playing time.

dREaDS Fan (51622) Down in the weeds, but Fred Lewis is on the Reds nowadays, not TOR. (I was curious how bad a LF the Reds will field this year between Lewis & Gomes.)

douglasgoodman (26612) Anyplace in particular we should send names that seem like they should be in the spreadsheet but are not? Happened to notice Stolmy Pimentel was not there -- seemed like he was advanced enough to get a card...

MattBey (54385) So, Victor Marte is Carlos Zambrano's #2 comp? Let's look at the differences between Victor Marte and Carlos Zambrano.

MattBey (54385) And an aside: Archer has Nick Adenhart as a comparison. Does that mean PECOTA thinks Archer has a higher chance of flaming out by age 23? Sad that I even have to ask this question.

LynchMob (6915) I assume the basis for this comp can be explained by this statement from Colin's article last week ...

MattBey (54385) There's nothing wrong with the comparison between Archer and Adenhart, but does PECOTA think Adenhart had a massive baseball related injury that caused him to be out of baseball by 23? If so, wouldn't it factor that into Archer's projection, unfairly giving him a higher chance of flaming out?

Adenhart is a special case as a comp, yeah. We present three comps as a rough guide to what the comps look like, but the actual PECOTA projections use a lot more players than that. So it probably has an impact on his MLB%, but not as much as you would think. Feb 07, 2011 16:00 PM

rscharnell (32599) Has BP totally eliminated VORP for the WARP projections this season or will the VORP still show up on the player cards?

Chad Supp (60079)

fgreenagel2 (5026)

dconner (2632) I'm a bit annoyed by the lack of VORP myself, as that's my number one metric.
I guess I'll just have to start focusing on WARP and trying to make the mental adjustment myself.

Luke in MN (42774) Defensively, Mauer's projected at -2 at catcher and McCann's a +2 (every catcher seems to be in that narrow range). Mauer's averaged about +5 FRAA a year over his career and McCann about -4. Have you changed the way you calculate this stat? They're both at similar spots in their career, so it seems odd to me.

Richie (27368) Any way of letting PECOTA know that Morrow and CJ Wilson are now starters? Among other things, then getting revised ERA and WHIP estimates for them?

belowm (2439) Also no longer with the Blue Jays: Miguel Olivo (SEA) and Mike Napoli (TEX). Not with the Mariners: Russell Branyan (FA) and Guillermo Quiroz (SD).

PelotaDiSoldi (32282) Wow, PECOTA does not believe in Delmon Young's power. I think his projections were better 4 years ago.

Joe D. (3692) I think Vazquez is precisely the kind of pitcher where you should give PECOTA a lot less say than for other pitchers.

LynchMob (6915) How long before you think PECOTA will incorporate into its algorithm data from hitfx/pitchfx? So that, for example, it will "know" that Vazquez lost some mph last year ...

I'm really looking forward to that day, and being part of making that happen. Feb 07, 2011 14:27 PM

Luke in MN (42774) Are you not publishing a context-neutral measure for pitchers anymore? You have tAV for batters, but I assume the ERA and FRA numbers for pitchers include adjustments for park and league, etc. For my money getting the context-neutral numbers is a big deal.

Mikedaddy (18097) So, Stephen Strasburg is going to throw more innings (122.6) than Mat Latos (104), Ian Kennedy (100.6) and Phil Hughes (121)?

jetheinenkel (20773) Just noticed that too - PECOTA projects Mat Latos to pitch only 104 IP with 20 GS, but in those 20 games, it projects him to be a top pitcher.
Does this mean we should interpret PECOTA as suggesting he has a very high propensity for a season-ending or prolonged injury? All the comparables (Gallardo, Hughes, Elarton) had injuries early in their careers as well, directly after a successful early-career season.

smocon (45877) Dratted content filter at work!!!!

barosey (38105) Can someone point me to a description of 'MLB_PCT'?

knockoutking (46172) heh glad to see JP Howell is projected to throw 103 games, 121 IP lol

McLovins (11297) The Sort feature that is set up for each column on the Hitters tab is a huge time saver. Please add that for the Pitchers tab as well. Particularly when the other columns start filtering in like Upside, and VORP etc... Thanks.

jetheinenkel (20773) If you're working in a recent version of MS Excel, you can set this feature up yourself by selecting the "Sort and Filter" button in the "Home" tab, then selecting "Filter."

cjrhgarmon (22748) This article is actually kind of depressing. I understand that the PECOTA used for the comparisons is stripped-down, but, if I am reading this correctly, the stripped-down PECOTA is in a dead heat with Marcel with regard to accuracy. My understanding is that Marcel is the most basic projection system out there: basically an age-adjusted three-year average. If you do all of this work and are no more accurate than Marcel, well then what's the point?

leites (17240) Last year Nate Silver wrote a piece in the NY Times making exactly your point:

Shadetree42 (33584) The goal should be to improve accuracy. Period. Any smart user is well aware of variance. So no, the goal of PECOTA should not be to explain what variance is; it should be to improve forecast quality (average error).

TADontAsk (2173) At some point, there is only so much accuracy you can attain. Should they continually work towards a better system? Absolutely.
But unless you find a projection system that is 100% accurate - obviously not possible - then a report of the accompanying variance of said projections is just as important, if not more so, than the point projection itself.

Today's release is just the beginning. The percentile breakdowns will be coming with the cards. Feb 07, 2011 11:49 AM

Shadetree42 (33584) I, too, do not understand how/why Marcel is anywhere in this article. The goal is to be accurate, not just "no worse than the worst projection system out there." If your model is not ready for release yet, then don't release it. People can get/generate Marcel projections easily enough on their own, if they need something for today.

Colin is more of an expert than I am on this, but I can say that Marcel is far from the worst projection system out there. It is consistently among the best. It is one of the simplest, that is true, but simple does not mean bad. Feb 07, 2011 12:43 PM

Shadetree42 (33584) Marcel most certainly is not "among the best." This has been proven over and over. CHONE has crushed Marcel over the years, and PECOTA used to until 2009.

I emphatically disagree, unless by "crushed" you mean "performed marginally better". The best projection systems, by which I mean CHONE, ZiPS, PECOTA, have historically been in the same neighborhood as Marcel, and outperformance of Marcel by those systems on rate stats (e.g., OPS, ERA) has been small. Feb 07, 2011 13:03 PM

Shadetree42 (33584) Do you have aggregate data on this? Care to publish a study on it? "Crushed" means statistically significantly different than. CHONE, for example, has shown itself to be superior to Marcel.

I mentioned a couple links to studies already, below. What I am saying is nothing new. You don't seem to be completely grasping what I am saying, though. I am not saying that Marcel is identical to PECOTA in every way, nor did Colin in his article.
Feb 07, 2011 14:02 PM

George (31373) Your WARP score is saying that the best player in baseball will be Albert Pujols. The next 13 are pitchers (including injury comebacks Peavy, Santana and Strasburg)? I don't buy it.

jdtk99 (38768) Colin, what's the new replacement level, i.e., how many wins for a team with 0 WARP? Thanks.

Shadetree42 (33584) I don't get this (below). Are you saying you are currently using some algorithm that is different from the one you will use to make final forecasts? That would make no sense.

Juris (1283) If past is prologue, don't expect major fundamental improvements in the algorithms between now and the start of the season. But as Colin has already indicated, as the season approaches and lineups are set, the park adjustments are instituted, and anomalies in the data are discovered and corrected, the later PECOTAs, including especially those used in the Depth Charts and the PFM, will reflect the latest information. (That's the way it has been done for years.)

Brian Oakchunas (9790) I thought Ichiro was meant to have a new and improved forecast for this year. Looks more pessimistic than ever.

brooklyn55 (26639) Small thing, but helps in readability - could you format to drop the leading zeros, i.e., .345 instead of 0.345?

patwood0 (40065) I know a lot of studies have been done to evaluate PECOTA and other projection systems when it comes to OPS, but is there any information for the reliability of 5x5 fantasy stat projections? Is a SB projection more likely to be accurate than a batting average projection? How about WHIP vs. RBIs? I guess what I'm really looking for is the average historical delta for each stat category projection among players likely to be drafted in fantasy leagues.

TheTimmer (53875) I've just bought a Fantasy subscription for the year, but it won't let me see the PECOTA data... is it only for Premium subscribers? I thought I had access last year as a fantasy guy???

deacon14 (26198) Two questions on weighted mean.
One, are these ballpark adjusted? In other words, would Dave Bush look different if he was on Texas instead of Milwaukee as shown? Two, is the best way to look at these, until we have depth charts, that the rate stats won't change based on playing time assumptions?

wyliecoyote (235) Including error totals (not just FRAA or whatever) would be immensely helpful for Scoresheet, and related formats.

WilliamWilde (29503) The comps are one of my favorite parts of this data set. Seems that Frank Robinson is in a ton of comps? (Braun, Posada, Hanley, Mike Stanton,

deacon14 (26198) Are comps based on similar player at that age? In other words, could Stephen Strasburg comp to Dwight Gooden (hot young stud) but so could a 35 year old expected to post a 6-5 record with a 4.71 ERA?

hessshaun (41493) Great question. It would be really cool if it compared to the exact season. Like Jamie Moyer, '89. Not a necessity, but it would be cool to get lost in baseball seasons. I know I would go from comp to comp to comp to comp with no real purpose or care really. Just interesting reading.

deacon14 (26198) Do pitcher wins just look at the pitcher's recent number of wins, or does it consider that player's stats with an average offense? With their actual offense? In other words, if Greinke puts up the numbers from the last couple of years, will his wins be higher in these projections (support from Braun and company) or use an average of his Royals days?

CRP13 (46873) Trying to post this without sounding critical...because I admire what you are able to accomplish with PECOTA.

Joe D. (3692) Hamilton's projection is in 515 PAs, which partially explains why they look a little low. Also seems pretty fair since he's averaged under 500 PAs the last four seasons, and is heading into his thirties. (Also of note: Hamilton's BABIP of .390(!!) in 2010...)

Joe D. (3692) * Division failure on my part. *

CRP13 (46873) I understand your argument, I just don't agree.
His career stats indicate that given approximately 500 AB (not PA), he is a 30 home-run hitter. This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed.

Well, they really don't, actually. Hamilton's career HR/AB prorates out to 26 HR per 500 AB, or 24 prorated out over the 464 at bats listed in the PECOTA spreadsheet. Feb 07, 2011 15:45 PM

CRP13 (46873) What I'd really like to see, and have no right to demand, is projections ONLY of major league stats, with projected playing time, etc. I know that's coming on the depth charts. It just looks like if I added all of those counting stats (MLB proj. only), they wouldn't even come close to the totals achieved by MLB overall last season.

jberkon (28225) A very interesting article - and one that would be really helpful to address a lot of these comments - would be one dealing with the 10 or so players on which PECOTA and other projection systems (pick one: MARCEL, CAIRO, OLIVER, ZIPS, etc.) disagree. Perhaps the 10 on which PECOTA is more bullish and the 10 on which PECOTA is more bearish. And then try to explain why PECOTA differs. Basically, we want to understand why PECOTA believes that Longoria will experience a reasonably large drop-off at the age of 25 (one that is NOT being projected by the other systems), and why, say, Dan Johnson is expected to have such a good year. Is there some particular stat or physical attribute to which PECOTA attributes more weight than the other systems?

The overall set of projected numbers match up pretty closely with 2010 levels of offense. That is down from previous seasons, of course, so that may be why the numbers for hitters look low across the board. Feb 08, 2011 07:32 AM

tbwhite (361) Regression to which mean? The MLB mean or the player's 3 year mean?

By regression to the mean, I was referring to regression toward the MLB mean. Everyone (or almost) is going to be projected to do worse than their career-best year.
The human tendency is to assume that a career-best year defines a talent level, and we want to see PECOTA project them to repeat that. However, that's not the most likely outcome. Most likely, if a player did really well last year, he got a little lucky, and, conversely, if he did really poorly last year, he got a little unlucky. Feb 08, 2011 08:30 AM

tbwhite (361) I understand the concept, or at least I think I do, but it seems to me that PECOTA should be focused on regression to the player's mean not the MLB mean. If a player has an established level of say a .280 TAv and suddenly posts a .310 one season, then I get it that he won't likely post another .310, he'll probably revert back to his previous level of performance. But if a guy cranks out .280 TAv seasons like clockwork, there is no reason to believe he is going to tend to revert to a .250 TAv simply because that is the MLB average. It seems to me that regressing to the MLB mean implies that no player is really better than another, and that good seasons are merely caused by luck. That's a premise which is obviously false.

I agree there's a lot of confusion about what PECOTA does. Quite a bit of it has been explained at one time or another in one place or another, some of it online, some of it in print. It might be helpful to see if some of that could be centralized or indexed in one place. Feb 08, 2011 09:18 AM

Juris (1283) Every player is subject to a regression effect, even the best and the worst. Keep in mind that regression toward the mean works in both directions -- "unluckily bad" performance is usually followed by improved performance the following year; "luckily good" performance is usually followed by worse performance the following year.

tbwhite (361) I think that study is flawed. Not all guys who hit between .300 and .310 are the same. Some of them are really .300 hitters who are performing as expected, some are .250 hitters having a "career year".
Perhaps a very few are actually superstars having an "off year". Odds are the first group doesn't decline much the next year, and the last group might actually improve a bit, but the middle group is likely to show a huge drop off as they revert to their normal level of performance.

tbwhite (361) I understand that it is regression to the mean, and not instantaneous reversion to the mean, but I do believe the implication is that all MLB players have the same talent level.

CRP13 (46873) Side note: If you could post those numbers as an Astros SS for the next 3 years, you're already an improvement on whatever they've had.

The link that Juris posted above your comment is an excellent one to begin learning about what regression to the mean actually is. Feb 08, 2011 10:19 AM

tbwhite (361) I think my comment still stands. The .350 hitters are by and large hall of famers or near hall of famers. You would get a much better prediction of their batting average in year n+1 by using their career batting averages (most of which are probably around .300) than a league average batting average of .275. If in making forecasts you assume regression to the league mean rather than the player mean, you will overstate the likely decline.

CRP13 (46873) This is along the lines I was thinking too. My other general questions include:

Brian Cartwright (4519) You are confused about what regression is. Including more than one season, with the most recent ones weighted more, is the method of looking at the relevant portions of a player's career. Regression is when we lack information about a player (even when he has played a lot, Colin has done articles on other sites about this). For example, if you have a 20 year old shortstop in AA, you might regress his stats to what all other 20 year old shortstops in AA have done. Or if he's a fat first baseman who hits a lot of homers. How does that group age?
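The general idea of regression debated in this thread, blending a player's own record with a fixed amount of group-average performance, can be sketched as follows. This is a generic illustration and not PECOTA's actual method: the function name and the 200 at-bat "ballast" are hypothetical values chosen for the example, and .275 is simply the league batting average quoted in the thread.

```python
def regress_to_group(hits, at_bats, group_avg, ballast_ab):
    """Blend a player's record with `ballast_ab` at-bats of group-average
    performance. The more at-bats the player has, the less the group
    average moves the estimate."""
    return (hits + group_avg * ballast_ab) / (at_bats + ballast_ab)


LEAGUE_AVG = 0.275   # league average used as an example in the thread
BALLAST_AB = 200     # hypothetical amount of regression, for illustration

# One .350 season (175 hits in 500 AB): pulled noticeably toward .275.
one_season = regress_to_group(175, 500, LEAGUE_AVG, BALLAST_AB)

# A .300 hitter over a long career (1500 hits in 5000 AB): barely moves.
long_career = regress_to_group(1500, 5000, LEAGUE_AVG, BALLAST_AB)

print(round(one_season, 3), round(long_career, 3))
```

Under this kind of scheme an established .300 hitter is not projected to hit like the league average, because his own record heavily outweighs the added at-bats; a one-year spike, by contrast, gets pulled back much further.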
CRP13 (46873) I understand that part, but if players with extensive stats aren't being regressed somehow, and it's only the ones with little information to go on, how does one explain odd projections like Longoria, Zimmerman, Morales, etc?

Brian Cartwright (4519) Colin will have to answer any questions about specific players, or how PECOTA works, as it is his (adopted) baby, but for regression in general, you take the player's historical data and add a fixed amount of average performance of the group he's determined to be a member of.

Joe D. (3692) "This also ignores the fact that he was good for 156 games just 2 seasons ago, so the injury angle is overplayed."

Brian Oakchunas (9790) The Longoria projection is maybe a little low but not radically different from what he's done. Stanton had a lot more homers than PECOTA is projecting him for and a higher translated batting average so it's not spectacularly optimistic about him either.

jberkon (28225) The Longoria projection seems very odd. As CRP13 points out, his projection is well off the three year average (in terms of the rate stats). PECOTA has as his comps Miguel Cabrera, David Wright, and Mark Teixeira, all of whom progressed nicely as young players. And PECOTA gives Longoria a 55% chance of improving this year. So why are the numbers so far off?

For Longoria, specifically, I agree that his numbers look low. His projected batting average seems about 20 points too low to me. "Seems too low to me" is not necessarily the same as "PECOTA made a mistake here". We'll look into it and see if there's anything wrong. Feb 08, 2011 07:33 AM

belowm (2439) More misplaced players, this time on the Nats: Adam Kennedy (SEA), Willie Harris (NYM), Justin Maxwell (NYY). The more I look at this spreadsheet, the more I find. Scott Hairston is with NYM, not SDP, and Jerry Hairston signed with WAS. Jody Gerut is with SEA, not SDP. Pedro Feliz is with KCR, not STL. Felipe Lopez is with TBR, not BOS. Etc., etc.
wizstan (35939) Andres Torres projection is epic fail...
LynchMob (6915) What is it about Torres that makes you think the current algorithm does not apply to him? Is it something you think could/should be incorporated into the algorithm?
wizstan (35939) I think that a large break in a career should trigger a "reboot" of data. After several years of a different shape of performance you should stop trying to fit two dissimilar career patterns together.
smocon (45877) I wouldn't be shocked at all to see him drop from his 6 WAR number or whatever it was to below 2. That was a HUGE overperformance.
wizstan (35939) Overperformance relative to what?
Well, it doesn't seem to actually look that way. If you look at players who radically overperform their career stats, you actually don't gain any forecasting accuracy (in fact the opposite) when you throw out their older numbers. More recent numbers are of course more important in forecasts than older numbers, but not to the extent that you're implying here. Feb 07, 2011 15:02 PM
wizstan (35939) Well, if by overperform their career stats you mean overperform what he did in the minors in 2006, well, I think it is ludicrous to regress to that performance. I think dropping all weight given to minor league stats more than three years previous would significantly, and rationally, improve PECOTA; likewise, dropping all major league numbers if there is a break of more than three years between MLB appearances would remove some dubious regressions.
sho044 (10042) If the year was 2007 instead of 2011, I think this same comment could have been found on a Gary Matthews Jr. post... that worked out well, didn't it.
wizstan (35939) Well, except for the fact that Matthews had a long CONTINUOUS history of one level of performance, and then one anomalous season.
choms57 (46031) I have a premium subscription, yet every time I try to download the PECOTA spreadsheet it asks me for a username and password, even though I'm logged in. I put my said username and password in and it does not work. Can someone help me??
choms57 (46031) This sucks, I've been a customer for two years and now I can't see the main thing I look forward to!
BarryR (1188) I know I'm cherry-picking here, but since Jhoulys Chacin is my cherry in a couple of leagues, I just have to pick him.
matteson72 (34825) Here are your top 10 NL pitchers with the highest breakout rates:
igjarjuk (1588) "Breakout Rate is the percent chance that a hitter's EqR/27 or a pitcher's EqERA will improve by at least 20% relative to the weighted average of his EqR/27 in his three previous seasons of performance. High breakout rates are indicative of upside risk."
fairacres (1980) Am I missing something or are the individual players' batting averages off a bit? Pujols is forecast at 175 hits in 558 at bats -- .3136 on my calculator. PECOTA has him at "0.312". I spot checked a couple other players and found similar "errors."
stlpdx (47802) This feels dangerously similar to the six weeks of 2010 pre-season PECOTA discussions. (Spoiler: it didn't end well).
Jivas (649) A couple of thoughts:
MattBey (54385) (1) Not really. Even if "improve" meant that they performed better than they did the year before (it doesn't), we wouldn't expect this to be 50 percent by construction. For one reason, look at all the minor leaguers you're considering. A lot of filler players are going to move up a level and they're going to get weeded out. That alone isn't enough to lower it to something less than 25%, but it's a factor, you have to think.
jrmayne (1468) There are some pretty severe problems with the comps for minor league players.
yadenr (36923) I noticed as well. One of the first things I do is search for current players with my favorite past players as comps. I was fairly surprised to find that Eric Davis only shows up for Dennis Raben, who also has Willie McCovey(!) and Brandon Wood. Time to get that Raben jersey.
tbwhite (361) This brings up an interesting question, I hope BP can address it, although I understand if they can't because it would reveal too much IP.
tbwhite (361) Upon further review Decker has a ~.950 OPS thru age 20 in the minors. Mays had a 1.017 OPS in the minors thru age 20. Maybe that's what PECOTA is picking up. It does seem like perhaps batting is over-weighted compared to fielding for minor leaguers. I mean, a guy who has to play LF in A ball is very different from the greatest CF of all time.
Juris (1283) @tb: If memory serves me correctly, the comps are always based on the player's age cohort, so, for example, a given player's age 28 season's comps are always other players in THEIR age 28 seasons (not their prior or later careers). My inference about this is based on my understanding of how the projections are made -- by looking at the performance of the matching age cohort of players in the database of all player-seasons from 1950 onward (adjusted in various ways).
Juris (1283) I should add that the comps are thus the "most comparable" of the larger subset of players from the same age cohort (in the 1950-2010 player database) who are most closely matched on a set of criteria that includes information not just on performance (stat lines) but also such characteristics as position, handedness, and physical type (height and weight).
Juris (1283) So the "match" is determined by age, physical characteristics, position PLUS the (adjusted) "baseline performance" from the immediately previous 3 seasons of the given player and other players in the database. That is my understanding of the core method. How is the proximity between a player and his comparables actually calculated? I don't know the formula but the basic method is one of "nearest neighbor analysis." (See "nearest neighbor search" and "nearest neighbor analysis" in Wikipedia. Nate Silver uses a similar approach in some of his election forecasting, to take advantage of information from "neighboring" (most similar) states to help make election forecasts of particular states.)
igjarjuk (1588) I'm not convinced by all this name dropping that there are "severe problems" with the comps. I think you should offer up evidence on PECOTA's terms: what information did PECOTA use to determine the comps? For example, with the Decker-Mays comp, I think you're calling foul because of your knowledge of the extraordinary career that Mays had, but I'm pretty sure that's not how PECOTA is designed to work: I don't think it's using Mays' age 22+ seasons as the driver of the comparable match. Instead, Mays' earliest years are used to make the match. In this case Mays' later, very successful years are just a piece of a larger puzzle, one that in this case happens to contribute to a rosier outlook for Decker than if Mays were not a comp. This does not imply that a Mays-like year is an assured, or even likely, outcome. See the Comparable Player and Comparable Year entries in the glossary for more details.
tbwhite (361) Mays' great career is what puts a target on this particular comp, but a LF with an OPS of .950 in the Calif League is not very comparable to a CF with a 1.017 OPS in a month in AAA followed by an .830 OPS in the majors in ~500 PA and oh yeah the RoY award.
tbwhite (361) Joey Votto is an interesting case. I would like someone at BP to please explain the following to me:
jimnabby (9296) A lot of the questions in this thread would be cleared up if everyone just ran over to the glossary for a few minutes.
RedsManRick (23592) Exactly, Votto's Improvement rate is so high because he went from solid to very good to MVP over the last 3 years. So even slipping back to the very good category leaves him higher than his 3-year average.
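RedsManRick's point about a rising player and his three-year baseline can be made concrete with a small Python sketch. Everything numeric here is invented for illustration: the 3/2/1 season weights are an assumption (the thread and the quoted glossary text say only that the baseline is a weighted average of the three previous seasons), and the run-rate figures are not Votto's actual numbers. The 20% threshold is the one quoted from the Breakout Rate glossary entry.

```python
def weighted_baseline(seasons, weights=(3, 2, 1)):
    """Weighted average of up to three seasons, most recent first.

    The 3/2/1 weights are a hypothetical choice for this example.
    """
    pairs = list(zip(weights, seasons))
    return sum(w * s for w, s in pairs) / sum(w for w, _ in pairs)


def relative_change(forecast, seasons):
    """Forecast relative to the weighted baseline (0.10 means 10% better)."""
    base = weighted_baseline(seasons)
    return (forecast - base) / base


# Hypothetical run rates (higher is better): solid, very good, MVP-level,
# listed most recent first.
history = [9.0, 7.0, 6.0]
forecast = 8.0  # a step back from last year's 9.0

change = relative_change(forecast, history)
print(change > 0)      # still an "improvement" over the baseline
print(change >= 0.20)  # but short of the 20% breakout threshold
```

Because the baseline blends in the two weaker seasons, a forecast can sit below the most recent year and still count as an improvement, which is the pattern described for Votto.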
Richard Bergstrom (36532) Out of curiosity, how much did Matt Wieters' rookie season projection change using this version of PECOTA?
cmac314 (17866) Of the 1012 batters listed, just 29 have an Improve % of 50% or greater.
choms57 (46031) Just want to personally thank Colin, Ken, and Rob for emailing me and helping me figure out my PECOTA spreadsheet issue. BP is hands down the best site ever.
TADontAsk (2173) I completely understand the age-related comps, and that when a 20-year-old prospect has a comp of Willie Mays, they're saying that he's most comparable to a 20-year-old Mays up to that point of Mays' career. NOT that he's going to have a Mays-type career.
tbwhite (361) There are 16 out of 1012 batters with Yount as a comp, 7 with Willie Mays.
BarryR (1188) It doesn't matter what position Mays was playing in 1951. Mays, three months younger than Decker, was playing in the major leagues at age 20 after hitting .477 with a 1.323 OPS in AAA, while Decker was putting up a .950 OPS in the California League. These are not comparable hitters.
jrmayne (1468) That's not it. I've been looking at these comp lists for years with some care, and I'd bet Toyotas to Tonkas there are more HOF or active tremendous players in the comps. Four Mantle comps:
TADontAsk (2173) That's the thing. You'd expect this if it was listing the top 20 comps, or maybe even top 10. The confusion is that these are the top THREE comps.
DLegler21 (23472) While I agree that it is troubling that great players are showing up in the top 3 comps so often, I think you are taking the term comp too literally.
jrmayne (1468) I understand this. But the comps are supposed to be comparable players under the theory that comparable players will age comparably. Uncomparable players will not age comparably. If you think a guy mashing some in High Desert at the age of 22 is going to follow the same route as a guy mashing in the bigs at age 22, I'd like to see some authority for that.
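For readers trying to picture the "nearest neighbor" matching Juris describes earlier in the thread, here is a minimal Python sketch of the general technique. It is not PECOTA's formula, which nobody in the thread claims to know: the function name `top_comps`, the feature names, the equal weighting of features, and the sample players are all hypothetical. The only ideas taken from the discussion are that candidates come from the same age cohort and that closeness is judged on physical traits plus baseline performance.

```python
import math


def top_comps(target, pool, features, k=3):
    """Return the k players in `pool` closest to `target`.

    Candidates are limited to the target's age cohort. Each feature is
    scaled by its standard deviation within that cohort so that pounds,
    inches, and OPS points contribute on a comparable footing.
    """
    cohort = [p for p in pool
              if p["age"] == target["age"] and p["name"] != target["name"]]
    if not cohort:
        return []

    scales = {}
    for f in features:
        values = [p[f] for p in cohort]
        mean = sum(values) / len(values)
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        scales[f] = sd or 1.0  # avoid dividing by zero for constant features

    def distance(p):
        return math.sqrt(sum(((p[f] - target[f]) / scales[f]) ** 2
                             for f in features))

    return sorted(cohort, key=distance)[:k]


# Made-up players purely for demonstration.
prospect = {"name": "T", "age": 20, "height": 72, "weight": 200, "ops": 0.950}
database = [
    {"name": "A", "age": 20, "height": 72, "weight": 198, "ops": 0.940},
    {"name": "B", "age": 20, "height": 70, "weight": 180, "ops": 0.700},
    {"name": "C", "age": 28, "height": 72, "weight": 200, "ops": 0.950},
    {"name": "D", "age": 20, "height": 75, "weight": 230, "ops": 1.017},
]
comps = top_comps(prospect, database, ["height", "weight", "ops"], k=2)
print([p["name"] for p in comps])
```

Note that a nearest-neighbor search always returns the k closest candidates even when none of them is especially close, and that the result depends heavily on which features are included and how they are weighted. A sketch like this has no notion of league level or defensive position unless those are added as features, which is the kind of gap several commenters suspect in the Decker-Mays comp.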
norraist (49922) I was just wondering why players who are going to miss either full or half seasons (Strasburg and Johan Santana spring to mind) are listed as pitching so many innings?
With the depth charts will come human input about likely playing time. The weighted means spreadsheet is the output that PECOTA gives without knowledge about who has already suffered an injury that will cause them to miss time next year or whose playing time may change because of a change in role. Feb 08, 2011 08:43 AM
Because these are the weighted means, not adjusted for playing time. Playing time projections are coming in a week, with the depth charts. Feb 08, 2011 08:54 AM
Hokieball (33221) So is this the year that there will FINALLY be a uniform player ID that matches everyone between PECOTA, the PFM downloads, and the customizable in-season stat reports? It would be awfully handy :)
We'll be including IDs in the PFM output, and they will be the same ones we're including with PECOTA right now. I'll look into adding player IDs into the sortables. Feb 08, 2011 08:55 AM
jlebeck66 (38645) This may be out of the scope of Colin's power, but player IDs for historical stat reports and for the minor league translations would be spiffy too.
Do not question the Colin's powers, or you risk being ground up and fed into his nutrient bath. Feb 08, 2011 11:22 AM
Richard Bergstrom (36532) If he was so powerful, why would he need a nutrient bath?
BurrRutledge (18981) Courtesy of wiki: In the original Doom Patrol series, The Brain was regularly portrayed as a disembodied brain, bobbing inside a sealed dome filled with a nutrient bath, hooked up with numerous machines, including a loudspeaker to convey his voice.
worldtour (38677) Can someone tell me why PECOTA loves Winston Abreu so much? Am I missing some Tommy John medical news, or am I reading the spreadsheet wrong??
BindleStiff (25170) I'm not sure if this has been noted already, but for those who have issues accessing the spreadsheet using their username/password, please note that the username/password are case-sensitive when accessing the PECOTA spreadsheet.
Brian DewBerry-Jones (244) A problem that I see with the comps is that every one of the comps is a major league player. In the past, if you looked at the comps for a scrub in A ball, it was mainly other scrubs in A ball. What changed?
Drew Miller (22526) Is the Adrian Gonzalez projection adjusted for Petco and not Fenway? .880 OPS seems awfully low for Boston.
Drew Miller (22526) Also, there are a LOT of guys who are on wrong teams. It's as though the offseason of trades never happened.
rscharnell (32599)
+1 with a ridiculously large number of zeroes following it.