July 10, 2007
Prospectus Toolbox
Small Samples and All-Star Berths

Welcome to the latest edition of Prospectus Toolbox. We're back to conceptual topics this week: we're not going to talk about a specific statistic or report, but rather a factor that affects how statistics and performance are perceived. That factor is time, specifically playing time.

The other day, I was talking with Joe Sheehan about a bit of research Marc Normandin did for his pinch-hit appearance on the Prospectus Hit List, which showed that the Angels' Chone Figgins had an outrageous 53 hits in June, one of the highest single-month totals we'd seen in years. Joe related that Figgins' hot June had caught him off-guard. The last time he'd looked up the Halos' versatile speedster, Figgins had been struggling to hit .130; then, just a few weeks later, he was reviewing the Angels' statistics before a radio appearance and there was Figgins, hitting .320 and climbing.

This isn't unusual. Well, okay, a player hitting .461 for the month of June is unusual, but writing a guy off early and then being surprised by a hot streak isn't. We often get an image of a player's season based on a snapshot (often his stats when he comes to town to play the home team) and forget how quickly things can change. This works both ways: often a player's hot start will blind us to the fact that he has come back to Earth, or been buried under it. In April and May J.J. Hardy was hitting like Joe Hardy of Damn Yankees fame, bopping 15 homers in 234 plate appearances. In June and July, it's been a little more like Carroll Hardy, with only three homers and a .231/.314/.352 line in 121 plate appearances.

Much of this has to do with the crucial concept of sample sizes. I know, it sounds like something they give you when you walk through the fragrances section of your local department store, but in baseball what we're talking about is making sure you have a large enough body of data to support an analysis.

Any given piece of a player's track record is considered a "sample" of data, ranging from a single at-bat to an entire career's worth of hits, walks, and strikeouts. Game to game, players' performances fluctuate wildly. Pick a lousy batter, let's say Neifi Perez. Even all hopped up on Sudafed, Perez has a career line of just .267/.297/.375, with a paltry .218 EqA. He has 64 total homers in over 5,000 career at-bats. Regardless, Neifi Perez has hit for the cycle, has a five-hit game on his resume, and has driven in four runs in a game almost a dozen times. If one of those games was all that you saw of Perez's career, you'd think he was a pretty fine ballplayer. If you only saw Perez in the finest month of his career (say, September 2004, when he hit roughly .371/.400/.548, or April 2005, when he batted .368), you might think he was an indispensable piece of your ballclub. But then you'd have to change your name to Dusty Baker and start making politically incorrect statements about white ballplayers being unable to handle bright sunlight.

The point is, Neifi Perez isn't exceptional in this regard. Lots of bad players have big games, big months, big postseasons, sometimes even big half-seasons, any of which could bias your evaluation of their skills. The way you discern the gold from the pyrite is to gather larger and larger samples of data (hundreds of innings pitched, preferably thousands of plate appearances) to examine. If the player doesn't have a major league track record, you look at his minor league stats. So long as you can account for the different context, minor league performance data can be almost as good as big league stats.

One reason not to trust small samples is that they can change very quickly. In Figgins' case, he got a late start to the season due to injury, so when he was hitting .130 in late May, his season was only about 100 plate appearances old, less than half the playing time you'd expect a regular to have at that point in the season.

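To put rough numbers on how quickly a small sample can move, here is a minimal Python sketch. It assumes each at-bat is an independent trial with a fixed hit probability, which is a simplification, and it treats Perez's .267 career average from above as an assumed "true" talent level. The function names and the 25-for-100 and 125-for-500 stat lines are hypothetical, chosen only for illustration.

```python
import math


def batting_average_standard_error(true_avg: float, at_bats: int) -> float:
    """Standard error of an observed average, treating at-bats as independent trials."""
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


def average_after_game(hits: int, at_bats: int, game_hits: int, game_at_bats: int) -> float:
    """Batting average after adding one game's hits and at-bats to a running total."""
    return (hits + game_hits) / (at_bats + game_at_bats)


# Assumption: .267 (Perez's career average) stands in for a hitter's true talent.
TRUE_AVG = 0.267

# How wide is the plausible range of observed averages at each sample size?
for at_bats in (25, 100, 300, 600, 5000):
    se = batting_average_standard_error(TRUE_AVG, at_bats)
    low, high = TRUE_AVG - 2 * se, TRUE_AVG + 2 * se
    print(f"{at_bats:>5} AB: roughly {low:.3f} to {high:.3f} (plus or minus two standard errors)")

# Hypothetical stat lines: the same 4-for-5 game added to a .250 hitter
# after 100 at-bats and after 500 at-bats.
small_sample = average_after_game(25, 100, 4, 5)
large_sample = average_after_game(125, 500, 4, 5)
print(f"4-for-5 game on 25-for-100:  .250 -> {small_sample:.3f}")
print(f"4-for-5 game on 125-for-500: .250 -> {large_sample:.3f}")
```

Under these assumptions the standard error shrinks with the square root of at-bats, so quadrupling the sample only halves the noise. That is why one big game moves a 100-at-bat average far more than a 500-at-bat one.
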
When the sample is that small, a single game can make a huge difference. On June 9, Figgins had a four-hit game against the Cardinals and saw his batting average jump from a pathetic .234 to a merely bad .254. A six-hit game against the Astros on June 18 raised his average from .258 to a downright respectable .284.

Why bring up all this sample size talk now? It's because of tonight's festivities in San Francisco, baseball's annual celebration of the small sample. The Midsummer Classic celebrates outstanding half-seasons of performance, and always the first half of the season. This sounds better than it actually is, because with balloting for the All-Star game starting in May, many fans are making judgments about who the best players are based on far less than an entire half-season's worth of data. This is largely the result of how we keep track of baseball performance data: at the start of every season, each player starts with a blank slate, and the ones who fill that slate early in the season draw more attention than those who slowly build up their record through the course of the season.

One way to expand the sample for the All-Star game would be to look at a player's performance not merely from the start of the year, but from the previous year's All-Star game. This would effectively double the amount of data we have on the players, and give some extra credit to players who finished the previous season well. Below we have a chart of the best players, by VORP, since the 2006 All-Star game. A few notes before we dig in:

AL Catcher              PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Victor Martinez (R)    647  .323  .394  .505   .277   61.9   1,024
Jorge Posada (R)       574  .307  .382  .517   .249   55.2   1,643
Joe Mauer              531  .310  .406  .467   .222   46.3   1,465
Kenji Johjima          514  .294  .329  .449   .069   28.7     583
Jason Varitek          405  .271  .360  .439   .064   21.4   1,444
Ivan Rodriguez (S)     563  .289  .314  .436   .009   20.7   2,343

As we can see, the All-Star starter was just about the worst choice available, with an awful .314 OBP over the last calendar year. The myth of Japanese ballot-stuffing seems to have hit a speed bump, considering Kenji Johjima's low vote total.

AL First Base           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Justin Morneau (R)     685  .316  .380  .552   .303   48.4   1,780
Mark Teixeira          592  .296  .399  .581   .338   47.6     459
Paul Konerko           626  .285  .367  .516   .188   29.7     N/A
Richie Sexson          614  .263  .350  .514   .167   25.5     392
Carlos Pena            310  .286  .392  .587   .329   25.4     N/A
Kevin Millar           498  .289  .394  .474   .195   25.2     N/A
Lyle Overbay           519  .299  .360  .498   .170   22.0     N/A
Kevin Youkilis         638  .295  .386  .446   .136   20.6     N/A

Because designated hitters are considered first basemen in this year's All-Star vote, Justin Morneau is relegated to reserve status. He has the distinction of actually being the best glove-using first baseman in the American League.

AL Second Base          PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Brian Roberts (R)      705  .300  .372  .440   .121   49.4     529
Robinson Cano          567  .308  .339  .501   .162   36.0   1,332
Placido Polanco (S)    512  .330  .371  .416   .114   28.2   2,318
Mark Ellis             613  .271  .340  .425   .025   22.9     N/A
Esteban German         438  .295  .390  .437   .131   22.6     N/A
Luis Castillo          630  .311  .365  .364   .009   22.0     873
Ian Kinsler            613  .255  .332  .425  -.017   21.2     N/A
Ty Wigginton           512  .285  .340  .485   .113   20.2     N/A

Both of the AL All-Stars at the keystone lose a little bit when their 2006 second-half performance is taken into consideration: Polanco because he missed a lot of playing time after the All-Star break, Roberts because his second half was sub-par. The main thing that expanding the sample does is get Robinson Cano (who has been inconsistent in 2007, but ended the season like a house on fire in 2006) back into the conversation. Nothing here explains how Polanco got more than four times as many votes as his backup.

AL Shortstop            PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Carlos Guillen (R)     593  .336  .408  .559   .380   71.2   1,718
Derek Jeter (S)        729  .340  .408  .484   .279   70.9   3,200
Miguel Tejada          623  .328  .381  .453   .184   48.1     840
Orlando Cabrera        684  .303  .343  .419   .054   36.5     850
Michael Young (R)      728  .299  .345  .424   .038   31.4     469

Injuries to Miguel Tejada and Mark Teixeira left the door open for Michael Young to tag along as the Token Texas Ranger (TTR). Mixing in some of his 2006 performance makes Young seem a little less out of place on the All-Star roster.

AL Third Base           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Alex Rodriguez (S)     672  .309  .405  .608   .411   76.5   3,891
Adrian Beltre          613  .280  .333  .520   .165   33.7     528
Troy Glaus             520  .271  .379  .503   .182   30.0     N/A
Casey Blake            584  .271  .347  .466   .098   21.6     389
Mike Lowell (R)        617  .281  .335  .477   .083   21.1   1,364

Predictably, Rodriguez leads the AL in homers (46), runs scored (131), RBI (142), and slugging percentage (.612) since last year's Midsummer Classic. Like Troy Glaus, Mike Lowell started off the season well, but where Glaus slumped in May, Lowell fell into a far more profound slump in June.

AL Left Field           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Manny Ramirez (S)      553  .307  .409  .522   .285   42.4   2,153
Carl Crawford (R)      663  .286  .336  .440   .046   26.2     602
Juan Rivera            286  .335  .381  .538   .320   26.1     N/A
Raul Ibanez            626  .285  .340  .458   .093   20.7     N/A

Left field in the American League is, to put it bluntly, thin, and made thinner by the injury Juan Rivera sustained in winter ball. Ramirez is the sixth-best outfielder the AL has over the past year, and Carl Crawford is the token Devil Ray and AL stolen base king (49 since the 2006 All-Star game).

AL Center Field         PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Grady Sizemore (R)     748  .286  .393  .512   .248   69.4   1,297
Ichiro Suzuki (S)      727  .329  .376  .419   .138   50.8   2,341
Torii Hunter (R)       601  .301  .336  .559   .230   49.5   1,717
Kenny Lofton           563  .300  .374  .419   .083   33.6     N/A
Gary Matthews Jr.      680  .287  .353  .442   .078   31.6     511
Curtis Granderson      661  .263  .317  .487   .062   31.3     N/A

AL Right Field          PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Vladimir Guerrero (S)  665  .345  .424  .579   .451   79.9   3,151
Magglio Ordonez (S)    658  .328  .397  .526   .310   58.3   2,715
Nick Markakis          672  .292  .350  .475   .128   34.3     N/A
Michael Cuddyer        698  .285  .361  .472   .132   33.9     621
Mark Teahen            590  .296  .374  .473   .153   33.4     N/A
J.D. Drew              572  .270  .390  .459   .141   26.6     951
Alex Rios (S)          580  .283  .333  .485   .091   25.6     316

What the AL lacks in left field over the last calendar year, it more than makes up for in center fielders and right fielders. Even better, all the right ones have been invited to San Francisco.

AL Designated Hitter    PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
David Ortiz (S)        646  .309  .440  .612   .453   77.5   2,858
Travis Hafner          572  .270  .396  .532   .267   43.1     806
Jim Thome              499  .271  .425  .522   .274   41.7     N/A
Frank Thomas           635  .275  .386  .502   .198   40.5     N/A
Gary Sheffield         396  .293  .396  .538   .287   33.0   1,739

Hafner and Ortiz were on the first baseman's ballot, Sheffield on the outfield ballot, all based on the stupid rule that you don't get a DH when the game's in a National League ballpark. Whether as a first baseman, over-glorified pinch-hitter, or dog catcher, Ortiz had to be in this game. He's the league leader in OBP (.439) and in second place in homers (37) and slugging percentage (.609) over the past year.

NL Catcher              PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Brian McCann (R)       540  .289  .340  .524   .176   43.2     972
Russell Martin (S)     603  .291  .362  .456   .115   41.4   2,039
Paul Lo Duca           548  .304  .343  .408   .036   26.1   1,742
Bengie Molina          504  .282  .304  .462   .012   20.7   1,383
Chris Coste            189  .337  .386  .543   .315   20.1     N/A

The two best catchers are going to the game. Lo Duca's high vote total could in part be attributable to his red-hot performance in May (.393/.441/.488), which was followed by an equally impressive collapse (.214/.250/.342 since June 1).

NL First Base           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Ryan Howard            657  .307  .449  .658   .520   76.1     851
Albert Pujols (R)      689  .326  .418  .577   .398   64.7   1,936
Lance Berkman          653  .285  .410  .533   .280   44.4     N/A
Prince Fielder (S)     660  .282  .379  .565   .274   43.1   2,706
Adrian Gonzalez        684  .297  .370  .506   .223   36.3     532
Todd Helton            696  .309  .415  .462   .191   30.7     N/A
Dmitri Young (R)       429  .326  .373  .510   .256   30.0     N/A
Adam LaRoche           609  .274  .350  .532   .181   26.1     N/A
Derrek Lee (R)         433  .325  .400  .483   .230   24.1     988

Howard's just the NL leader in homers (51), RBI (145), and slugging percentage (.655) since last year's game; no need for him to be an All-Star. Meanwhile, Dmitri Young is having a nice half-season, but that shouldn't make anyone forget that last year, he was cut by a team in the middle of a playoff run. The fact that he makes this team while Howard is on the sidelines is direct evidence that the home-field advantage conferred by winning this exhibition game is a farce.

NL Second Base          PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Chase Utley (S)        744  .314  .392  .552   .308   74.2   2,214
Orlando Hudson (R)     689  .304  .383  .473   .163   46.5     N/A
Ray Durham             590  .287  .351  .496   .139   38.0     672
Jeff Kent              550  .293  .375  .468   .154   33.0   1,206
Dan Uggla              746  .252  .317  .473   .047   29.5     532

Also a reserve, but not making the 20 eVORP minimum, was token Pirate Freddy Sanchez (.311/.342/.402 since last year's game). But that's not really an outrage until you consider the next list…

NL Shortstop            PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Hanley Ramirez         718  .325  .379  .548   .327   82.4     456
Jose Reyes (S)         702  .303  .371  .463   .161   60.0   2,214
Jimmy Rollins          772  .289  .334  .527   .159   59.4     834
Rafael Furcal          697  .303  .367  .453   .129   46.9   1,109
Edgar Renteria         673  .296  .356  .449   .102   41.9     847

Hanley Ramirez, the best player over the past calendar year, doesn't make it to the All-Star game, basically because of J.J. Hardy's early-season home run binge. Hardy missed the second half of last season with an injury, and his June/July slump has him below the 20 eVORP threshold this season. Still, that was enough to vote him in on the Players' Ballot. Ramirez is third in runs scored (120) and stolen bases (51), and first in doubles (50) since last year's All-Star game.

NL Third Base           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Miguel Cabrera (R)     671  .333  .409  .573   .418   74.5   1,808
Chipper Jones          443  .334  .420  .653   .526   62.2   1,084
Aramis Ramirez         592  .319  .372  .604   .352   59.3     899
Garrett Atkins         694  .304  .382  .532   .226   48.3
David Wright (S)       649  .298  .374  .492   .201   48.3   2,303

No wrong choices in this group, but looking at the larger data set would probably change some minds about whether Aramis Ramirez belongs in this game. Over the past year, he has the third-most homers (37) and RBI (118) and the second-highest slugging percentage (.606). Of course, since the guy who came in first in all those categories isn't an All-Star either, there's no sense complaining.

NL Left Field           PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Barry Bonds (S)        521  .296  .480  .597   .474   68.5   2,325
Matt Holliday (R)      689  .330  .392  .582   .336   60.7   1,549
Alfonso Soriano (R)    693  .298  .360  .552   .252   51.9   2,203
Chris Duncan           524  .289  .374  .572   .287   45.1     555
Carlos Lee (R)         700  .303  .354  .513   .182   38.4   1,166

NL Center Field         PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Carlos Beltran (S)     643  .266  .359  .514   .174   49.7   2,511
Mike Cameron           672  .269  .342  .482   .118   38.2
Bill Hall              609  .274  .359  .502   .150   36.9   1,177
Aaron Rowand (R)       518  .295  .368  .457   .119   30.2     696
Hunter Pence           281  .341  .367  .581   .340   27.8

NL Right Field          PA   AVG   OBP   SLG   MLVr  eVORP   Votes
--------------------- ---- ----- ----- ----- ------ ------ -------
Ken Griffey Jr. (S)    549  .273  .368  .518   .175   37.5   2,987
Luke Scott             497  .281  .374  .535   .223   34.1
Brad Hawpe             567  .290  .381  .505   .175   31.3
Corey Hart             477  .289  .347  .506   .151   28.0   1,099
Shane Victorino        670  .283  .348  .413   .017   24.7

If, looking at the NL outfielders, you started humming "one of these things is not like the others," Aaron Rowand would probably be the guy that starts blushing. He was having a really poor second half of the season last year, hitting .257/.327/.407 prior to suffering another season-ending injury after an outfield collision in August. So, at least his All-Star story has a feel-good, "welcome back from your gruesome injury" angle.

In conclusion, it's not that there's anything magical about the period between All-Star games that makes it an ideal data set for looking at the performance of your prospective All-Stars. You could go back eighteen months or further in looking at the performance of the players you're going to designate as the league's elite. The thing to remember is that you have a choice aside from just accepting the half-season (or less) sample that's available to you in making the decision.

Finally, by way of thank yous, I couldn't have even started to put together this article without the help of BP's technical staff, who generated the eVORP and rankings for me, as well as invaluable time-twisting resources such as Dave Pinto's Day-by-Day Database and Sean Forman's wonderful and constantly improving Baseball Reference site.