July 10, 2007

Prospectus Toolbox

Small Samples and All-Star Berths

by Derek Jacques

Welcome to the latest edition of Prospectus Toolbox. We're back to conceptual topics this week: we're not going to talk about a specific statistic or report, but rather about a factor that affects how statistics and performance are perceived. That factor is time, specifically playing time.

The other day, I was talking with Joe Sheehan about a bit of research Marc Normandin did for his pinch-hit appearance on the Prospectus Hit List, which showed that the Angels' Chone Figgins had an outrageous 53 hits in June, one of the highest single-month totals we'd seen in years. Joe said Figgins' hot June had caught him off guard. The last time he'd looked up the Halos' versatile speedster, Figgins had been struggling to hit .130; just a few weeks later, reviewing the Angels' statistics before a radio appearance, Joe found Figgins hitting .320 and climbing.

This isn't unusual. Well, okay, a player hitting .461 for the month of June is unusual, but writing a guy off early and then being surprised by a hot streak isn't. We often form an image of a player's season from a snapshot (often his stats when he comes to town to play the home team) and forget how quickly things can change. This works both ways: a player's hot start will often blind us to the fact that he has come back to Earth, or been buried under it. In April and May, J.J. Hardy was hitting like Joe Hardy of Damn Yankees fame, bopping 15 homers in 234 plate appearances. In June and July, he's looked a little more like Carroll Hardy, with only three homers and a .231/.314/.352 line in 121 plate appearances.

Much of this has to do with the crucial concept of sample sizes. I know, it sounds like something they give you when you walk through the fragrances section of your local department store, but in baseball what we're talking about is making sure you have a large enough body of data to support an analysis. Any given piece of a player's track record is considered a "sample" of data, ranging from a single at-bat to an entire career's worth of hits, walks, and strikeouts.
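To put a rough number on "large enough," here's a minimal sketch that treats every at-bat as an independent coin flip with a fixed chance of a hit. Real hitters aren't that tidy (pitchers, parks, and health all vary), so read this as an illustration of how much pure noise a sample carries, not as a model of any actual player. The .267 "true talent" average and the at-bat counts are arbitrary choices for illustration.

```python
import math


def avg_std_error(true_avg: float, at_bats: int) -> float:
    """Standard error of an observed batting average, assuming each at-bat
    is an independent trial with the same probability of a hit (binomial model)."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


TRUE_AVG = 0.267  # arbitrary "true talent" average for illustration

for at_bats in (25, 100, 250, 500, 2000, 5000):
    se = avg_std_error(TRUE_AVG, at_bats)
    # Roughly 95% of observed averages fall within two standard errors.
    low, high = TRUE_AVG - 2 * se, TRUE_AVG + 2 * se
    print(f"{at_bats:>5} AB: typically {low:.3f} to {high:.3f}")
```

Over 25 at-bats, a .267 hitter can plausibly show up anywhere from below .100 to over .400; by 5,000 at-bats the band shrinks to about a dozen points either way.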

Game to game, players' performances fluctuate wildly. Pick a lousy batter; let's say Neifi Perez. Even all hopped up on Sudafed, Perez has a career line of just .267/.297/.375, with a paltry .218 EqA, and only 64 homers in over 5,000 career at-bats. Yet Neifi Perez has hit for the cycle, has a five-hit game on his resume, and has driven in four runs in a game almost a dozen times.

If one of those games was all you saw of Perez's career, you'd think he was a pretty fine ballplayer. If you only saw Perez in the finest month of his career (say, September 2004, when he hit roughly .371/.400/.548, or April 2005, when he batted .368), you might think he was an indispensable piece of your ballclub. But then you'd have to change your name to Dusty Baker and start making politically incorrect statements about white ballplayers being unable to handle bright sunlight.

The point is, Neifi Perez isn't exceptional in this regard. Lots of bad players have big games, big months, big postseasons, sometimes even big half-seasons that could bias your evaluation of their skills. The way to tell the gold from the pyrite is to gather ever-larger samples of data (hundreds of innings pitched, preferably thousands of plate appearances) to examine. If the player doesn't have a major league track record, you look at his minor league stats. So long as you account for the different context, minor league performance data can be almost as good as big league stats.
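How surprised should we be by a Neifi Perez month? Under the same coin-flip simplification, a short sketch can estimate how often a true .267 hitter would bat .370 or better over a single month, and how likely he is to have at least one such month over a long career. The 100 at-bats per month and the 60 months (ten six-month seasons) are hypothetical round numbers, and treating months as independent is an assumption, so read the results as ballpark figures only.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more hits in n at-bats."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


TRUE_AVG = 0.267   # career-level "true talent"
MONTH_AB = 100     # hypothetical at-bats in one month
HOT_HITS = 37      # 37-for-100 is a .370 month
MONTHS = 60        # hypothetical: ten six-month seasons

p_month = prob_at_least(HOT_HITS, MONTH_AB, TRUE_AVG)
# Chance of at least one such month, treating months as independent.
p_career = 1 - (1 - p_month) ** MONTHS

print(f"Chance of a .370+ month:       {p_month:.1%}")
print(f"Chance of at least one in {MONTHS}: {p_career:.1%}")
```

The single-month odds are small, but spread across a career they stop being long odds at all, which is why one hot month tells you so little about a hitter.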

One reason not to trust small samples is that they can change very quickly. In Figgins' case, he got a late start to the season due to injury, so when he was hitting .130 in late May, his season was only about 100 plate appearances old, less than half the playing time you'd expect a regular to have at that point. When the sample is that small, a single game can make a huge difference: on June 9, Figgins had a four-hit game against the Cardinals and saw his batting average jump from a pathetic .234 to a merely bad .254. A six-hit game against the Astros on June 18 raised his average from .258 to a downright respectable .284.
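The arithmetic behind Figgins' jumps is simple: a game's hits and at-bats are added to everything that came before, so the fewer at-bats already banked, the more weight one game carries. Here's a small sketch comparing two hypothetical .230 hitters who each go 4-for-5, one with 100 at-bats on the season and one with 500. These are made-up lines, not Figgins' actual totals.

```python
def avg_after_game(hits: int, at_bats: int, game_hits: int, game_at_bats: int) -> tuple[float, float]:
    """Return batting average before and after adding one game's hits and at-bats."""
    before = hits / at_bats
    after = (hits + game_hits) / (at_bats + game_at_bats)
    return before, after


# Two hypothetical .230 hitters, each going 4-for-5 in one game.
for season_ab in (100, 500):
    season_hits = round(0.230 * season_ab)
    before, after = avg_after_game(season_hits, season_ab, 4, 5)
    print(f"{season_ab} AB: {before:.3f} -> {after:.3f} ({after - before:+.3f})")
```

The same 4-for-5 day moves the 100-at-bat hitter by about 27 points and the 500-at-bat hitter by under six.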

Why bring up all this sample size talk now? Because of tonight's festivities in San Francisco, baseball's annual celebration of the small sample. The Midsummer Classic celebrates outstanding half-seasons of performance, and always the first half of the season. Even that sounds better than it actually is: with balloting for the All-Star game starting in May, many fans judge who the best players are on far less than a half-season's worth of data. This is largely a result of how we keep track of baseball performance data. At the start of every season, each player begins with a blank slate, and the ones who fill that slate early draw more attention than those who build up their record slowly over the course of the season.

One way to expand the sample for the All-Star game would be to look at a player's performance not merely from the start of the year, but from the previous year's All-Star game. This would effectively double the amount of data we have on the players, and give some extra credit to players who finished the previous season well. Below we have a chart of the best players, by VORP, since the 2006 All-Star game. A few notes before we dig in:

Apparently, calculating VORP between two seasons like this is about as easy as building a working space shuttle from common household items. So we're using an estimated VORP (which I've labeled eVORP, so nobody gets confused) instead. The cutoff for this list was 20 eVORP, so players who fell below that level are not included.
MLVr is a statistic from the VORP report that we haven't discussed yet. Basically, it's a rate-stat counterpart of VORP, minus the positional adjustment. The acronym stands for Marginal Lineup Value rate, and it is measured against league average, so a league-average offensive performer would have an MLVr of .000.
Just for fun, I've included the final All-Star vote totals, in thousands of votes, for the players for whom they were available. Sadly, vote totals for players outside the top five at each position are not published, so many players' totals were not available (marked N/A). I've also taken the liberty of noting All-Star starters (S) and reserves (R).
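Stitching the second half of 2006 onto the first half of 2007 means pooling two samples. For rate stats like OBP, that should be done by adding up the underlying counts rather than averaging the two rates; otherwise a short stint counts as much as a long one. Below is a minimal sketch with made-up lines. It simplifies OBP to times on base divided by plate appearances (the official formula leaves sacrifice bunts, among other things, out of the denominator), and it says nothing about how BP's eVORP estimate itself is computed.

```python
from dataclasses import dataclass


@dataclass
class Split:
    """A slice of a season: plate appearances and times on base (H + BB + HBP)."""
    pa: int
    times_on_base: int

    @property
    def obp(self) -> float:
        # Simplified OBP: times on base per plate appearance.
        return self.times_on_base / self.pa


def pool(*splits: Split) -> Split:
    """Combine splits by summing the counts, so each is weighted by its playing time."""
    return Split(
        pa=sum(s.pa for s in splits),
        times_on_base=sum(s.times_on_base for s in splits),
    )


# Hypothetical lines: a short, weak 2006 second half and a longer, strong 2007 first half.
second_half_2006 = Split(pa=150, times_on_base=45)   # .300
first_half_2007 = Split(pa=350, times_on_base=140)   # .400

combined = pool(second_half_2006, first_half_2007)
naive = (second_half_2006.obp + first_half_2007.obp) / 2

print(f"Pooled OBP:           {combined.obp:.3f}")  # weighted by PA
print(f"Average of the rates: {naive:.3f}")         # overweights the short stint
```

Here the pooled figure is .370, while naively averaging the two rates gives .350, because the 150-PA stint would get the same say as the 350-PA one.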


AL Catcher             PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Victor Martinez   (R)  647  .323  .394  .505   .277   61.9     1,024
Jorge Posada      (R)  574  .307  .382  .517   .249   55.2     1,643
Joe Mauer              531  .310  .406  .467   .222   46.3     1,465
Kenji Johjima          514  .294  .329  .449   .069   28.7       583
Jason Varitek          405  .271  .360  .439   .064   21.4     1,444
Ivan Rodriguez    (S)  563  .289  .314  .436   .009   20.7     2,343

As we can see, the All-Star starter was just about the worst choice available, with an awful .314 OBP over the last calendar year. The myth of Japanese ballot-stuffing seems to have hit a speed bump, considering Kenji Johjima's low vote total.


AL First Base          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Justin Morneau    (R)  685  .316  .380  .552   .303   48.4     1,780
Mark Teixeira          592  .296  .399  .581   .338   47.6       459
Paul Konerko           626  .285  .367  .516   .188   29.7       N/A
Richie Sexson          614  .263  .350  .514   .167   25.5       392
Carlos Pena            310  .286  .392  .587   .329   25.4       N/A
Kevin Millar           498  .289  .394  .474   .195   25.2       N/A
Lyle Overbay           519  .299  .360  .498   .170   22.0       N/A
Kevin Youkilis         638  .295  .386  .446   .136   20.6       N/A

Because designated hitters are considered first basemen in this year's All-Star vote, Justin Morneau is relegated to reserve status, despite being the best glove-using first baseman in the American League.


AL Second Base         PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Brian Roberts     (R)  705  .300  .372  .440   .121   49.4       529
Robinson Cano          567  .308  .339  .501   .162   36.0     1,332
Placido Polanco   (S)  512  .330  .371  .416   .114   28.2     2,318
Mark Ellis             613  .271  .340  .425   .025   22.9       N/A
Esteban German         438  .295  .390  .437   .131   22.6       N/A
Luis Castillo          630  .311  .365  .364   .009   22.0       873
Ian Kinsler            613  .255  .332  .425  -.017   21.2       N/A
Ty Wigginton           512  .285  .340  .485   .113   20.2       N/A

Both of the AL All-Stars at the keystone lose a little when their 2006 second-half performance is taken into account: Polanco because he missed a lot of playing time after the All-Star break, Roberts because his second half was subpar. The main thing expanding the sample does is bring Robinson Cano, who has been inconsistent in 2007 but finished 2006 like a house on fire, back into the conversation. Nothing here explains how Polanco got more than four times as many votes as his backup.


AL Shortstop           PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Carlos Guillen    (R)  593  .336  .408  .559   .380   71.2     1,718
Derek Jeter       (S)  729  .340  .408  .484   .279   70.9     3,200
Miguel Tejada          623  .328  .381  .453   .184   48.1       840
Orlando Cabrera        684  .303  .343  .419   .054   36.5       850
Michael Young     (R)  728  .299  .345  .424   .038   31.4       469

Injuries to Miguel Tejada and Mark Teixeira left the door open for Michael Young to tag along as the Token Texas Ranger (TTR). Mixing in some of his 2006 performance makes Young seem a little less out of place on the All-Star roster.


AL Third Base          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Alex Rodriguez    (S)  672  .309  .405  .608   .411   76.5     3,891
Adrian Beltre          613  .280  .333  .520   .165   33.7       528
Troy Glaus             520  .271  .379  .503   .182   30.0       N/A
Casey Blake            584  .271  .347  .466   .098   21.6       389
Mike Lowell       (R)  617  .281  .335  .477   .083   21.1     1,364

Predictably, Rodriguez leads the AL in homers (46), runs scored (131), RBI (142), and slugging percentage (.612) since last year's Midsummer Classic. Like Troy Glaus, Mike Lowell started off the season well, but where Glaus slumped in May, Lowell fell into a far more profound slump in June.


AL Left Field          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Manny Ramirez     (S)  553  .307  .409  .522   .285   42.4     2,153
Carl Crawford     (R)  663  .286  .336  .440   .046   26.2       602
Juan Rivera            286  .335  .381  .538   .320   26.1       N/A
Raul Ibanez            626  .285  .340  .458   .093   20.7       N/A

Left field in the American League is, to put it bluntly, thin, and made thinner by the injury Juan Rivera sustained in winter ball. Ramirez is the sixth-best outfielder the AL has had over the past year, and Carl Crawford is the token Devil Ray and the AL's stolen base king (49 since the 2006 All-Star game).


AL Center Field        PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Grady Sizemore    (R)  748  .286  .393  .512   .248   69.4     1,297
Ichiro Suzuki     (S)  727  .329  .376  .419   .138   50.8     2,341
Torii Hunter      (R)  601  .301  .336  .559   .230   49.5     1,717
Kenny Lofton           563  .300  .374  .419   .083   33.6       N/A
Gary Matthews Jr.      680  .287  .353  .442   .078   31.6       511
Curtis Granderson      661  .263  .317  .487   .062   31.3       N/A


AL Right Field         PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Vladimir Guerrero (S)  665  .345  .424  .579   .451   79.9     3,151
Magglio Ordonez   (S)  658  .328  .397  .526   .310   58.3     2,715
Nick Markakis          672  .292  .350  .475   .128   34.3       N/A
Michael Cuddyer        698  .285  .361  .472   .132   33.9       621
Mark Teahen            590  .296  .374  .473   .153   33.4       N/A
J.D. Drew              572  .270  .390  .459   .141   26.6       951
Alex Rios         (S)  580  .283  .333  .485   .091   25.6       316

What the AL lacks in left field over the last calendar year, it more than makes up for in center field and right field. Even better, all the right ones have been invited to San Francisco.


AL Designated Hitter   PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
David Ortiz       (S)  646  .309  .440  .612   .453   77.5     2,858
Travis Hafner          572  .270  .396  .532   .267   43.1       806
Jim Thome              499  .271  .425  .522   .274   41.7       N/A
Frank Thomas           635  .275  .386  .502   .198   40.5       N/A
Gary Sheffield         396  .293  .396  .538   .287   33.0     1,739

Hafner and Ortiz were on the first base ballot and Sheffield on the outfield ballot, all because of the stupid rule that there's no DH when the game is in a National League ballpark. Whether as a first baseman, an over-glorified pinch-hitter, or a dog catcher, Ortiz had to be in this game. Over the past year, he leads the league in OBP (.439) and ranks second in homers (37) and slugging percentage (.609).


NL Catcher             PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Brian McCann      (R)  540  .289  .340  .524   .176   43.2       972
Russell Martin    (S)  603  .291  .362  .456   .115   41.4     2,039
Paul Lo Duca           548  .304  .343  .408   .036   26.1     1,742
Bengie Molina          504  .282  .304  .462   .012   20.7     1,383
Chris Coste            189  .337  .386  .543   .315   20.1       N/A

The two best catchers are going to the game. Lo Duca's high vote total may be partly attributable to his red-hot May (.393/.441/.488), which was followed by an equally impressive collapse (.214/.250/.342 since June 1).


NL First Base          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Ryan Howard            657  .307  .449  .658   .520   76.1       851
Albert Pujols     (R)  689  .326  .418  .577   .398   64.7     1,936
Lance Berkman          653  .285  .410  .533   .280   44.4       N/A
Prince Fielder    (S)  660  .282  .379  .565   .274   43.1     2,706
Adrian Gonzalez        684  .297  .370  .506   .223   36.3       532
Todd Helton            696  .309  .415  .462   .191   30.7       N/A
Dmitri Young      (R)  429  .326  .373  .510   .256   30.0       N/A
Adam LaRoche           609  .274  .350  .532   .181   26.1       N/A
Derrek Lee        (R)  433  .325  .400  .483   .230   24.1       988

Howard's just the NL leader in homers (51), RBI (145), and slugging percentage (.655) since last year's game; no need for him to be an All-Star. Meanwhile, Dmitri Young is having a nice half-season, but that shouldn't make anyone forget that last year he was cut by a team in the middle of a playoff run. That he makes this team while Howard sits at home is direct evidence that tying World Series home-field advantage to this exhibition game is a farce.


NL Second Base         PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Chase Utley       (S)  744  .314  .392  .552   .308   74.2     2,214
Orlando Hudson    (R)  689  .304  .383  .473   .163   46.5       N/A
Ray Durham             590  .287  .351  .496   .139   38.0       672
Jeff Kent              550  .293  .375  .468   .154   33.0     1,206
Dan Uggla              746  .252  .317  .473   .047   29.5       532

Also a reserve, but short of the 20 eVORP minimum, was token Pirate Freddy Sanchez (.311/.342/.402 since last year's game). But that's not really an outrage until you consider the next list...


NL Shortstop           PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Hanley Ramirez         718  .325  .379  .548   .327   82.4       456
Jose Reyes        (S)  702  .303  .371  .463   .161   60.0     2,214
Jimmy Rollins          772  .289  .334  .527   .159   59.4       834
Rafael Furcal          697  .303  .367  .453   .129   46.9     1,109
Edgar Renteria         673  .296  .356  .449   .102   41.9       847

Hanley Ramirez, the best player over the past calendar year by eVORP, doesn't make it to the All-Star game, basically because of J.J. Hardy's early-season home run binge. Hardy missed the second half of last season with an injury, and his June/July slump has left him below the 20 eVORP threshold. Still, his hot start was enough to vote him in on the Players' Ballot. Since last year's All-Star game, Ramirez is third in runs scored (120) and stolen bases (51), and first in doubles (50).


NL Third Base          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Miguel Cabrera    (R)  671  .333  .409  .573   .418   74.5     1,808
Chipper Jones          443  .334  .420  .653   .526   62.2     1,084
Aramis Ramirez         592  .319  .372  .604   .352   59.3       899
Garrett Atkins         694  .304  .382  .532   .226   48.3       N/A
David Wright      (S)  649  .298  .374  .492   .201   48.3     2,303

No wrong choices in this group, but the larger data set might convince some people that Aramis Ramirez belongs in this game too. Over the past year, he has the third-most homers (37) and RBI (118) and the second-highest slugging percentage (.606). Of course, since the guy who came in first in all those categories isn't an All-Star either, there's no sense complaining.


NL Left Field          PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Barry Bonds       (S)  521  .296  .480  .597   .474   68.5     2,325
Matt Holliday     (R)  689  .330  .392  .582   .336   60.7     1,549
Alfonso Soriano   (R)  693  .298  .360  .552   .252   51.9     2,203
Chris Duncan           524  .289  .374  .572   .287   45.1       555
Carlos Lee        (R)  700  .303  .354  .513   .182   38.4     1,166


NL Center Field        PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Carlos Beltran    (S)  643  .266  .359  .514   .174   49.7     2,511
Mike Cameron           672  .269  .342  .482   .118   38.2       N/A
Bill Hall              609  .274  .359  .502   .150   36.9     1,177
Aaron Rowand      (R)  518  .295  .368  .457   .119   30.2       696
Hunter Pence           281  .341  .367  .581   .340   27.8       N/A


NL Right Field         PA    AVG   OBP   SLG   MLVr  eVORP     Votes
--------------------- ---- ----- ----- ----- ------ ------   -------
Ken Griffey Jr.   (S)  549  .273  .368  .518   .175   37.5     2,987
Luke Scott             497  .281  .374  .535   .223   34.1       N/A
Brad Hawpe             567  .290  .381  .505   .175   31.3       N/A
Corey Hart             477  .289  .347  .506   .151   28.0     1,099
Shane Victorino        670  .283  .348  .413   .017   24.7       N/A

If, looking at the NL outfielders, you started humming "One of these things is not like the others," Aaron Rowand would probably be the guy who starts blushing. He was having a really poor second half last year, hitting .257/.327/.407 before suffering another season-ending injury in an outfield collision in August. So at least his All-Star story has a feel-good, "welcome back from your gruesome injury" angle.

In conclusion, there's nothing magical about the period between All-Star games that makes it an ideal data set for evaluating your prospective All-Stars. You could go back eighteen months or further in looking at the performance of the players you're going to designate as the league's elite. The thing to remember is that you don't have to simply accept the half-season (or smaller) sample that's available to you when making the decision.

Finally, by way of thanks: I couldn't have even begun to put together this article without the help of BP's technical staff, who generated the eVORP figures and rankings for me, as well as invaluable time-twisting resources such as Dave Pinto's Day-by-Day Database, Sean Forman's wonderful and constantly improving Baseball Reference site, and, of course, …
