December 13, 2007
Schrodinger's Bat: Inching Along

"The subject may appear an insignificant one, but we shall see that it possesses some interest, and the maxim 'de minimis lex non curat'--the law is not concerned with trifles--does not apply to science."

Darwin penned the words above in what would be his final book, published less than a year before his death. In that book he argues that small effects--earthworms, in his case--summed over a great number of actors and a long period of time, can have large consequences. In other words--unlike for his lawyer friends--in many disciplines trifles matter, not only because resolving small but knotty puzzles is intellectually satisfying, but because the apparently inconsequential is often just that: apparently inconsequential.

If one can say anything about the obsessive baseball fan, it surely is that, like a good scientist, he is concerned with what to others may look like minutiae. And so it is in that vein that, based on reader feedback, this week we'll dig back into the methodology behind the calculation of SFR--introduced last week--and address one or two points that, like Darwin's earthworms, may appear insignificant but have a measurable effect on the final outcome. I'm also delaying my original promise: we'll once again focus our attention on infielders, and outfielders will have to wait for another time.

The Trifles

Before we dig in, I'd like to thank everyone who offered opinions and asked questions on last week's column. While I assumed that this topic might generate some interest, I am always appreciative of the insightful suggestions I find in my inbox, regardless of the number of responses received. I should also point out that while last week the focus was on how efficiently and quickly a credible system could be built, this time around it's all about refinements. So for those who were worried this was only a quick and dirty exercise, I hope you'll be encouraged by my desire to improve upon it.
Based on those reader suggestions--and piggybacking on a couple of improvements I hinted at last week--the adjustments can be discussed under two broad categories.

1. More Context. One of the obvious weaknesses of the first draft of the system was that the baseline matrix included balls in the "area of responsibility" defined for each position broken down only by hit type. Two of the key pieces of context missing from the table were batter handedness and bunting. The former could be meaningful if a player fielded disproportionately more balls struck by left- or right-handed hitters, since we might guess that our expectation for how many runners reach base would differ. As for the latter, bunts are not fielded in the same way as normal ground balls, so we might expect this context to change SFR totals, especially for corner infielders in the National League but also in general.

In order to account for this, we can simply add two attributes to our matrix. This has the effect of morphing the six-row table shown last week into a 20-row table (Hit: G = ground ball, L = line drive, P = popup; B = batter handedness; Bunt: T = bunt, F = not a bunt):

Pos    Hit  B  Bunt  Balls  Runners    TB   Out%  TB/Runner
-----------------------------------------------------------
Short  G    L  F      5186     1963  2071   .622       1.05
Short  G    L  T        11       11    11   .000       1.00
Short  G    R  F     13530     4164  4566   .692       1.10
Short  G    R  T         3        3     3   .000       1.00
-----------------------------------------------------------
Short  L    L  F      2284     1947  1951   .148       1.00
Short  L    R  F      3452     2937  2938   .149       1.00
-----------------------------------------------------------
Short  P    L  F      1295       53    56   .959       1.06
Short  P    L  T         2        2     2   .000       1.00
Short  P    R  F      1282       36    37   .972       1.03
Short  P    R  T         1        0     0  1.000       0.00
-----------------------------------------------------------
-----------------------------------------------------------
Third  G    L  F      1679      454   494   .729       1.09
Third  G    L  T       286      102   106   .643       1.04
Third  G    R  F     10231     2485  2845   .757       1.14
Third  G    R  T       301      111   113   .631       1.02
-----------------------------------------------------------
Third  L    L  F       991      790   998   .203       1.26
Third  L    R  F      2816     2290  3050   .187       1.33
-----------------------------------------------------------
Third  P    L  F      1212       21    26   .982       1.23
Third  P    L  T        17        1     1   .941       1.00
Third  P    R  F       645        6     7   .991       1.24
Third  P    R  T        12        3     3   .750       1.00
-----------------------------------------------------------

You can discern that when a ground ball is hit to the third baseman's area of responsibility, it is turned into an out more frequently when struck by a right-handed rather than a left-handed hitter (.757 versus .729). However, the tables are turned when the ball is a bunt, where left-handers are retired slightly more often (.643 versus .631), and you can see that bunts are more successful for the offense than ordinary grounders from either side of the plate. The same general rule applies to shortstops, where grounders are converted into outs seven percentage points more often when hit by a right-handed hitter (.692 versus .622).

This effect is present for both positions, presumably because the runner coming from the right-handed batter's box has an extra step to make on his way to first, and because a grounder pulled to the left side of the infield by a right-handed hitter will typically be one he "rolled over on," making it less likely to be hit hard. It may also be the case that considering batter handedness subtly accounts for a variation in where balls are fielded when hit by left- and right-handed hitters, although I have no solid evidence to back up that bit of intuition.

Looking closer, it would also appear that line drives by right-handers are more difficult for third basemen--but not shortstops--to turn into outs, and are more costly when the runner does reach. Although not shown in the table, we find the exact opposite results for grounders in the area of responsibility for second and first basemen, where right-handed hitters reach more frequently.

Based on a suggestion from a reader, I eliminated from consideration for middle infielders all line drives that resulted in doubles and triples, since it is likely that any line drive within reach of a shortstop or second baseman would, if not caught, result in a single.
This is not necessarily the case for corner infielders, however, and so no adjustment has been made for them.

Finally, don't be alarmed by the small samples you see for bunts to shortstop and pop-up bunts to third. Because these percentages are only used when a ball is credited to a fielder, they are applied in the frequency you see in the table. In other words, while the percentages may not be ideal because of the small sample, their impact on the overall values the system produces is also minimal. In any case, this additional context also allows us to break down a corner infielder's SFR into grounder and bunt components, as we'll see shortly.

2. A Question of Partitioning. When I discussed the concept of virtual areas of responsibility in last week's column, I noted that grounders, line drives, and popups not fielded by the player in question had to be partitioned among the fielders adjacent to that fielder. That is, batted balls fielded by the left fielder are partitioned between third base and shortstop, balls hit to the center fielder are partitioned between short and second, and balls to right field are split between second base and first base.

Because our focus was more on the quick and dirty effort last week, very little thought was put into how that partitioning should be done. Thanks to several readers, I reconsidered the problem this week and came up with a simple and tentative way of allocating batted balls in these shared areas of responsibility.

Simply put, the system now partitions batted balls (again, those not fielded by the position we're analyzing) in the proportion that we find "in the wild" for balls we know were fielded by the positions participating in the split. So for our three areas of partitioning, the splits for ground balls in 2007 look like this:
Here we see that for ground balls hit by right-handed hitters that make it into left field, we assign 45 percent of them to the third baseman and 55 percent to the shortstop. Since we have no way of knowing on which side of second base the center fielder was when he fielded the ball, we're content to make that one a 50/50 split (although one could argue that the distribution up the middle on such balls might follow the same distribution of balls in general, so for a right-handed hitter we would split it something like 80/20 and for a left-hander more like 70/30 to the pull side). For ground balls hit by right-handers and fielded by the right fielder, we then assign 79 percent of them to the second baseman and 21 percent to the first baseman. The chart for left-handed hitters is similar, although it's clear that lefties spray the ball a bit more, a fact confirmed by looking at BIPChart.

We create similar distributions for line drives and popups, all of which are shown below (H = hit type, B = batter handedness):

Area          H  B  Split
-------------------------
Third/Short   G  L  28/72
Short/Second  G  L  50/50
Second/First  G  L  60/40
-------------------------
Third/Short   G  R  45/55
Short/Second  G  R  50/50
Second/First  G  R  79/21
-------------------------
Third/Short   L  L  37/63
Short/Second  L  L  50/50
Second/First  L  L  55/45
-------------------------
Third/Short   L  R  50/50
Short/Second  L  R  50/50
Second/First  L  R  73/27
-------------------------
Third/Short   P  L  48/52
Short/Second  P  L  50/50
Second/First  P  L  74/26
-------------------------
Third/Short   P  R  33/66
Short/Second  P  R  50/50
Second/First  P  R  53/47

Using these distributions we can now assign a more accurate portion of the balls in play to each position, which will adjust the number of balls assigned to each player, and thus the final SFR, accordingly.

One final note on this adjustment: in the previous column I mentioned that all ground balls fielded by the left fielder were assigned to both the shortstop and the third baseman.
After thinking about it a bit more, I decided to use the partitioning rules discussed here, which never double-count a ball but instead, like Solomon, divide it and assign partial responsibility to each fielder. This has two effects. First, you'll notice that the numbers of balls assigned to most fielders have gone down from what was shown last week. For example, Troy Tulowitzki went from 1,179 balls last week to 974 balls after making these adjustments. Second, the Out% shown in the first table above is higher when aggregated than for the matrix shown last week, since all of the balls that were removed were hits.

As an aside, most of the development effort this week was spent on creating the distributions and applying them while not inflating the code base. It remains, at present, at about 1,000 lines of code.

Taking a Test Drive

With those adjustments made, I then re-ran the numbers for 2005 through 2007 and re-ran the regressions against UZR for all infielders except catchers and pitchers. The resulting correlation coefficients for the 988 player seasons are shown in the following table:

Position  r
--------------
All       .781
1B        .649
2B        .780
3B        .776
SS        .818

Overall, we've gone from an r of .75 to .78, and you can see that the correlations are pretty good for all positions except first base. I'll admit that I don't understand exactly why that might be the case.

In addition to a higher correlation, these adjustments also bring SFR in line with UZR in terms of standard deviation and range in this particular data set. Previously, the spread for SFR was over a run higher, while now it's less than half a run lower. And although SFR recorded a +38 for Adam Everett in 2006 while UZR has him at +48, the other extreme high and low values match up rather nicely. Although it is difficult to discern, the scatter plot below does indeed look just a little tighter than in the previous column:
At the suggestion of readers, I also ran the numbers once excluding line drives, and once excluding both popups and line drives; the results were very interesting:

With Line Drives Removed

Position  r
--------------
All       .817
1B        .658
2B        .800
3B        .815
SS        .863

With Line Drives and Popups Removed

Position  r
--------------
All       .821
1B        .666
2B        .810
3B        .827
SS        .860

While these correlations are indeed higher, in both cases the standard deviation drops to a run less than UZR, while the overall range constricts slightly. What this probably indicates is that line drives fielded by outfielders have little value, and therefore we should be considering only those line drives fielded by the infielder. The current algorithm would have to be adjusted to do so, but this will certainly be on the list of adjustments to try on the next pass. Popups also seem to have more relevance for shortstops than the other positions (at least in terms of correlating to UZR).

Finally, here are the recalculated 2007 leaders and trailers using the adjustments discussed in this article; as in the last article, they are shown alongside the Davenport FRAA:

First Basemen 2007

Name               AdjG  Balls  FRAA  SFR
-----------------------------------------
Albert Pujols     149.6    464    22   12
Todd Helton       148.2    419    10   11
Justin Morneau    142.0    379    14   11
Kevin Youkilis    123.2    329    17    8
Casey Kotchman    116.6    303     9    7
Lyle Overbay      108.8    298     5    6
Adam LaRoche      145.8    381    -1    5
Mark Teixeira     123.5    336    -7    5
Darin Erstad       19.5     62     1    3
Craig Wilson       15.7     45    -1    3
-----------------------------------------
Jeff Conine        56.8    137     2   -5
Ryan Garko        118.2    303    -1   -7
Dmitri Young       99.1    240    -7   -7
Mike Jacobs       101.4    259   -11   -8
Prince Fielder    150.1    392   -15   -9

Albert Pujols still comes out on top, but the adjustments to the system severely limit the number of balls in his--and all first basemen's--area of responsibility, and thus the range of SFR is now lower. As noted above, first base has the lowest correlation, so there is still likely a factor missing from the current algorithm.
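The r values quoted in this column are presumably ordinary Pearson correlation coefficients between two fielding systems. As a rough illustration of that calculation, and not the code behind SFR, the sketch below computes r for the FRAA and SFR columns of the first basemen table above. The function name `pearson_r` and the variable names are assumptions introduced here. Because the table holds only leaders and trailers selected on SFR, the result is not comparable to the correlations against UZR computed over all 988 player seasons.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# FRAA and SFR for the fifteen first basemen listed in the 2007 table,
# in table order (Pujols first, Fielder last).
FRAA_1B = [22, 10, 14, 17, 9, 5, -1, -7, 1, -1, 2, -1, -7, -11, -15]
SFR_1B = [12, 11, 11, 8, 7, 6, 5, 5, 3, 3, -5, -7, -7, -8, -9]

r_first_base = pearson_r(FRAA_1B, SFR_1B)
print(f"FRAA vs. SFR, listed first basemen only: r = {r_first_base:.3f}")
```

Running the same function over full lists of UZR and SFR values, one pair per player season, would be the way to reproduce a table like the ones above.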
The first basemen table above includes bunts, but with bunts now factored into the equation we can also show the leaders and trailers on bunts alone:

First Basemen on Bunts 2007

Name               AdjG  Balls  SFR
-----------------------------------
Adrian Gonzalez   160.6     26    2
Todd Helton       148.2     19    2
Mike Jacobs       101.4     13    1
-----------------------------------
Ryan Garko        118.2     15   -2
Ryan Klesko        89.7     25   -2
Adam LaRoche      145.8     21   -2

Certainly, Todd Helton has a good reputation in this regard, while Ryan Klesko does not.

Second Basemen 2007

Name               AdjG  Balls  FRAA  SFR
-----------------------------------------
Aaron Hill        157.7    900     0   27
Mark Ellis        147.9    867    27   27
Kazuo Matsui       95.6    527    14   23
Dustin Pedroia    128.6    660     2   12
Brian Roberts     149.7    799     1   11
Robinson Cano     157.3    871    26   10
Geoff Blum         54.3    288    -2    9
Marcus Giles      104.1    602     6    9
Ian Kinsler       128.8    758     3    9
Alex Cora          33.5    162     1    7
-----------------------------------------
Brandon Phillips  153.3    840    15  -12
Craig Biggio      103.6    500   -17  -13
Freddy Sanchez    142.4    688    -7  -17
Rickie Weeks      110.4    559   -13  -21
Dan Uggla         155.3    814    14  -31

Aaron Hill remains somewhat of a mystery between the two systems, while Mark Ellis is spot-on. Dan Uggla and Rickie Weeks remain on the bottom, while Brandon Phillips looks far worse in SFR than in FRAA.

Shortstops 2007

Name               AdjG  Balls  FRAA  SFR
-----------------------------------------
Omar Vizquel      135.9    774    10   35
Troy Tulowitzki   152.3    974    24   21
Khalil Greene     153.3    832    -8   17
Jason Bartlett    134.7    764     8   14
Jimmy Rollins     160.1    875     8   11
Jose Reyes        159.8    817     4   10
John McDonald      89.4    504    12   10
Tony Pena         143.6    804    12    8
Adam Everett       59.3    339     4    7
Ryan Theriot       96.2    495    -7    6
-----------------------------------------
Josh Wilson        51.3    265   -11  -11
Carlos Guillen    120.3    691   -12  -14
Derek Jeter       147.4    773    -6  -16
Hanley Ramirez    146.1    826    -8  -18
Brendan Harris     85.2    462   -13  -19

Khalil Greene gets knocked down a peg from last week (although the systems still disagree), while both Omar Vizquel and Troy Tulowitzki lose a few runs as well. Brendan Harris now takes over the bottom spot, while Hanley Ramirez looks six runs better and Derek Jeter treads water.
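The Balls totals in these tables depend on the partitioning rule described earlier, under which a batted ball that reaches an outfielder is divided between the two adjacent infielders rather than charged in full to both. A minimal sketch of that rule for ground balls follows, using the 2007 ground-ball splits from the partitioning table. The record layout (an outfielder code plus batter hand) and the names `GROUND_BALL_SPLITS` and `partition_outfield_grounders` are hypothetical and chosen for illustration; the actual SFR code is not shown in the article.

```python
from collections import defaultdict

# Ground-ball splits from the partitioning table, keyed by
# (outfielder who fielded the ball, batter hand). Each value gives the
# share of one ball charged to each adjacent infielder.
GROUND_BALL_SPLITS = {
    ("LF", "L"): {"3B": 0.28, "SS": 0.72},
    ("CF", "L"): {"SS": 0.50, "2B": 0.50},
    ("RF", "L"): {"2B": 0.60, "1B": 0.40},
    ("LF", "R"): {"3B": 0.45, "SS": 0.55},
    ("CF", "R"): {"SS": 0.50, "2B": 0.50},
    ("RF", "R"): {"2B": 0.79, "1B": 0.21},
}


def partition_outfield_grounders(balls):
    """Total the fractional balls charged to each infield position.

    `balls` is an iterable of (outfielder, batter_hand) tuples, one per
    ground ball that got through the infield (hypothetical layout).
    """
    charged = defaultdict(float)
    for outfielder, hand in balls:
        for position, share in GROUND_BALL_SPLITS[(outfielder, hand)].items():
            charged[position] += share
    return dict(charged)


# Three grounders that reached the outfield: two to left field
# (one from each side of the plate) and one to right field.
example = [("LF", "R"), ("LF", "L"), ("RF", "R")]
charged = partition_outfield_grounders(example)
for position in sorted(charged):
    print(position, round(charged[position], 2))
```

Because every split sums to one, the fractional balls charged across all positions add up to the number of balls supplied, so no ball is counted twice.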
Third Basemen 2007

Name               AdjG  Balls  FRAA  SFR
-----------------------------------------
Pedro Feliz       136.1    521    14   25
Aramis Ramirez    122.2    486    17   13
Scott Rolen       105.6    421    16   13
Ryan Zimmerman    160.3    655    21   11
Mike Lowell       149.2    513    14   10
Nick Punto         93.4    333     1    9
Joe Crede          43.7    176     8    8
Mark DeRosa        32.2    119     2    7
Chipper Jones     120.3    414     1    7
Abraham Nunez      66.0    300     8    7
-----------------------------------------
Wilson Betemit     45.7    156    -5   -6
Edwin Encarnacion 130.6    486   -11   -6
Alex Gordon       128.0    491     0   -7
Miguel Cabrera    147.2    543     5   -9
Ryan Braun        106.1    378   -25  -31

The top third basemen stay relatively constant, although Aramis Ramirez moves up a few spots since David Wright falls off the chart: he goes from +14 last week to +6 this week, in line with his FRAA of +5. Under these new rules Ryan Braun picks up nine runs but still ends up at -31, while Garrett Atkins goes from -12 to -5 and is no longer among the trailers.

Third Basemen on Bunts 2007

Name               AdjG  Balls  SFR
-----------------------------------
Pedro Feliz       136.1     20    4
Miguel Cabrera    147.2     26    3
Chipper Jones     120.3     19    3
-----------------------------------
Troy Glaus        103.8      7   -3
Jose Castillo      28.4      9   -3

In looking at bunts only, it's interesting to see Miguel Cabrera place second when his overall number (-9) is so low. An ability to field bunts appears to be what catapulted Chipper Jones into the overall leaders as well.

Before closing, let's make one more comparison. Since the correlation between SFR and FRAA is fairly low--as documented last week--I thought it would be appropriate to get a feel for how SFR looks when compared to UZR by showing the top and bottom fifteen SFR scores from 2005 and 2006, and their UZR equivalents.
(Pos uses scorebook numbering: 3 = first base, 4 = second base, 5 = third base, 6 = shortstop.)

Name              Year  Pos  Balls  UZR  SFR
--------------------------------------------
Adam Everett      2006    6    763   48   38
Rafael Furcal     2005    6    849   18   25
Jack Wilson       2005    6    864   15   24
Craig Counsell    2005    4    773   26   22
Jamey Carroll     2006    4    615   25   22
Mark Grudzielanek 2005    4    678   22   22
Brandon Inge      2006    5    701   20   21
Mark Ellis        2005    4    568   17   20
Chase Utley       2005    4    701   21   20
Jose Valentin     2006    4    480   17   19
Mike Lowell       2006    5    615   20   18
Eric Chavez       2005    5    546   10   18
Joe Crede         2006    5    602   18   17
Juan Uribe        2005    6    780   16   17
Bobby Crosby      2005    6    428    7   17
--------------------------------------------
Edgar Renteria    2005    6    773  -14  -14
Carlos Delgado    2005    3    381  -12  -14
Rickie Weeks      2005    4    489  -21  -14
Jorge Cantu       2005    5    194  -20  -15
Jorge Cantu       2005    4    382  -10  -15
Russ Adams        2005    6    653  -22  -15
Angel Berroa      2006    6    742  -14  -16
Felipe Lopez      2005    6    740   -5  -16
Jose Lopez        2006    4    818    6  -16
Mark Teahen       2005    5    518  -22  -18
Alfonso Soriano   2005    4    839  -15  -20
Angel Berroa      2005    6    889  -16  -22
Jose Castillo     2006    4    703  -11  -22
Jorge Cantu       2006    4    526  -25  -27
Michael Young     2005    6    872  -30  -33

Here you can see the strong correlation, as both systems peg the same players as defenders who excel and those who... well, don't. Jorge Cantu has the distinction of making the trailers list three times in two seasons thanks to being awful at both second and third in 2005.

Baby Steps

Like Darwin's earthworms, the wheels of progress sometimes move slowly. But as Darwin showed, small steps can have a powerful cumulative effect in the long run. Thanks again for marking out a path for some of those steps.