June 3, 2009
Checking the Numbers: Houdini, Meet Jorge
In an attempt to bolster a pitching staff relying on the injury-prone Mike Hampton and John Thomson and the question marks that were Horacio Ramirez and Kyle Davies, the Atlanta Braves took a flyer on Jorge Sosa prior to the 2005 season, acquiring the righty from the (then-)Devil Rays in exchange for Nick Green. The Braves' brass hoped that Leo Mazzone could work his magic on the young flamethrower who, though lacking a proven track record, possessed raw abilities capable of charming the pants off of pitching coaches. After a few months, the project seemed to be paying dividends; Sosa sported a shiny sub-3.00 ERA. Problems lurked beneath the surface, however: Sosa was allowing plenty of baserunners and deriving most of his success from stranding an inordinate number of them. His teammates came to embrace the quirkiness of his success, giving him the moniker "Houdini" for his ability to escape from situations with minuscule margins of error. Sosa stranded 85.1 percent of his runners that season, a rate that ranks 28th since 1954 among pitchers with at least 100 innings in a season, and fourth since 1999.

In 2006, Houdini became the pawn in a David Copperfield vanishing act; Sosa was barely able to pull his weight in a decimated Braves bullpen, the team tired of his performance, and he was shipped to the Cardinals at the end of July. Since then, Sosa has bounced through the Mets and Nationals organizations, continuing to serve as a cautionary tale: peripherals that stray from the league mean, as well as from the pitcher's own established levels, demand due diligence and further investigation.

A pitcher's strand rate, or LOB%, is one of several innovations from fantasy guru Ron Shandler that has since been beautified by The Hardball Times. The general formula, (H + BB + HBP - R) / (H + BB + HBP - (1.4 × HR)), calculates the percentage of baserunners who fail to score, based on events considered to be out of a pitcher's control. Shandler found that the league average tended to hover around the 72 percent mark. Since 1954, the American League has averaged 71.2 percent with a standard deviation of just 1.6 percent; the National League has deviated even less, averaging 71.7 percent with a standard deviation under one percent.

Strand rate, much like BABIP, adds another level of granularity capable of explaining ERA fluctuations, but is it stable? Can we count on a pitcher with a solid rate one year to hold steady over the next few seasons? In the same year that Sosa took the senior circuit by storm, Dave Studeman ran a regression analysis on strand rates and found a year-to-year correlation of 0.28, roughly the same level of stability found in home runs back when DIPS hit the web. Conversely, Zach Fein of Bleacher Report found next to no strength in a study of his own. Two years of data is not a large enough sample for me, especially when dealing with a potentially volatile statistic, so instead of running a standard year-to-year correlation, my statistical instrument of choice is an AR(1) Intra-Class Correlation, which I've mentioned in this space before. The ICC works similarly to other correlations, but it encompasses a larger time span: essentially, it does the same thing as a year-to-year correlation while incorporating multiple seasons, not just two.
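To make the arithmetic concrete, here is the strand-rate formula as a quick Python function. The column itself contains no code, so treat this as a sketch; the season line at the bottom is purely hypothetical.

```python
def strand_rate(h, bb, hbp, r, hr):
    """Shandler's LOB%: the share of baserunners who fail to score.

    LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)
    """
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)

# A hypothetical season line, for illustration only:
print(strand_rate(h=150, bb=60, hbp=5, r=70, hr=15))  # ~0.747
```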
Running the test across all qualifying pitchers from 2004-08 resulted in a correlation of 0.25, suggesting a relationship of slightly below moderate strength. Not all pitchers are created equal, however, so it stands to reason that the better performers may exhibit more control over stranding runners. After all, higher-quality pitchers consistently post top-notch ERAs, a stat that happens to share a -0.78 correlation with strand rate: as one goes up, the other goes down. The R-squared suggests that roughly 60 percent of the variability in ERA can be attributed to stranding prowess. Following this idea, I partitioned the sample of pitchers into three groups based on their average FIP over the five-year span. FIP correlates with LOB% at -0.39, a weaker relationship than ERA's, but it serves as a better predictor of future ERA and therefore makes sense as the partitioning barometer. The average FIP of the group was 4.46, with a standard deviation of 0.63, so the three groupings are <= 3.83 (more than one standard deviation better than the mean), 3.84-4.46, and >= 4.47. The results, with a code sketch of the bucketing to follow:
Type     ICC   R-Sq
Overall  .25   .063
Good     .44   .194
Medium   .15   .023
Bad      .16   .025
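For readers who want to replicate the setup, here is a rough Python sketch. One caveat up front: the column uses an AR(1) intra-class correlation, while this sketch computes the simpler one-way ICC(1) from a pitcher-by-season matrix, so the exact figures would differ; the function names and comments are mine.

```python
import numpy as np

def icc1(x):
    """One-way random-effects ICC from an (n_pitchers, n_seasons) array.

    A simplified stand-in for the AR(1) intra-class correlation used in
    the column: it measures how much of the total variance in LOB% lies
    between pitchers rather than in season-to-season noise.
    """
    n, k = x.shape
    row_means = x.mean(axis=1)
    ms_between = k * row_means.var(ddof=1)
    ms_within = ((x - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

def fip_bucket(avg_fip):
    """Assign a pitcher to a group by five-year average FIP.

    Boundaries come from the article's sample: mean 4.46, SD 0.63.
    """
    if avg_fip <= 3.83:   # more than 1 SD better than the mean
        return "Good"
    if avg_fip <= 4.46:   # between 1 SD below the mean and the mean
        return "Medium"
    return "Bad"          # worse than the sample mean
```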
As expected, the Halladays and Johans of the world exert more influence over their strand rates. This does not automatically indicate that ace pitchers post rates well above average, but rather that they tend to stay more consistent with their own true talent level in this area. So if a really solid pitcher puts up strand rates exceeding the league average, don't be so quick to write off his numbers as superficial. I also ran correlations between strand rate and balls in play, but nothing surfaced. It made intuitive sense that higher line-drive percentages would hinder the ability to leave men on, given that liners result in hits 73 percent of the time, but the correlation came in below 0.20, as it did for both grounders and fly balls.
How about some of the best and worst rates? Among pitchers with 100-plus innings from 1954-2008, Wes Stock produced the highest strand rate, 91.9 percent, when he posted a 2.30 ERA and 3.36 FIP for the 1964 Orioles. For starting pitchers (Stock threw his 113 innings in relief), the best marks:

Name             Year  LOB%  ERA   FIP
John Candelaria  1977  88.8  2.34  3.99
Doc Gooden       1985  86.9  1.53  2.17
Pedro Martinez   2000  86.6  1.74  2.13
Billy Pierce     1955  86.6  1.97  2.93
Bob Gibson       1968  86.6  1.12  2.01

And some of the worst:

Name             Year  LOB%  ERA   FIP
Taylor Buchholz  2006  55.8  5.89  5.18
Nelson Briles    1970  56.3  6.24  4.60
Jim Abbott       1996  56.6  7.48  6.28
Billy Muffett    1961  56.7  5.67  4.76
Zane Smith       1995  56.9  5.61  3.82

Now, we have already established that dominant starting pitchers can control their rates much more than others, but what about elite relievers? The industry's standard argument holds that relievers like Mariano Rivera are able to sustain lower BABIPs and influence other luck-based indicators more than the average relief corps member. Running the ICC among all pitchers with at least 15 relief outings from 2004-08, a very weak correlation of 0.06 emerges. Ultimately, with such small samples of stats accrued each season, many numbers, LOB% included, fluctuate. When the closers are separated from this group (I am loosely defining closers as anyone with at least 15 saves in a season), the correlation actually drops to -0.03, effectively debunking the conventional wisdom.

I understand that the rather arbitrary cutoff of 15 saves could prove problematic in that some of the pitchers included in the sample would be nothing more than makeshift closers, so to assuage those concerns we'll adjust the minimum to 25 saves. Increasing the save requirement also works as a selection-bias filter of sorts, since it tends to incorporate only the really solid closers; a pitcher is unlikely to close out games for an extended period of time if he remains ineffective. Unfortunately, even with the adjustment, the ICC barely rises to 0.04. In case these numbers don't make a compelling enough case, let's try one more scenario: 15 or more saves with an ERA of 3.16 or lower (3.16 was the average ERA for this group). Running the ICC across all of these pitcher seasons from 2004-08 produced a correlation of -0.25, much stronger in magnitude than before, but still not akin to the level of strength evident in elite starters.
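For the curious, here is roughly how that save-based filter plugs into the icc1 helper sketched earlier. The file and column names are placeholders of my own, not the column's actual data source.

```python
import pandas as pd

# Hypothetical pitcher-season table with columns: pitcher, season, saves, lob_pct
df = pd.read_csv("pitcher_seasons_2004_08.csv")

closers = df[df["saves"] >= 15]  # raise the cutoff to 25 for the stricter test
wide = closers.pivot(index="pitcher", columns="season", values="lob_pct")
wide = wide.dropna()  # keep only pitchers with a LOB% in every season
print(icc1(wide.to_numpy()))
```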
None of this is meant to suggest that elite relievers and stud starters are immune to fluctuations, but rather that the grain of salt we take with their strand rates should be much smaller than for everyone else. On top of that, a mediocre pitcher can become an ace, as Cliff Lee seems to be doing, and someone once considered an ace, like Dontrelle Willis, can fall off the map. The goal here was merely to get across that LOB%, just like any other luck-based indicator affecting the performance marks held near and dear to our hearts, must be treated as a support, not as a deal-breaker or the basis of definitive claims. Most data comes with even more information beneath the surface, and understanding the roots helps to explain what shows up on stat lines.

Perhaps adrenaline that would sadly dissipate the following year allowed Jorge Sosa to really amp up his stuff with runners on base during 2005. Perhaps his mechanics were off during his windup, and the flaws were automatically corrected while pitching out of the stretch. Or maybe his short-lived success really was nothing more than dumb luck, the type of performance bound to fall back to earth given his lack of a track record and perceived level of talent. Regardless, the point remains that investigating causation is much more valuable than taking everything at face value. Different types of pitchers follow different sets of rules, and should therefore be treated... well, differently. Strand rates can be particularly stable for certain groups of hurlers, which is important to note given the evolution of strand rates and BABIP in web-based analyses: the stats seem to begin as ideas worthy of our skepticism, evolve into arguable supports, and then end up as context-free point-makers. Let's all take a giant step back to the arguable-supports stage, and try to actually understand what it is that the metric explains, and how it might differ, from a regression standpoint, for certain types of pitchers.
Eric Seidman is an author of Baseball Prospectus.
Good stuff. I like the quick summary of existing work on the topic, and then your addition to it. A couple of questions: did all three groups have roughly the same number of IP/season per pitcher? That is, did the "Good" pitchers have more innings/season in their sample than the "Bad" pitchers? I ask because, if so, the larger sample could account for some of the decreased volatility (vs. other pitchers) seen in their strand rates. I.e., you would be using more information for Good pitchers than for Bad pitchers.
On a similar note, is strand rate consistency more influenced by pitcher quality (FIP) or by pitcher consistency? That is, I would be interested in creating three buckets of pitchers: inconsistent, average, and consistent, and running the same kind of test that was run on Good, Medium, and Bad. The issue is how to define "consistent." I might try using the stdev of seasonal FIP over the sample, so the pitchers with the highest FIP stdev over the past several years would be labeled "Inconsistent," etc. Just trying to assess whether a consistently mediocre pitcher is more likely to maintain his strand rate than a mediocre pitcher who swings wildly from good to bad each season.
Thanks.
evo,
The IP/season question is a good point, but I'm at work until 4 and cannot check it until around then. I would venture a guess that yes, of course guys like Halladay, Johan, Carpenter, etc., average more innings than Adam Eaton. But then it's a chicken/egg situation: is the decreased volatility there because they log more innings, or do they log more innings because of the decreased volatility?
I will also look into the second paragraph, probably using the stdev of FIP: find each pitcher's individual FIP SD, then the average of those SDs as well as the SD of the SDs themselves. I.e., if the average individual SD is something like 0.30 and the SD of the individual SDs is 0.08, then we can partition the pitchers that way: SDs of 0.21 or lower would be consistent, 0.22-0.38 medium, and >0.38 inconsistent.
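A rough illustration of that partition, treating the 0.30/0.08 figures above as hypothetical; the function name is made up.

```python
# Bucket a pitcher by how much his FIP varies season to season.
# Thresholds assume individual FIP SDs averaging ~0.30 with an SD of ~0.08.
def consistency_bucket(fip_sd, mean_sd=0.30, sd_of_sds=0.08):
    if fip_sd < mean_sd - sd_of_sds:    # below 0.22: stable year to year
        return "Consistent"
    if fip_sd <= mean_sd + sd_of_sds:   # 0.22-0.38
        return "Medium"
    return "Inconsistent"               # above 0.38: wild swings
```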