June 14, 2016
What the #StroPoll Results Got Wrong
On Wednesday night, this image created a small Twitter sensation. Mind you, it was a small sensation. On a night that featured noteworthy pitching performances ranging from Yu Darvish’s injury to Jameson Taillon’s successful debut to James Shields to Snoop Dogg, there wasn’t room for a large sensation. But this screenshot during the Houston broadcast (in a game in which the Astros actually beat the Rangers in Arlington!) caused some of us to drop our slide rules in amazement:
The two most obvious reactions were:
1. Wow, people actually are starting to realize that batting average isn’t all it’s cracked up to be!
2. What in the world is a “sabermetrics?” (Rob Neyer hypothesized that it was shorthand for WAR, which probably wouldn’t have looked good on a Twitter poll anyway. If that’s the case, we should thank the Root Sports program director for sparing us the inevitable Edwin Starr references.)
Here’s a third reaction: They got it wrong. On-base percentage isn’t the “most important to a hitter’s value,” even among these options. Slugging percentage is.
I think there are two reasons for on-base percentage’s popularity. First, of course, is Moneyball. Michael Lewis demonstrated how, in 2002, there was a market inefficiency in valuing players with good on-base skills. The second reason is that it makes intuitive sense. You get on base, you mess with the pitcher’s windup and the fielders’ alignment, and good things can happen, scoring-wise.
To check, I looked at every team from 1913 through 2015—the entire Retrosheet era, encompassing 2,214 team-seasons. I calculated the correlation coefficient between each team’s on-base percentage and its runs per game. And, it turns out, it’s pretty high—0.890. That means, roughly, that you can explain nearly 80 percent of a team’s scoring by looking at its on-base percentage. (Square the correlation coefficient, r, and you get r², the percentage of variation explained by a linear model.) Slugging percentage is close behind, at 0.867. Batting average, unsurprisingly, is worse (0.812), while OPS, also unsurprisingly, is better (0.944). TAv would undoubtedly be better still.
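If you want to replicate this kind of check yourself, the computation is straightforward. Here’s a minimal sketch with made-up numbers (not the actual Retrosheet figures—a handful of hypothetical team-seasons stand in for the 2,214 real ones):

```python
import numpy as np

# Hypothetical team-season data (illustrative only, not real Retrosheet values):
# each position is one team-season's OBP, SLG, and runs per game.
obp = np.array([0.310, 0.325, 0.334, 0.318, 0.342, 0.301])
slg = np.array([0.395, 0.420, 0.441, 0.405, 0.455, 0.380])
rpg = np.array([4.10, 4.45, 4.70, 4.25, 4.95, 3.90])

def r_and_r2(x, y):
    """Pearson correlation r, and r^2, the share of variance a linear model explains."""
    r = np.corrcoef(x, y)[0, 1]
    return r, r * r

r_obp, r2_obp = r_and_r2(obp, rpg)
r_slg, r2_slg = r_and_r2(slg, rpg)
print(f"OBP: r = {r_obp:.3f}, r^2 = {r2_obp:.3f}")
print(f"SLG: r = {r_slg:.3f}, r^2 = {r2_slg:.3f}")
```

With the real data, the same calculation run over every team-season from 1913-2015 yields the 0.890 and 0.867 figures above.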
But that difference doesn’t mean that OBP>SLG is an iron rule. Take 2015, for example. The correlation coefficient between on-base percentage and runs per game for the 30 teams last year was just 0.644, compared to 0.875 for slugging percentage. Slugging won in 2014 too, 0.857-0.797. And 2013, 0.896-0.894. And 2012, and 2011, and 2010, and 2009, and every single year starting in the Moneyball season of 2002. Slugging percentage, not on-base percentage, is on a 14-year run as the best predictor of offense.
And it turns out that the choice of endpoints matters. On-base percentage has a higher correlation coefficient to scoring than slugging percentage for the period 1913-2015. But slugging percentage explains scoring better over 1939-2015 and every later span ending in the present. Slugging percentage, not on-base percentage, is most closely linked to run scoring in modern baseball.
Let me show that graphically. I calculated the correlation coefficient between slugging percentage and scoring (i.e., runs per game), minus the correlation coefficient between on-base percentage and scoring. A positive number means that slugging percentage did a better job of explaining scoring, and a negative number means that on-base percentage did better. I looked at three-year periods (to smooth out the data) from 1913 to 2015, so on the graph below, the label 1915 represents the years 1913-1915.
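The rolling-window calculation behind the graph can be sketched like this. The data here are randomly generated stand-ins (the seasons, team counts, and coefficients are all invented for illustration); the structure of the computation is what matters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic team-seasons (illustrative stand-in for the Retrosheet data):
# 16 teams per year, 1913-1924.
years = np.repeat(np.arange(1913, 1925), 16)
obp = rng.normal(0.325, 0.012, size=years.size)
slg = rng.normal(0.400, 0.025, size=years.size)
rpg = 10 * obp + 5 * slg - 1.0 + rng.normal(0, 0.15, size=years.size)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# For each three-year window, correlate SLG and OBP with runs per game
# across all team-seasons in the window; a positive gap favors slugging.
diffs = {}
for end in range(1915, 1925):
    mask = (years >= end - 2) & (years <= end)
    diffs[end] = corr(slg[mask], rpg[mask]) - corr(obp[mask], rpg[mask])

for end, gap in diffs.items():
    print(f"{end - 2}-{end}: SLG minus OBP correlation gap = {gap:+.3f}")
```

Plot the resulting gap against the window's end year and you get a chart like the one below, with the window labeled by its final season.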
A few observations:
· The Deadball years were extreme outliers. There were dilution-of-talent issues in 1914 and 1915, when the Federal League operated. World War I thinned rosters and shortened the season in 1918 and 1919. And nobody hit home runs back then. The Giants led the majors with 39 home runs in 1917. Three Blue Jays matched or beat that number last year.
· Since World War II, slugging percentage has been, pretty clearly, the more important driver of offense. Beginning with 1946-1948, there have been 68 three-year spans, and in only 19 of them (28 percent) did on-base percentage do a better job of explaining run scoring than slugging percentage.
· The one notable exception: the years 1995-1997 through 2000-2002, during which on-base percentage ruled. Ol’ Billy Beane, he knew what he was doing.
Why is this? The graph isn’t random; there are somewhat distinct periods during which either on-base percentage or slugging percentage is better correlated to scoring. What’s going on in those periods?
To try to answer that question, I ran another set of correlations, comparing the slugging percentage minus on-base percentage correlations to various per-game measures: runs, hits, home runs, doubles, triples, etc. Nothing really correlates all that well. I tossed out the five clear outliers on the left side of the graph (1913-15, 1914-16, 1915-17, 1916-18, 1917-19), and the best correlations I got were still less than 0.40. Here’s runs per game, with a correlation coefficient of -0.35. The negative correlation means that the more runs scored per game, the more on-base percentage, rather than slugging percentage, correlates to scoring.
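That second-order comparison is just another correlation: the per-window SLG-minus-OBP gap on one side, a per-game measure on the other. A minimal sketch, with invented per-window values chosen only to illustrate the shape of the calculation (the article’s actual figure was -0.35):

```python
import numpy as np

# Hypothetical per-window values (illustrative, not the article's data):
# the SLG-minus-OBP correlation gap for each three-year window, paired with
# the average runs per game in that same window.
corr_gap = np.array([0.05, 0.02, -0.03, 0.08, -0.06, 0.04, 0.01, -0.02])
runs_pg  = np.array([4.20, 4.35, 4.80, 4.05, 4.95, 4.30, 4.40, 4.60])

# A negative result means higher-scoring eras tilt toward OBP,
# lower-scoring eras toward slugging.
r = np.corrcoef(corr_gap, runs_pg)[0, 1]
print(f"correlation between gap and runs/game: {r:+.3f}")
```

In practice you would also drop the Deadball-era outlier windows before running this, as described above.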
That makes sense, I suppose. When there are a lot of runs being scored—the 1930s, the Steroid Era—all you need to do is get guys on base, because the batters behind them stand a good chance of driving them in. When runs are harder to come by—Deadball II, or the current game—it’s harder to bring a runner around to score without the longball. Again, this isn’t a really strong relationship, but you can kind of see it.
So respondents to the StroPoll: Good job realizing that batting average isn’t the most important single offensive statistic! You’ve moved beyond 20th Century thinking. Now it’s time to move beyond turn-of-the-millennium thinking as well. On-base percentage is a really useful statistic. But if you want to predict run scoring, look at slugging percentage first.