July 1, 2014
Survival of the Fittest: Pitchers
In the recent past, we’ve seen the rise of a generation of young and highly talented pitchers. From Corey Kluber to Chris Sale, from international acquisitions like Masahiro Tanaka to the sadly injured Jose Fernandez, young hurlers occupy an increasing share of the game’s best pitching matchups. Indeed, of the leaders in this year’s Cy Young race, only four of the top 10 are over 30. It’s easy to forget that even veteran aces Felix Hernandez and Johnny Cueto are still only 28. With the exception of some old stalwarts like Adam Wainwright and Mark Buehrle, the game’s best and brightest seem to be tilting toward youth.
If it seems to be the case, that’s only because it is. Younger pitchers are piling up the WAR(P) at an accelerated rate relative to the past couple of decades.
I’ve plotted here the age of the average MLB pitcher, weighted by each pitcher’s WARP, so that good pitchers count more. You’ll note that for the last decade or so, the WARP-weighted age has been in steep decline, from a peak of 29-plus at the turn of the century to a more youthful 28 at present.
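For readers who want to reproduce the idea, here is a minimal sketch of a WARP-weighted average age. The function name and the sample numbers are invented for illustration, and clipping negative WARP to zero is my assumption about how to keep the weights sensible; the calculation behind the plot may have handled that differently.

```python
def warp_weighted_age(pitchers):
    """Return the WARP-weighted mean age for a list of (age, warp) pairs."""
    weighted_sum = 0.0
    total_weight = 0.0
    for age, warp in pitchers:
        # Assumption: below-replacement pitchers get zero weight,
        # so the weights never go negative.
        weight = max(warp, 0.0)
        weighted_sum += age * weight
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no pitcher with positive WARP")
    return weighted_sum / total_weight

# Hypothetical league of four pitchers: (age, WARP). Invented numbers.
sample_league = [(24, 5.0), (28, 3.0), (35, 1.0), (31, -0.5)]
print(round(warp_weighted_age(sample_league), 2))
```

Because the 24-year-old carries more than half the total WARP, the weighted age lands well below the simple average of the four ages, which is exactly the effect the plot is meant to capture.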
In tandem with this increasing production from youngsters has been the surge of Tommy John surgeries. Paradoxically, even as the production has increased, more young pitchers have fallen victim to what is still a career-threatening injury. Most will make it back—if not all the way back to their former heights, then at least partially so.
Still, some young pitchers will burn out. As a fan of your favorite young ace, you might wonder whether he’s destined to be one of them or to enjoy a long and prolific career. I endeavor herein to answer that question, with the help of some survival modelling. This approach looks at how a player’s career length varies as a function of different characteristics (with the help of a lot of #GoryMath).
I’ve previously used this model to inquire about the cursed fortunes of aging second basemen and whether modern medicine has helped players have longer careers. Because the model relies on a large amount of historical data, I can’t (yet) integrate some of the juiciest potential inputs, like velocity (file this away for investigation in 20 or 30 years). Instead, I’ll have to rely on some other variables as proxies. Furthermore, in keeping with the spirit of the question, I’ll limit myself to examining a pitcher’s characteristics in his first three years in the league.
Let’s begin with the most obvious statistics in the Defense-Independent Pitching world, strikeouts and walks. These are the peripherals over which pitchers exercise the most control, and with the two of them in hand, one can get a basic sketch of what niche a given pitcher occupies: a high-K, high-BB hurler is a flamethrower, a low-K, low-BB player is a control artist, and a low-K, high-BB player is, well, bad.
On a technical note, I’ve normalized all the statistics to league average for each particular year, and I also include a variable for the decade in which a pitcher made his debut, as I’ve previously shown that time is an important predictor of career length. I also limited it to starting pitchers whose careers began after 1940 and lasted for more than a single year.
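As a rough illustration of that normalization step, the sketch below divides each pitcher's statistic by the average for his season. The field names (`year`, `k9`) and the data are hypothetical, and using a simple unweighted mean of pitchers is an assumption; an innings-weighted league rate would be an equally reasonable choice.

```python
from collections import defaultdict

def normalize_to_league(rows, stat):
    """Divide each pitcher's stat by the mean of that stat in the same season."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row["year"]] += row[stat]
        counts[row["year"]] += 1
    # Assumption: unweighted mean across pitchers in each season.
    league_mean = {year: sums[year] / counts[year] for year in sums}
    return [row[stat] / league_mean[row["year"]] for row in rows]

# Hypothetical pitcher-seasons (invented numbers, for illustration only).
seasons = [
    {"year": 1965, "k9": 5.0},
    {"year": 1965, "k9": 7.0},
    {"year": 1995, "k9": 6.0},
    {"year": 1995, "k9": 9.0},
    {"year": 1995, "k9": 6.0},
]
relative_k9 = normalize_to_league(seasons, "k9")
print([round(value, 2) for value in relative_k9])
```

A value of 1.0 means exactly league average for that year, so a 7.0 K/9 in a low-strikeout season can rate higher than the same figure in a high-strikeout one.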
The model in question here (a Cox Proportional-Hazards Model) measures the risk of a player’s career ending prematurely as a function of an increasing level of each variable. The result it spits out is called a Hazard Ratio (HR), where numbers >1 indicate more risk of a premature career ending, and numbers <1 indicate less risk.
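For the #GoryMath-inclined, the standard textbook form of the Cox model is written out below. The symbols are mine, not anything specific to this analysis: $h_0(t)$ is a baseline hazard shared by every pitcher, the $x_j$ are the covariates (strikeouts, walks, and so on), and the $\beta_j$ are the fitted coefficients.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\, \exp\!\left( \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \right),
\qquad
\mathrm{HR}_j = \exp(\beta_j)
```

Raising $x_j$ by one unit multiplies the hazard by $\mathrm{HR}_j$ at every point in a career, which is why a single number can summarize each variable's effect.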
So, for instance, increasing strikeouts are strongly associated with lower risk (HR = .73), while increasing walks are associated with more risk (HR = 1.55). As an example, a pitcher who struck out two times as many hitters as average in his first three years in the league would be only about 75 percent as likely to end his career after, say, 10 years as one who struck out hitters at the league-average rate. That’s a fairly startling difference. Together, strikeouts and walks, all by themselves, explain about 4.5 percent of the variation in career length.
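To see how hazard ratios combine, here is a small sketch using the two values quoted above. It assumes each covariate is measured as a multiple of league average, so that a change of +1.0 means "twice the league rate"; the exact units of the fitted model aren't spelled out here, so treat the outputs as ballpark figures. The function and variable names are illustrative.

```python
def relative_hazard(hazard_ratios, deltas):
    """Multiply per-unit hazard ratios, each raised to its covariate's change."""
    result = 1.0
    for name, delta in deltas.items():
        result *= hazard_ratios[name] ** delta
    return result

# Hazard ratios quoted in the article.
HAZARD_RATIOS = {"strikeouts": 0.73, "walks": 1.55}

# Assumption: covariates are ratios to league average, so +1.0 means
# "double the league rate" and 0.0 means "exactly league average".
double_k = relative_hazard(HAZARD_RATIOS, {"strikeouts": 1.0, "walks": 0.0})
wild_flamethrower = relative_hazard(HAZARD_RATIOS, {"strikeouts": 1.0, "walks": 0.5})
print(round(double_k, 2), round(wild_flamethrower, 2))
```

Under these assumptions, extra walks eat into most of the benefit of extra strikeouts: the wild flamethrower's risk sits much closer to league average than the pure strikeout artist's.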
Suppose that we added a few more peripherals into the model. Home runs are important, certainly, although their predictive value has been called into question. I’ll also consider that old standby, ERA; even though it now teeters on the edge of obsolescence, it was once an avant-garde method of pitcher evaluation, and since we are dealing with historical data, the stat could be important.
Surprisingly, HR/9 and ERA contribute very little increased accuracy to the model (R² goes to .048). To the extent that general managers and front offices in the age before DIPS were using ERA as a way to measure pitcher performance, it’s surprising to see that it counts for very little in terms of career length. I would have imagined that, all else being equal, a pitcher who had (by luck, and not strong peripherals) put up a brilliant ERA in his first few years in the league would buy himself more career length later on. Instead, it appears that most of what’s significant for career length can be summed up in strikeouts and walks.
I’ll add another pair of variables to the mix, both of which are not direct indicators of quality, but rather indirect signals thereof. The first is the number of innings pitched in a player’s first three years. If this is low, it might indicate early struggles or injuries. Conceivably, innings pitched early in the career could also have a negative effect on pitcher health in the long term, if a promising young player overextended himself somehow. Certainly, we’ve seen evidence that modern front offices care a great deal about early-career innings limits: Witness, for example, the much-hyped shutdown of Stephen Strasburg.
The second variable I’ll add is the age at which a player debuted. While age itself has nothing to do with the quality of a pitcher, it’s conceivable that the earlier a player debuts, the more raw talent he has at his disposal. The debut age may therefore reflect an intrinsic characteristic of the pitcher that isn’t captured in the raw statistics. In addition, if nothing else, a player who debuts one year earlier has a one-year head start in terms of career length, so we expect increasing debut age to be associated with an increasing Hazard Ratio.
Starting with innings pitched, we see that the more innings a pitcher throws in his first three years, the longer his predicted career length. That finding is the opposite of some modern thought on pitching injuries, but that doesn’t mean that the results are necessarily in conflict. The data I am analyzing comes from a window of time (1940-1999) in which training regimens and development schedules were very different from the modern grind. Pitchers didn’t throw as hard, and they didn’t play ball year-round beginning early in their lives.
All things considered, there’s also a good reason why increased innings pitched would be associated with longer careers in this time. Since pitchers were allowed to go more often and for longer outings, very good pitchers could rack up many, many innings. So there could be potentially opposite signals in the data: one that connects innings pitched to quality of the pitcher, and one that connects innings pitched to injury. Which signal dominates might be a function of the norms of pitcher usage at a particular time in MLB history.
If I fit an interaction term between the decade of a player’s debut and the innings pitched, then that interaction term shows a higher Hazard Ratio. This result indicates that, as expected, later in the 20th century, additional early-career innings pitched did more harm than good for a player’s expected career length. Nevertheless, the magnitude of this term is relatively small (HR = 1.13), and the data can’t cover the most recent two decades.
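The mechanics of an interaction term can be sketched as follows. Only the interaction Hazard Ratio of 1.13 comes from the analysis; the main-effect value of 0.80 for innings pitched is a made-up placeholder (the article reports only the direction of that effect), and counting decades from 1940 is likewise my assumption about the coding.

```python
def innings_hazard_ratio(main_hr, interaction_hr, decades_since_start):
    """Effective per-unit HR for innings pitched at a given debut decade."""
    return main_hr * interaction_hr ** decades_since_start

INTERACTION_HR = 1.13   # quoted in the article
MAIN_INNINGS_HR = 0.80  # HYPOTHETICAL placeholder, not a fitted value

# Assumption: decade is coded 0 for the 1940s, 1 for the 1950s, and so on.
for decade_index in range(6):
    effective = innings_hazard_ratio(MAIN_INNINGS_HR, INTERACTION_HR, decade_index)
    print(1940 + 10 * decade_index, round(effective, 2))
```

With these illustrative inputs, the protective effect of extra innings shrinks each decade and eventually flips to a harmful one, which is the qualitative pattern the interaction term describes. Where the crossover actually falls depends entirely on the real main-effect coefficient.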
Age at debut is powerfully associated with decreased career length. The later a player debuts, the less likely he is to have a long career. To visualize this, consider a random player, Don Aase. Aase debuted in 1977 at the age of 23. His first three years weren’t very good; he posted a worse-than-average ERA and a below-average strikeout rate. He wasn’t expected to last very long, but made it 13 years, even notching an All-Star appearance. Suppose he had debuted at a set of different ages, with stats that were exactly the same otherwise.
Taking his actual debut age as a baseline (HR ~ 1), you can see that a few years this way or that would substantially alter his risk. If Aase had performed identically but debuted at age 20 (when Jose Fernandez made it to the show), he would have been 30 percent less likely to end his career after those 13 years, a significant difference.
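One way to play with this counterfactual is to back out a per-year Hazard Ratio from the age-20 figure quoted above and apply it to other debut ages. This sketch assumes that debut age enters the model linearly, so each year multiplies the risk by the same factor; that is my simplification, and the names are illustrative.

```python
ACTUAL_DEBUT_AGE = 23   # Aase's real debut age, the baseline (HR = 1)
HR_AT_AGE_20 = 0.70     # "30 percent less likely", quoted in the article

# Assumption: a log-linear age effect, so per_year ** (20 - 23) == 0.70.
PER_YEAR_HR = HR_AT_AGE_20 ** (1.0 / (20 - ACTUAL_DEBUT_AGE))

def debut_age_hazard(age):
    """Relative risk of career ending versus debuting at the actual age."""
    return PER_YEAR_HR ** (age - ACTUAL_DEBUT_AGE)

for age in range(20, 27):
    print(age, round(debut_age_hazard(age), 2))
```

Under that assumption, each additional year of debut age raises the risk by roughly 13 percent, and debuting at 26 instead of 23 would be the mirror image of the age-20 scenario.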
With all variables combined in the same regression, one begins to get some predictive capacity. R² rises to .177. Considering all the varied factors that shape a pitcher’s career, and the omnipresent possibility of career-crippling injury, such a level of explanatory power is solid.
All of which is not to say that there aren’t avenues of possible improvement. One possibility would be to incorporate raw pitch counts, which can be estimated for older games. Both the total number of pitches a player throws in his early years and his most stressful outings could signal overuse more clearly than innings pitched does. There are certainly other potentially interesting inputs that could also be added.
Even so, a major problem with the analysis is, and will continue to be, the ever-changing nature of baseball. While it is interesting to consider the patterns of attrition in players who debuted in the ’50s and ’60s, the factors that made them prone to early career endings may be very different from those that affect modern players. The relevance of these factors, as applied to the present set of players, might therefore be questionable. And of course, by the time we have enough data to analyze the patterns of modern players’ attrition, baseball will be dealing with a whole new set of pitching problems (perhaps Cyborg Arm Rejection Syndrome will be 2050’s Tommy John).
Still, a few things are made clear by the survival modelling. The most predictive peripherals for a long and prosperous career are the first three years’ worth of strikeout and walk rates. The earlier a player makes the big leagues, the longer his expected career length. Finally, the innings a young hurler throws may be connected to the quality of the pitcher, but they could also be harmful for his development and long-term career. All of these findings are quite comforting for fans of this new generation of pitching wunderkinder: If all goes well, the likes of Kluber and Sale and Fernandez should stick around for a long time.