July 23, 2015
Spin That Curveball
In late June, [Jeff] Luhnow and his brain trust gathered in the general manager's box at Minute Maid Park in Houston to watch a 27-year-old pitcher whom they consider an indicator of what their process can yield. Collin McHugh was plucked from the scrap heap last December after bouncing between the Colorado Rockies and their Triple-A affiliate. – Extreme Moneyball
This was the world's introduction to a story that has mesmerized the statistics-inclined baseball world since it was revealed 11 months ago. It's a story of statistics, of innovation, of dedication to valuing the undervalued, and of persevering in the face of doubt from contemporaries and fans alike. Pitch spin has quickly become a hot topic among researchers across the sabermetric landscape. The methodology described in the Bloomberg article quoted above has spawned several attempts at replication, which, while valuable, are flawed because of the data limitations caused by traditional PITCHf/x outputs. New data sources give us the tools we need to properly replicate the Astros' methodology.
Statcast puts analyses like the ones the Astros are reported to have done within the grasp of fans and analysts like us.
The Astros' analysts noticed that McHugh had a world-class curveball. Most curves spin at about 1,500 times per minute; McHugh's spins 2,000 times. The more spin, the more the ball moves during the pitch—and the more likely batters are to miss it. Houston snapped him up. "We identified him as someone whose surface statistics might not indicate his true value," says David Stearns, the team's 29-year-old assistant general manager. – Extreme Moneyball
The Astros tweaked McHugh's approach, tasking him with throwing fewer sinkers and more high fastballs. They encouraged him to use his high-spinning curveball, and taught him one weird trick to get MLB hitters out that professional hitting coaches hated them for sharing. They used data, insight, and coaching to turn a scrap-heap player into a highly productive major-league starter.
McHugh was worth 4.3 DRA-based WARP last season, and has already accumulated another 1.1 wins this year. Considering how little risk the club incurred to pick up McHugh, it's safe to slide this one into Luhnow's win column.
There's an important nuance in understanding how to analyze pitch spin that needs to be clarified. I urge you to read (and re-read) Alan Nathan's most recent work on pitch spin. Nathan is, undoubtedly, the premier expert in the public space regarding pitch spin and the resulting impact on the ball as it travels to the plate.
PITCHf/x, as the data is reported out by MLB Advanced Media, produces what we'll call "calculated spin rate." Essentially, PITCHf/x takes the total movement of the pitch, removes the effect of gravity, and then calculates the spin necessary to generate that movement. This methodology has some inherent flaws. First, outside factors like elevation and game-time temperature greatly affect the movement of a pitch, and the calculation does not account for them. Second, it does not correct for the effect of air drag on the pitch. As a result, the spin rates reported in raw PITCHf/x data are generally inaccurate.
That said, the raw data can be recalculated to get accurate spin numbers, as Nathan has discussed. It's worth noting here that Nathan uses a slightly different algorithm than MLBAM, one derived from recent laboratory tests. Incorporating factors such as air density and temperature allows you to determine the useful, or movement-generating, spin. What does that mean exactly?
In general, the spin vector can be written as the vector sum of two components. One is parallel to the direction of motion, and I will refer to it as the “gyrospin” component $\omega_G$. The other is perpendicular to the direction of motion, and I will refer to it as the “transverse spin” component $\omega_T$. Remember that only $\omega_T$ contributes to the movement. In general, if we can measure the movement, we can determine both the magnitude and direction of $\omega_T$, which is exactly what PITCHf/x does. But PITCHf/x has no way to determine anything about $\omega_G$. We may refer to the transverse spin as the “useful spin”, since it is directly related to the amount of movement, in the sense that increasing the transverse spin will increase the movement. However, increasing the gyrospin will not increase the movement. As the title says, all spin is not alike. Note that the total spin rate $\omega$ is the Pythagorean sum of $\omega_T$ and $\omega_G$:
$$\omega = \sqrt{\omega_T^2 + \omega_G^2}$$
In layman's terms, the total spin of any given pitch comprises both useful or movement-generating spin (the transverse component above) and “gyrospin,” which doesn’t generate movement (the parallel component above). By doing some more robust calculations, we can begin to isolate the useful spin that actually helps the pitcher generate movement.
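To make the Pythagorean relationship concrete, here is a minimal Python sketch that backs out the gyrospin implied by a total spin and a useful (transverse) spin. The function name and the sample numbers are invented for illustration; they are not drawn from the Statcast or PITCHf/x data.

```python
import math

def gyrospin_rpm(total_rpm, useful_rpm):
    """Gyrospin implied by total spin and useful (transverse) spin.

    Rearranges total^2 = useful^2 + gyro^2 into
    gyro = sqrt(total^2 - useful^2).
    """
    if useful_rpm > total_rpm:
        # Useful spin is one component of total spin, so it cannot be larger.
        raise ValueError("useful spin cannot exceed total spin")
    return math.sqrt(total_rpm**2 - useful_rpm**2)

# Hypothetical pitch: 2,500 rpm total spin, 2,000 rpm useful spin.
print(round(gyrospin_rpm(2500, 2000)))  # 1500
```

Note that a pitch can carry a substantial amount of gyrospin while still having most of its spin be useful, because the components add in quadrature rather than linearly.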
Statcast data helps us fill in the holes. Statcast provides us with the total spin, an observed data point, which will help us paint a more complete picture. Jeff Passan teased us with this revolution recently, highlighting that "teams have been getting all of the Statcast information in giant raw data files, to do with whatever it pleases. And some have glommed onto the spin of pitches – how many revolutions per minute it is turning – as a component worth studying. It's far too early to tell whether it will prove as valuable a marker as velocity. What's undeniable is that the spin of a ball has a demonstrable effect of how hitters react to it."
One point in the above paragraph deserves emphasis: Statcast provides the total spin as opposed to calculated (or movement) spin data. That means it is actually measuring the revolutions on a pitch rather than inferring them algorithmically from movement. Passan was able to confirm this upon further inquiry. The value here is that not only can we determine how much a pitch spins, but we can also ascertain how much of that spin is actually causing movement on the pitch.
MLBAM graciously supplied us with the Statcast spin data for every pitcher who has thrown at least 50 curveballs and/or sliders this season. With the help of Dr. Nathan we analyzed Statcast spin data in conjunction with properly calculated PITCHf/x spin data to get a more complete picture of what spin data actually looks like. Below are two tables showcasing some high-level data from the datasets as a whole:
All the spin numbers are rounded to the nearest ten because, in reality, 10 rpm is a fraction of a rotation as a pitch travels to the plate. Hat tip to Tom Tango on that one. Also, the discrepancies in the average number of pitches thrown by the pitchers included in this study stem from the differing pitch classification methods of MLBAM, which provides the Statcast data, and Pitch Info, which provided the PITCHf/x classifications. Finally, there are some sampling error concerns: a few pitchers show useful spin ratios over 1.0, which is physically impossible.
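The rounding argument is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a pitch travels roughly 55 feet from release to the plate at a constant 80 mph (ignoring drag); both figures and both function names are illustrative assumptions, not values from the datasets.

```python
def round_to_ten(rpm):
    """Round a spin rate to the nearest 10 rpm."""
    return int(round(rpm / 10)) * 10

def revolutions_in_flight(spin_rpm, speed_mph, distance_ft=55.0):
    """Revolutions completed in flight, assuming constant speed (no drag)."""
    feet_per_second = speed_mph * 5280 / 3600
    flight_seconds = distance_ft / feet_per_second
    return spin_rpm / 60 * flight_seconds

print(round_to_ten(2237))                      # 2240
print(f"{revolutions_in_flight(10, 80):.3f}")  # 0.078
```

Under these assumptions, a 10 rpm difference amounts to less than a tenth of one revolution over the flight of the pitch.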
While the average total spin and peak spin numbers for both pitches are nearly the same, sliders have a much lower ratio of useful spin than do curveballs. This is illustrated below in charts showcasing the full sample for both pitches, with useful spin plotted against the total spin. Note that the orange line indicates a ratio of 1.0, meaning that every single rotation on the ball is useful:
The curveball chart has a few outliers that live above the orange line where their useful spin per PITCHf/x is greater than their total spin as determined by Statcast. As I said, this is physically impossible, and thus can be interpreted as some sampling error where Statcast and PITCHf/x aren't playing nicely together. This is not unexpected: We are merging two disparate datasets.
The slider chart doesn't have any pitchers above the line, which is promising, but also likely a function of the fact that sliders have a much lower ratio of average useful spin to total spin than do curves.
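The check for pitchers above the line can be sketched as a join of the two sources followed by a ratio test. This is only an illustration of the idea: the pitcher labels, spin values, and function name are hypothetical, and the real analysis also had to reconcile differing pitch classifications.

```python
# Hypothetical per-pitcher curveball averages in rpm; not real data.
statcast_total = {"Pitcher A": 2500, "Pitcher B": 2200, "Pitcher C": 2400}
pitchfx_useful = {"Pitcher A": 2550, "Pitcher B": 1760, "Pitcher D": 1900}

def useful_spin_ratios(total_by_pitcher, useful_by_pitcher):
    """Join the two sources on pitcher name and compute useful/total spin."""
    rows = []
    # Only pitchers present in both datasets can be compared.
    for name in sorted(total_by_pitcher.keys() & useful_by_pitcher.keys()):
        ratio = useful_by_pitcher[name] / total_by_pitcher[name]
        rows.append({
            "pitcher": name,
            "ratio": round(ratio, 2),
            # A ratio above 1.0 is physically impossible, so flag it.
            "impossible": ratio > 1.0,
        })
    return rows

for row in useful_spin_ratios(statcast_total, pitchfx_useful):
    print(row)
```

Flagging the impossible ratios rather than discarding them keeps the disagreement between the two sources visible, which matters when one of the flagged pitchers turns out to be the subject of the analysis.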
According to the excerpt from the Bloomberg article included above, Collin McHugh's curveball spins at more than 2,000 rpm while the major-league average sat around 1,500 at the time. This data appears to be from PITCHf/x. (It matches PITCHf/x results, though the source of those numbers wasn't disclosed in the article so we can't know for sure.) Statcast gives us a new, more complete look at McHugh's curveball: He does in fact throw one of the fastest-spinning curveballs in the game with average spin rates over 2,500 rpm and a peak at nearly 2,800 rpm. Compare that to the MLB average for total spin of just over 2,200 and it's easy to see why the Astros might have been enamored.
That, however, isn't the whole story. McHugh is one of those outliers whose useful spin per PITCHf/x exceeds what Statcast tells us is the total spin on his curveball. While there's some error here, it would seem safe to say that McHugh isn't just among the best in the league at spinning a curveball. More importantly, McHugh is among the best in the business at throwing a curveball loaded with useful spin.
It is this particular data point, I believe, that the Astros paid specific attention to in identifying undervalued pitchers. One could argue, and I might, that curves with high ratios of useful spin are better than ones with lower useful spin ratios for several reasons.
For one thing, these pitches have close to the maximum movement possible because nearly all of their spin is of the useful variety. It also seems plausible that McHugh has better command of his curveball than most, for a couple of reasons. First, his average and max spin readings are fairly close, which indicates that most of his curveballs are spinning at roughly the same rate. Additionally, a high percentage of this spin is causing movement on the pitch, which again would seem to promote consistency. With consistency comes command, and with command comes the ability to throw strikes.
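One crude way to express the "average close to max" idea is the ratio of a pitcher's average spin to his peak spin. The sketch below is an assumption about how one might quantify it, not a method described in the source, and the per-pitch readings are made up.

```python
def avg_to_peak_ratio(spin_readings_rpm):
    """Average spin as a share of peak spin; closer to 1.0 means a tighter range."""
    average = sum(spin_readings_rpm) / len(spin_readings_rpm)
    return average / max(spin_readings_rpm)

# Hypothetical per-pitch curveball spin readings in rpm.
readings = [2450, 2500, 2550, 2600, 2400]
print(round(avg_to_peak_ratio(readings), 3))  # 0.962
```

With per-pitch data in hand, a standard deviation would be a more robust measure of spread, since a single unusually high reading can distort the peak.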
With that hypothesis in mind, we can use Statcast data combined with PITCHf/x data tweaked using Nathan's formula to identify who the most McHugh-like pitchers in MLB might be in 2015. Over the next few weeks we'll dive into the data and attempt to find those diamonds in the rough, just as Houston's front office was able to do with McHugh.
Thanks to Alan Nathan for his help in deciphering, manipulating, and understanding the Statcast and PITCHf/x data, and for his ongoing support of this analysis.