March 2, 2011
How Accurate is PitchTrax?
After the last two postseasons, most baseball fans are familiar with the strike zone location graphic known as PitchTrax. Here’s an example from Game One of the 2010 American League Championship Series:
This was the first pitch from Mariano Rivera to Mitch Moreland to open the ninth inning, with the Yankees holding a one-run lead. Home plate umpire Gerry Davis called this pitch ball one, but the PitchTrax system indicated it was a borderline strike. Who was right—the umpire or the machine?
The TBS PitchTrax and its cousin Fox Trax use data exported directly from Sportvision’s PITCHf/x system, which tracks each pitch with two cameras mounted in the stadium. The same detailed trajectory data that is displayed on MLB Gameday is what television broadcasters use for their pitch location graphics.
Early in the life of the PITCHf/x project, the Sportvision engineers sought to establish the accuracy of the system’s plate location data. They devised a clever experiment in which a pitching machine launched baseballs against a foam board mounted vertically at home plate. The plate locations measured by PITCHf/x were checked against the dents the balls left in the foam, and Sportvision found that their system was accurate to within half an inch in this experiment. Alan Nathan came to a similar conclusion about the uncertainty in plate location when he investigated the statistical uncertainty of several PITCHf/x-derived parameters, such as speed and location, using a much larger sample of data from the PITCHf/x system installed in Toronto.
Sportvision’s foam board test and Alan’s calculations give us a good grasp on the random error inherent in PITCHf/x plate location measurements. It is good to keep this source of uncertainty in mind, but fortunately its magnitude is small. This is a testament to sound engineering work done by Sportvision. It is no mean feat, and it makes the data extremely valuable.
However, quantifying the random measurement error doesn’t guarantee that the data is free from systematic or persistent problems. On the day that Sportvision ran the foam board test, any persistent misalignment or bias in the system was less than a half inch. But that doesn’t prove that the systems stay that well calibrated and aligned throughout several years in the field and in all 30 major league stadiums.
Ike Hall was the first analyst to observe a problem with the plate location data. At the 2008 PITCHf/x Summit, Ike’s presentation on data quality included a slide showing that the PITCHf/x location of the called strike zone at Dolphin Stadium in 2007 was shifted 4-5 inches toward the third base side of the plate. Ike capably addressed the general difficulties inherent in identifying problems with PITCHf/x data. He offered a philosophy for dealing with measurement uncertainty, some suggested schemes for correcting the data, and a critique of Josh Kalk’s correction scheme. (Josh’s published correction methods did not include plate locations.)
One of Ike’s ideas is foundational for my attempts to detect and quantify PITCHf/x measurement errors and, ultimately, to correct them. Ike proposed the “standard candle” approach from astronomy, where a star of known luminosity and distance is used as a reference to determine the distance to a new celestial object. The challenge in calibrating PITCHf/x data is finding a standard candle that is trustworthy and unchanging, especially for outside analysts who examine the data after the fact and without access to the PITCHf/x camera parameters.
What can serve as our standard for plate locations? The obvious candidate is that faithful and impartial arbiter, the umpire. If umpires called the same strike zone in every game and in every ballpark, we could use that strike zone as our reference point. We could even adjust for differences between the zones of individual umpires. However, this approach has two problems. First, called pitches make up only about half of the pitches in any given game, and the sample shrinks further when we consider only the close pitches needed to establish the edge of the zone. Second, as was recently outlined in this column, the approaches of the pitcher and batter have a significant effect on the size and location of the strike zone.
In fact, we can take advantage of our knowledge that the pitcher and batter have definite and persistent trends in pitch locations. Can either serve as our standard candle? It turns out that both would probably work, though pitchers are somewhat better.
We can compare the average pitch locations for individual batters from the 2007-8 seasons to the average pitch locations for those same batters in 2009-10 (restricting the pool to include only batters who saw at least 200 pitches in both samples). Similarly, we can compare average pitch locations for pitchers who threw at least 200 pitches to left-handed batters or 200 pitches to right-handed batters in both samples. It is important to separate the pitcher data by batter handedness, since nearly all pitchers locate very differently to left-handed and right-handed batters.
In the horizontal dimension, pitchers have slightly higher year-to-year repeatability in their pitch locations than batters, with a correlation coefficient between samples of r=0.72 for pitches to left-handed batters and r=0.65 for pitches to right-handed batters. Batters have a correlation coefficient between samples of r=0.65 for left-handed batters and r=0.59 for right-handed batters. Pitchers tend to have better game-to-game repeatability, too.
Similarly, we can compare 2007-8 to 2009-10 average pitch location in the vertical dimension for batters and pitchers. The year-to-year repeatability for batter pitch location in the vertical dimension is lower than in the horizontal dimension, at r=0.50 for left-handed batters and r=0.49 for right-handed batters. For pitchers, the year-to-year repeatability is higher in the vertical dimension, at r=0.77 when facing left-handed batters and r=0.72 against right-handed batters. In fact, it turns out that splitting the pitcher sample by batter handedness is unimportant in the vertical dimension, with a correlation coefficient of r=0.77 when combined.
So, average pitch location by batter is repeatable enough to serve as a decent standard, but since average pitch location by pitcher is more repeatable in both the horizontal and vertical dimensions, we’ll use the pitchers.
By comparing each pitcher’s average pitch location in a given game to his average pitch location from 2007-2010, we can generate an estimate of the error in the PITCHf/x plate location measurement for that game. Each pitcher’s contribution is divided into two groups by batter handedness and weighted by the number of pitches thrown. The contribution from all the pitchers in a given game is then summed, and an average PITCHf/x plate location error estimate for that game is the result.
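A minimal sketch of the game-level estimator described above, under stated assumptions: each pitch is a (pitcher id, batter hand, px) record with px in inches (raw PITCHf/x px values are in feet, so they would need to be multiplied by 12), the 2007-2010 baseline means are precomputed per pitcher and batter hand, and pitcher-hand groups with no baseline are simply skipped. Summing each group's offset times its pitch count and dividing by the total count gives the pitch-weighted average the text describes. All names are illustrative.

```python
from collections import defaultdict


def game_error_estimate(game_pitches, baseline):
    """Pitch-weighted average offset of each pitcher's locations in one game
    from his long-run average, computed separately by batter hand.

    game_pitches: iterable of (pitcher_id, batter_hand, px_inches) for one game.
    baseline: dict mapping (pitcher_id, batter_hand) -> long-run mean px_inches.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pitcher, hand, px in game_pitches:
        sums[(pitcher, hand)] += px
        counts[(pitcher, hand)] += 1

    weighted_total = 0.0
    total_pitches = 0
    for key, n in counts.items():
        if key not in baseline:
            continue  # no long-run reference for this pitcher/hand group
        offset = sums[key] / n - baseline[key]  # this game's mean minus long-run mean
        weighted_total += n * offset            # weight by pitches thrown
        total_pitches += n
    if total_pitches == 0:
        raise ValueError("no pitches with a baseline in this game")
    return weighted_total / total_pitches
```

Subtracting the estimate from every px in that game would be the corresponding correction; passing vertical locations instead of px gives the vertical estimate.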
The estimated average measurement error by stadium and year is shown in the following table:
Just over half of the stadium-seasons have a measurement error below half an inch in each dimension. When both dimensions are combined, 29 percent of the stadium-seasons are below half an inch of measurement error, 67 percent have less than one inch of error on average, and 98 percent are below two inches. The worst PITCHf/x systems for persistent plate location errors were the 2007 Metrodome (2.7 in.), 2007 Dolphin Stadium (2.4 in.), 2009 Dodger Stadium (1.9 in.), 2007 Fenway Park (1.7 in.), and 2010 Miller Park (1.7 in.).
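The text does not say exactly how the two dimensions are combined. Assuming the combined error is the straight-line (Euclidean) length of the horizontal and vertical offsets, the threshold percentages could be tallied as follows; the function names are hypothetical.

```python
import math


def combined_error(dx, dz):
    """Straight-line size of a (horizontal, vertical) offset, in inches (assumed)."""
    return math.hypot(dx, dz)


def share_below(errors, thresholds=(0.5, 1.0, 2.0)):
    """Fraction of (dx, dz) stadium-season offsets whose combined error
    is strictly below each threshold."""
    mags = [combined_error(dx, dz) for dx, dz in errors]
    return {t: sum(m < t for m in mags) / len(mags) for t in thresholds}
```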
A few of the stadiums seem to have errors that persist from season to season. For example, Milwaukee’s system has measured plate locations that are shifted about 1.5 inches toward third base for four years in a row. However, many of the errors change in direction and magnitude from year to year even within the same stadium.
The picture becomes more complex when looking within a season at the game level. For example, take Busch Stadium. The seasonal averages from St. Louis have the second-lowest error, after Safeco Field, among all 30 stadiums in the horizontal dimension, and they are very consistent from year to year. However, at the game level, the picture is somewhat different:
There is a fair amount of scatter in the estimates from game to game. Nonetheless, one can discern trends within the seasons. In both 2009 and 2010, the first two thirds of the season are shifted positive, while the last third of the season is shifted negative. To determine whether this method of error estimation has any validity, we can check the strike zone from one portion of a season against the strike zone from another portion of the season.
In 2010, the average horizontal plate location error estimate for Busch Stadium from April through August 17 was +0.93 inches. From August 18 through October, the average location error estimate was -1.24 inches. The following charts show the called pitches from each period:
The strike zone called by the umpires for right-handed batters at Busch Stadium shows a shift of about two inches toward third base from the early season to the late season in 2010. The strike calls show a similar shift for left-handed batters. The boundary between 50% balls and strikes, weighted by the number of pitches at each boundary, has shifted 1.9 inches toward third base. Compare this to the shift of 2.2 inches toward third base that was indicated by using the pitcher location as the standard reference.
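One simple way to locate a 50% ball/strike boundary is to bin called pitches by horizontal location, compute the called-strike fraction in each bin, and interpolate where that fraction crosses one half. The sketch below does this for the edge at the low-px end of the profile. It is an illustration with hypothetical names, not necessarily the procedure behind the 1.9-inch figure, and it omits the weighting by pitch counts across boundaries mentioned above.

```python
def strike_fraction_by_bin(calls, edges):
    """calls: list of (px_inches, is_called_strike); edges: sorted bin edges.
    Returns (bin_center, strike_fraction) for every non-empty bin."""
    profile = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [strike for x, strike in calls if lo <= x < hi]
        if in_bin:
            profile.append(((lo + hi) / 2, sum(in_bin) / len(in_bin)))
    return profile


def fifty_percent_crossing(profile):
    """First point, scanning from low to high px, where the strike fraction
    crosses 0.5, found by linear interpolation between adjacent bin centers."""
    for (x0, f0), (x1, f1) in zip(profile, profile[1:]):
        if (f0 - 0.5) * (f1 - 0.5) <= 0 and f0 != f1:
            return x0 + (0.5 - f0) * (x1 - x0) / (f1 - f0)
    return None  # no crossing found
```

Running `fifty_percent_crossing` on the early and late 2010 samples and subtracting the two results would give the boundary shift for that edge.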
There is good agreement between the strike-zone-based error estimator and the pitcher-location-based error estimator when comparing these two time periods. However, could umpire bias be the main source of the difference in strike zones between the two periods? There are 49 umpires in the first period of 2010 and 17 umpires in the second period. It strains credulity that all of the right-leaning umpires visited Busch Stadium early in the year and almost all of the left-leaning umpires waited until late August to start visiting St. Louis one after another. The explanation most consistent with all the facts is that the PITCHf/x location measurements in St. Louis shifted by about two inches toward third base during 2010.
So, there is good evidence to believe that the pitcher location offset method is producing a fairly accurate estimate of the PITCHf/x measurement error over time periods of a month or more, but what about at the individual game level? If one believes that the longer-term trends represent reality, one can calculate the uncertainty in the error estimate by comparing the spread in individual game estimates to the longer-term trends.
On what scale should the time boundaries be drawn for the comparison? The season level is clearly too long, as we have seen in the St. Louis data. My work estimating the error in other PITCHf/x parameters suggests that the homestand would be a good length of time, since many changes in those parameters appear to occur between homestands. That pattern is less evident in the plate location data, possibly because of the high level of noise in the game-level estimates for this parameter.
A weighted smoothing of seven games before and seven games after the game of interest was chosen in order roughly to mimic the homestand timeframe. Here are the results of that smoothing applied to the St. Louis data shown earlier:
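The article does not specify the weights used in the smoothing. As an assumption, the sketch below uses triangular weights that fall linearly from 8 at the game of interest to 1 at seven games away, includes the game itself, and truncates the window at the ends of a stadium's schedule. `values` is a chronological list of game-level estimates from one stadium; the function name is hypothetical.

```python
def smooth(values, half_window=7):
    """Triangular-weighted moving average over up to half_window games on each
    side of every game; the window is truncated at the ends of the series."""
    n = len(values)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for k in range(-half_window, half_window + 1):
            j = i + k
            if 0 <= j < n:
                w = half_window + 1 - abs(k)  # heaviest weight on the game itself
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed
```

A centered window like this responds to a genuine step change only gradually, which is the lag effect discussed below.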
The average absolute difference between the game-level estimate and the smoothed value from St. Louis 2007-2010 is 0.9 inches. Here is a histogram of the difference between the game-level values and the smoothed values, both in St. Louis and in all PITCHf/x parks from 2007-2010:
The distribution is approximately normal, with a standard deviation of 1.1 inches both for St. Louis and for all parks together. This distribution does not necessarily represent the actual uncertainty in our PITCHf/x location error estimate, for at least two reasons. First, when the game-level estimate detects a real change, the smoothed value takes some time to catch up. Thus, some of the difference between the two values reflects cases where the game-level estimate is closer to reality than the smoothed value. Second, the smoothed value assumes that the pitcher-based location error method is basically correct on a time scale of a few weeks. We’ve shown that this assumption is broadly valid, but to whatever extent it fails, the smoothed value will itself deviate from reality.
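The two summary numbers are consistent with each other under the normal approximation. For a zero-mean normal variable with standard deviation σ, the mean absolute value is σ√(2/π), roughly 0.8σ, so a mean absolute difference of 0.9 inches corresponds to σ of about 1.13 inches. The helper below (hypothetical name) computes both statistics from paired game-level and smoothed series.

```python
import math
from statistics import fmean, pstdev


def residual_summary(game_estimates, smoothed):
    """Mean absolute difference and standard deviation of game-level
    estimates minus their smoothed values."""
    if len(game_estimates) != len(smoothed):
        raise ValueError("series must have the same length")
    diffs = [g - s for g, s in zip(game_estimates, smoothed)]
    return fmean(abs(d) for d in diffs), pstdev(diffs)


# For a zero-mean normal distribution, E|X| = sigma * sqrt(2 / pi),
# so a mean absolute difference of 0.9 in. implies sigma of about 1.13 in.
implied_sigma = 0.9 / math.sqrt(2 / math.pi)
```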
Having established some semblance of the level of uncertainty around our game-level error estimates, let’s look at the distribution of those error estimates across games from 2007-2010, as well as the distribution of the smoothed values:
The standard deviation of the game-level error estimates is 1.4 inches, and the standard deviation of the smoothed values is 0.8 inches.
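As a rough consistency check, suppose each game-level estimate is the true error plus independent noise with the 1.1-inch spread found above. Variances then add, so the spread of the true error would be about √(1.4² − 1.1²) ≈ 0.87 inches, close to the 0.8-inch spread of the smoothed values. This is an illustrative calculation under that independence assumption, which the caveats above show is only approximate.

```python
import math


def implied_true_sd(total_sd, noise_sd):
    """If observed = true + independent noise, then
    Var(observed) = Var(true) + Var(noise); solve for SD(true)."""
    if noise_sd > total_sd:
        raise ValueError("noise cannot exceed the total spread")
    return math.sqrt(total_sd**2 - noise_sd**2)


print(implied_true_sd(1.4, 1.1))  # about 0.87 inches
```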
Now we can evaluate part of one of the popular myths of PITCHf/x lore—that the data from 2007 is rife with errors and unreliable. Compare the yearly progression of the standard deviation of the game-level error estimate and the percentage of games with a game-level error estimate greater than two inches or a smoothed error estimate greater than one inch:
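The yearly comparison could be tabulated as below. The sketch assumes the thresholds apply to absolute values (an error of -2.5 inches counts as exceeding two inches) and reports the two percentages separately; both choices, and the input layout, are assumptions rather than details from the original table.

```python
from collections import defaultdict
from statistics import pstdev


def yearly_summary(games):
    """games: iterable of (year, game_estimate, smoothed_estimate) in inches.
    Returns {year: (SD of game estimates, % with |game| > 2 in.,
    % with |smoothed| > 1 in.)}."""
    by_year = defaultdict(list)
    for year, g, s in games:
        by_year[year].append((g, s))
    summary = {}
    for year, rows in sorted(by_year.items()):
        n = len(rows)
        sd = pstdev(g for g, _ in rows)
        pct_game = 100 * sum(abs(g) > 2 for g, _ in rows) / n
        pct_smooth = 100 * sum(abs(s) > 1 for _, s in rows) / n
        summary[year] = (sd, pct_game, pct_smooth)
    return summary
```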
It’s difficult to determine exactly how close to reality the smoothed value comes. Max Marchi implemented one potentially elegant solution using multilevel modeling to incorporate additional factors into the PITCHf/x location error estimate. It is encouraging that Max found high agreement with the simple pitcher-based location error estimator.
Max’s method identified one of the potential problems with the pitcher-based method at the game level: it avoided the pitfall that tripped up the pitcher-based method for the July 5, 2010, game between the Phillies and the Braves. Two pitchers, Roy Halladay and Derek Lowe, threw 201 of the 218 pitches in the game. Halladay and the reliever Jonny Venters stayed close to their normal approaches to pitch location. Derek Lowe, however, departed radically from normal. Whereas Lowe is normally among the most extreme pitchers at targeting the outside part of the plate against right-handed opponents, in this game he pounded the Phillies righties inside all night long. With half of Lowe’s pitches a foot farther toward third base than normal, the overall pitcher-based PITCHf/x location error estimate for the game was -3.6 inches, when in reality it was probably closer to -1 inch, based upon the locations from the other two pitchers and the strike zone edges.
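The pitfall is easy to reproduce with made-up numbers. In the hypothetical game below, most pitch groups sit about one inch from their baselines while one group of 50 pitches departs by a foot, which drags the pitch-weighted mean to about -3.5 inches. A pitch-weighted median is shown for contrast as one simple, robust alternative; it is not Max Marchi’s multilevel model, and the offsets are illustrative rather than the actual July 5 data.

```python
def weighted_mean(groups):
    """groups: list of (offset_inches, n_pitches). Pitch-weighted mean offset."""
    total = sum(n for _, n in groups)
    return sum(off * n for off, n in groups) / total


def weighted_median(groups):
    """Smallest offset at which the cumulative pitch count reaches half the
    total; returns None for an empty list."""
    ordered = sorted(groups)
    total = sum(n for _, n in ordered)
    cumulative = 0
    for off, n in ordered:
        cumulative += n
        if 2 * cumulative >= total:
            return off
    return None


# Hypothetical offsets: three groups near -1 in., one group a foot off.
game = [(-1.0, 100), (-12.0, 50), (-1.0, 51), (-1.0, 17)]
print(round(weighted_mean(game), 2))  # -3.52
print(weighted_median(game))          # -1.0
```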
In addition, TrackMan plate location data could be deployed as a check against the PITCHf/x plate location accuracy in those stadiums where both systems are installed. I’ve investigated that approach, but the results are withheld under a non-disclosure agreement.
Another potentially fruitful approach would involve working with Sportvision in order to determine what happens to the PITCHf/x cameras to produce these errors in the first place. This author and other analysts have worked with Sportvision on a very limited basis to do this in the past, but the sustained cooperative effort required to tackle the problem has not yet eventuated.
Based upon the level of accuracy in PITCHf/x plate locations and our ability to apply corrections to the errors in these locations, what can we conclude? Most of the errors are less than an inch or two and thus probably of little effect on most batter and pitcher measurements. Errors in plate location measurements have a corresponding effect on the spin deflection measurements for pitchers, but again, errors of less than an inch or two are fairly small when most pitch clusters are five inches or more apart in the spin deflection space. One arena where the implications of PITCHf/x plate location accuracy are larger is in umpire evaluation. Errors of an inch or two are sizable when grading umpire performance, and corrected location data would be useful in that application.
It’s worth reminding ourselves, though, of the engineering reality and the incredible amount of work that has gone into getting us where we are today. As Ike Hall commented, “Sportvision has done an incredible job getting it to the current level of accuracy. Things could be far worse. Really. Most parks need only slight corrections.”
Max Marchi also hit the nail on the head: “Sportvision has been doing a terrific work in the past few years in tracking every major league pitch and we are really fortunate that it (and MLBAM) let us put our hands on that wealth of data. Thus, pointing at miscalibrations is not meant as criticizing their amazing work, but rather as a way to give something back.”
And that ALCS pitch to Moreland? Our method says it was about two inches farther inside than PitchTrax indicated. The umpire may have been correct to call it a ball after all.