Baseball Prospectus - Chat with Jonathan Judge

Chat: Jonathan Judge

Welcome to Baseball Prospectus' Tuesday December 04, 2018 9:00 PM ET chat session with Jonathan Judge.

Jonathan is the lead statistical developer at Baseball Prospectus, and the driving force behind our new DRC+ metric.

Jonathan Judge: Hi Everybody! Thanks for joining us.

jmor1717 (Right near da beach): When sorting DRC+ leaders, is it possible to see 3 year trends?

Jonathan Judge: In terms of individual players, you can certainly see their trends on their player cards. In terms of grouping by collections of seasons at one time, we don't offer that on the site at the moment, although you could certainly do that on your own if you downloaded a few seasons of the data.
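For readers who want to try the multi-season view themselves, here is a minimal Python sketch. It assumes you have already combined a few downloaded seasons into rows with hypothetical fields `player`, `season`, `pa`, and `drc_plus`. It combines seasons with a PA-weighted average, which is one reasonable choice and an assumption on our part, not something BP publishes. The rows below are made up.

```python
from collections import defaultdict


def multi_season_drc_plus(rows, seasons):
    """Return a PA-weighted DRC+ per player over the chosen seasons.

    `rows` is a list of dicts with hypothetical keys 'player', 'season',
    'pa', and 'drc_plus', e.g. assembled from downloaded season files.
    PA weighting is an assumed way of combining seasons.
    """
    wanted = set(seasons)
    pa_total = defaultdict(int)
    weighted_sum = defaultdict(float)
    for row in rows:
        if row["season"] not in wanted:
            continue  # skip seasons outside the window
        pa_total[row["player"]] += row["pa"]
        weighted_sum[row["player"]] += row["pa"] * row["drc_plus"]
    return {p: weighted_sum[p] / pa_total[p] for p in pa_total if pa_total[p] > 0}


# Made-up rows for illustration only.
rows = [
    {"player": "Player A", "season": 2015, "pa": 300, "drc_plus": 150},
    {"player": "Player A", "season": 2016, "pa": 400, "drc_plus": 120},
    {"player": "Player A", "season": 2017, "pa": 400, "drc_plus": 100},
    {"player": "Player A", "season": 2018, "pa": 200, "drc_plus": 90},
    {"player": "Player B", "season": 2018, "pa": 600, "drc_plus": 105},
]
three_year = multi_season_drc_plus(rows, seasons=[2016, 2017, 2018])
leaders = sorted(three_year.items(), key=lambda kv: kv[1], reverse=True)
print(leaders)  # [('Player A', 106.0), ('Player B', 105.0)]
```

The 2015 row is ignored because it falls outside the three-season window; sorting the result gives a simple leaderboard.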

Al (DRC): How should we treat minor league DRC+? Is it scaled to what a minor leaguer would be expected to do at the major league level, or is it just another way to evaluate minor league performance?

Jonathan Judge: Right now it is only scaled to the performance of others in the same league, without considering major league equivalency or the age of the minor leaguer. We've been talking about trying to incorporate those, although it is tough to figure out how. Perhaps we'll create yet another metric that serves as a DRC equivalent.

Tyler (Columbus, OH): Hi Jonathan, I am, admittedly, a novice when it comes to analytics. But I love learning about them and studying them. However, I get confused by some of the language used to describe the meaning of certain stats. Could you please explain in layman's terms the difference between what DRC+ and RE24 mean when evaluating a player? Thanks!

Jonathan Judge: Hi Tyler, sure. RE24 is just the raw change in run expectancy between the base-out states at the start and conclusion of the batter's plate appearance, plus any runs that scored on the play. It's a precise way of accounting for how the run expectancy changed as a result of a PA. However, it makes no contextual adjustments. DRC+ is based on linear weights (the average run value of various hitting events, not the base-out states per se) and contains contextual adjustments for factors in each play. I believe that OpenWAR uses RE24 for its outputs.
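To make the contrast concrete, here is a small Python sketch of both bookkeeping styles. The run-expectancy entries and the event run values below are rough, made-up illustrations, not a real league run-expectancy table or BP's weights, and `None` is an assumed marker for a plate appearance that produces the third out.

```python
# Illustrative run expectancy by (base state, outs); "1__" means a runner on first.
# These numbers are placeholders, not an actual league run-expectancy matrix.
RUN_EXPECTANCY = {
    ("___", 0): 0.50,
    ("1__", 0): 0.90,
    ("___", 2): 0.10,
}

# Rough, illustrative linear-weight run values per event (not BP's values).
LINEAR_WEIGHTS = {"BB": 0.32, "1B": 0.47, "2B": 0.78, "3B": 1.05, "HR": 1.40, "OUT": -0.27}


def re24(start, end, runs_scored, table=RUN_EXPECTANCY):
    """Change in run expectancy over one PA, plus runs scored.

    `end` is None when the PA produces the third out, so nothing more is expected.
    """
    end_value = 0.0 if end is None else table[end]
    return end_value - table[start] + runs_scored


def linear_weight_runs(event_counts, weights=LINEAR_WEIGHTS):
    """Credit each event with its average run value, ignoring the base-out state."""
    return sum(weights[event] * count for event, count in event_counts.items())


# Leadoff single: bases empty, 0 outs -> runner on first, 0 outs.
print(round(re24(("___", 0), ("1__", 0), 0), 2))  # 0.4
# Solo homer with the bases empty: state unchanged, one run scores.
print(round(re24(("___", 0), ("___", 0), 1), 2))  # 1.0
# The same single credited by linear weights, whatever the situation.
print(linear_weight_runs({"1B": 1}))  # 0.47
```

Under RE24 the credit for a single changes with the base-out state; under linear weights it stays fixed. Per the answer above, DRC+ starts from the linear-weights view and adds contextual adjustments on top.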

Ricky (18 and life): Is Kyle Freeland for real?

Jonathan Judge: Well, no, in the sense that his results this year were otherworldly. Can he keep that up next year? I honestly don't know, but lots of people will be watching, which is not usually the case for a Rockies starter.

Duhbear (Vermont): Are the components of DRC available to calculate it as a minor league stat as well?

Jonathan Judge: Yes, and we are actually calculating DRC for the minors right now. It's essentially a beta of sorts, and you can find it if you poke around the sortables a bit. There's no formal release about it for the moment, though.

Aaron (MA): Will you be sharing your R code? Do you model each potential outcome (1B/2B/3B/HR, etc.) separately?

Jonathan Judge: I will share portions of it, enough so that a person who really wanted to take a shot at recreating DRC+ themselves, or something like it, could get the general idea and follow along with the statistical logic.

Bill Thompson (Milwaukee): With all the adjustments built into DRC+, does it adjust value for triples based on speed versus outfielder misplays? Basically, does a guy like Javy Baez get more credit for a triple to the gap than a guy like Anthony Rizzo would for a misplayed triple?

Jonathan Judge: No, at the moment we treat events the way they were officially scored. That works because we have over 180,000 batting events each year, and the fluky ones have a way of washing out with the volume, particularly with the skeptical way we allocate credit to batters.

someguynamedkenn (New Jersey): Hey, I was just wondering whether you were going to release the underlying formula for DRC+. If so, when? If not, are you going to release any more in-depth explanations about the inner workings of DRC+ and how the different values are derived? #FREENIMMO

Jonathan Judge: Yes. Starting next week I will begin showing code and walking through some of the logic the system follows, which, given its unusually good results, does some interesting things (if I do say so myself).

Michael McGowan (Canonsburg, PA): Thanks for all of this hard work. I generally agree with your decision to use mixed models and to incorporate additional layers of context in plate appearances beyond what the publicly available metrics to date have generally done. This is similar to my own approach that I've been working on privately, and it's encouraging that these types of techniques can be so fruitful. I am, however, concerned that you are accidentally overfitting some of this data. I know how easy it can be to accidentally let information about the future slip into the data you use to build your model. You asserted via Twitter that this did not happen in this case (https://twitter.com/bachlaw/status/1069718168535482368), but could you please expound upon that a bit more? For example, when evaluating the predictiveness in year N+1, what steps did you take to ensure that only data from year N and before went into that prediction? My other main concern is with your decision to focus on single-year park factors rather than multi-year park factors. Could you please discuss a bit more about why you feel that decision works best? Is this an ideological decision, or did you try both and conclude that single-year produces better results? I think you correctly point out that your technique will be more sensitive to drastic environmental changes from year to year that we may have experienced in the recent past. However, I don't think that's a good reason to throw out past data entirely; perhaps a compromise is in order instead, with multi-year park factors that can adjust more quickly? Having said that, this is exciting work. I look forward to reading as much as I can about the inner workings of this model.

Jonathan Judge: Hi, so each season is always modeled individually, so there is no way that data from some other season can be infecting the season in question. To evaluate reliability and predictiveness, we execute an R function that separates each season, performs an inner join on all consecutive seasons, stacks all of those consecutive comparisons together, and then correlates column A (year N) with column B (year N+1), controlling for average PAs over both seasons. Everything is kept 100% separate, and it is 100% out of sample when being tested.
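The actual check is written in R; the Python sketch below only mirrors its outline under stated assumptions. It takes per-season values in a hypothetical `{season: {player: (metric, pa)}}` structure, inner-joins each season with the next, stacks the pairs, and computes a Pearson correlation weighted by each pair's average PA. Treating "controlling for average PAs" as weighting is our assumed reading; a partial correlation would be another possible one.

```python
import math


def consecutive_pairs(by_season):
    """Stack (value in year N, value in year N+1, average PA) for every player-pair."""
    pairs = []
    for season in sorted(by_season):
        following = by_season.get(season + 1)
        if following is None:
            continue  # no next season to compare against
        # Inner join: keep only players who appear in both consecutive seasons.
        for player in by_season[season].keys() & following.keys():
            a, pa_a = by_season[season][player]
            b, pa_b = following[player]
            pairs.append((a, b, (pa_a + pa_b) / 2))
    return pairs


def weighted_corr(pairs):
    """Pearson correlation of column A with column B, weighted by average PA."""
    total_w = sum(w for _, _, w in pairs)
    mean_a = sum(a * w for a, _, w in pairs) / total_w
    mean_b = sum(b * w for _, b, w in pairs) / total_w
    cov = sum(w * (a - mean_a) * (b - mean_b) for a, b, w in pairs)
    var_a = sum(w * (a - mean_a) ** 2 for a, _, w in pairs)
    var_b = sum(w * (b - mean_b) ** 2 for _, b, w in pairs)
    return cov / math.sqrt(var_a * var_b)


# Toy data: "w" has no 2017 season, so the inner join drops him.
by_season = {
    2017: {"x": (100.0, 500), "y": (110.0, 300), "z": (90.0, 200)},
    2018: {"x": (201.0, 400), "y": (221.0, 100), "z": (181.0, 50), "w": (50.0, 10)},
}
pairs = consecutive_pairs(by_season)
print(len(pairs))  # 3
print(round(weighted_corr(pairs), 6))  # 1.0, since the toy 2018 values are exactly 2 * 2017 + 1
```

Because each season's values come from its own separately fitted model, nothing in column B leaks into column A.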

As for the park factors, to be clear, I would call ours park "ratings" not park factors, because the latter, as commonly employed, tend to be inputs not outputs. Single-year park factors seem to work well if you apply ridge / multilevel penalties to the stadium/handedness combinations to track their effects while being conservative in how you give out credit.
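As a rough illustration of what a ridge penalty does to a stadium/handedness effect, consider the simplest case: one effect per park and batter-handedness combination, estimated from per-PA residuals (observed minus expected run value, a hypothetical input here). Minimizing squared error plus λ·b² gives the closed form b = Σ residuals / (n + λ), so thin samples are pulled toward zero. This is a one-effect-at-a-time sketch with a made-up λ; a real multilevel model would estimate all effects jointly and learn the amount of shrinkage from the data.

```python
from collections import defaultdict


def ridge_park_ratings(events, lam):
    """Shrunken effect for each (park, batter_hand) combination.

    `events` yields (park, batter_hand, residual) tuples, where `residual` is a
    hypothetical per-PA observed-minus-expected run value. Minimizing
    sum((r - b)**2) + lam * b**2 for each combination gives b = sum(r) / (n + lam).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for park, hand, residual in events:
        sums[(park, hand)] += residual
        counts[(park, hand)] += 1
    return {key: sums[key] / (counts[key] + lam) for key in sums}


# Same raw edge (+0.1 runs per PA) in both combinations, very different sample sizes.
events = [("Park X", "L", 0.1)] * 4 + [("Park Y", "R", 0.1)] * 400
ratings = ridge_park_ratings(events, lam=100)
# Park X (4 PAs) is pulled almost to zero; Park Y (400 PAs) keeps most of its edge.
print({key: round(value, 4) for key, value in ratings.items()})
```

Setting `lam=0` recovers the raw per-combination mean; larger values are more conservative in how much credit the park gets.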

Pmammino (NJ): Can you briefly go over your process for creating a new metric? I guess the life cycle of your statistical process?

Jonathan Judge: This is a fun question. It consistently starts with becoming curious about something. Then I will usually do some preliminary basic modeling, and if I think I may be on to something, discuss it with my BP colleagues. Once I am convinced I am on to something, the modeling will get much more rigorous, choosing variables based on effect on out-of-sample performance and usually using Stan or other software to provide good quality control. Everything gets peer reviewed here A LOT before it gets released. That's generally how it works.

Rich (Willow Springs): Is Adalberto Mondesi a star?

Jonathan Judge: A star at what is the question. Parts of the batting line are encouraging and he put out a DRC+ of 98 last year, which I suspect is pretty darn good for a shortstop. On the other hand, he hardly takes any walks at all, and without a truly extraordinary hit tool, that almost never works. So I would be skeptical, unfortunately, although the team has plenty of time to see how he does with more opportunities.

Tigers Fan (Detroit): Please give me something to look forward to this year.

Jonathan Judge: At times, Matt Moore has been fun to watch.

Kurt (New Lenox): My friend said I'm a hypocrite because I demand people take their shoes off in my home, but I don't take mine off in other homes unless I'm asked to. Is he right?

Jonathan Judge: Probably! Marriage made me a shoe-remover and I reflexively do it everywhere. Glad we can address this critical modeling issue.

Buddy (Peoria): Your process sounds similar to how I choose which beer to drink. It starts with me becoming curious. Then I do some preliminary drinking. If I think I may be on to something, I drink some more.

Jonathan Judge: Buddy,,,,, thanks. That formula is tried and true.

rockon12 (South Dakota): How will TJ surgery affect Ohtani and his batting skills?

Jonathan Judge: That's a good question for somebody who's studied this more closely; most of the people receiving Tommy John surgery struggle to have good batting skills even when they are perfectly healthy. If there is any aftereffect on batting ability, such as it is, I haven't seen any studies on it (which does not mean it isn't possible).

jmor1717 (right near da beach): Does DRC+ in any way incorporate speed, positive or negative? In the articles explaining the metric, it mentions BABIP as a factor in the equation. To what extent does DRC+ hurt those with regularly high BABIP due to speed factors? Would it assume a certain number of singles "weren't deserved"?

Jonathan Judge: I wouldn't say BABIP is an input, but DRC+ does try to tease out the likelihood that a high BABIP (e.g., on singles) makes sense. The more PAs you have, the more likely your higher BABIP will be viewed as real, and extraordinary BABIPs with lots of PAs will still be substantially respected. I think that concept makes sense to everyone, but crediting it in a systematic, consistent way is one benefit of a formal model. We do use attempt_rate (adjusted) as one input in a few models to try to account for speed.
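The idea that a high BABIP earns more credit as the sample grows can be illustrated with a generic regression-to-the-mean formula. This is not how DRC+ does it (the answer says BABIP is not really an input to DRC+); it is a minimal sketch in which the league BABIP of .300 and the stabilization constant `k` are made-up values for illustration.

```python
def shrunk_babip(hits_on_bip, balls_in_play, league_babip=0.300, k=800):
    """Blend a hitter's observed BABIP with the league rate.

    `k` acts like k balls in play of league-average evidence; both `k` and
    `league_babip` are hypothetical values chosen for illustration.
    """
    return (hits_on_bip + k * league_babip) / (balls_in_play + k)


# The same .400 BABIP over increasingly large samples of balls in play.
for bip in (100, 1000, 4000):
    print(bip, round(shrunk_babip(0.400 * bip, bip), 3))
```

With these made-up settings, a .400 BABIP is credited as roughly .311 on 100 balls in play, .356 on 1,000, and .383 on 4,000: the bigger the sample, the more of the hitter's BABIP is treated as real.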

Kevin (Chanahon): What kind of pitcher will Mitch Keller turn out to be?

Jonathan Judge: Stated that broadly, who knows. DRA likes his AAA campaign a lot and those are some nice peripherals, especially if the walk numbers can go back down. Don't be surprised if he gets slapped around a bit after a call-up as he learns how to pitch to major league hitters. It's a typical rite of passage.

Kevin (Chanahon): I got rear-ended today, there's some red paint and small dents on my minivan. It ruined my wife's birthday. Make me feel better please

Jonathan Judge: Well DRC+ got released yesterday and it is pretty swell. Very sorry about the rest of this, which sounds kinda shitty.

Matt W (Lombard, IL): From my experience, I’ve noticed that most advanced metrics are great from a predictive standpoint at projecting a regression one way or another within 1-2 standard deviations, but I’ve rarely seen them anticipate something completely unforeseen, like a 4-5 sigma season to the positive or negative. Does DRC+ have any added potential from a predictive standpoint to spot a breakout season or a disastrous season for a player?

Jonathan Judge: DRC+ really is intended to be backward-looking. Its predictiveness and reliability are used to highlight its accuracy, not to hold it out as some sort of projection tool in and of itself. We may move from SD to percentiles at some point for DRC+, perhaps if computing power gets a little better. Our PECOTA projections this year should be offering very rigorous percentiles that can help show upside and downside. For Vlad Jr., for example, his 80th and 90th percentile outcomes are spread out much, much higher than the spread for most players.

Zamzow (Sunset Grill): I've been getting a lot of offers on Dylan Bundy in my dynasty league. Give me some advice on how I can send these crazies off. He's untouchable

Jonathan Judge: Have you considered marking their emails as spam? I will defer to our fantasy team on Bundy's long-term desirability, and naturally encourage you to get a subscription level that allows you to invoke the bat signal and ask this very question.

Al (I love DRC): Can we use DRC+ to more accurately project future major league players using their minor league DRC+ numbers?

Jonathan Judge: I would like to think so. Right now, for PECOTA our focus is more on projecting the individual components DRC anticipates for minor leaguers and rebuilding DRC+ from there for the majors. To be honest, though, much of the success when it comes to projections tends to be playing time, which is pretty tricky to project in its own right.

The Fonz (Milwaukee): It really sucks riding a motorcycle in the winter.

Jonathan Judge: Seems pretty dangerous to me in summertime also tbh.

Jay R (NYC): How useful do you envision DRC+ being in evaluating minor leaguers?

Jonathan Judge: I think it depends on what you are looking at. Evaluating them by how stable each component is at each level seems like a plus. Shrinking performances toward the mean with smaller sample sizes also seems like a plus. I would think DRC+ would be good to know for any hitting prospect on your mind, provided you keep your eyes on what evaluators are saying and stay mindful of confounders like age and repeating levels.

Dusty (Colorado): Does Wander Javier project to break the DRC+ metric? What's his upside?

Jonathan Judge: If Bonds couldn't break it (and he didn't), I'm pretty sure DRC+ can handle young Wander. However, he's off to a great start and will be fun to watch as he develops.

Jonathan Judge: Folks, I think that just about does it on the questions I have the insight, or at least the gall to answer. See you later and thanks for your interest.