Rob Arthur has made us rethink everything from our forecasting tools to plate discipline to the analytical value of the crack of the bat. Ask him about any of that, or what he's working on next.
Rob Arthur: Hello and welcome to the chat on this fine winter day. Standard disclaimer: I know nothing of prospects and fantasy, so ask me about these things at your own peril.
Julien (Montreal): where do you get your pitch fx data for studies?
Rob Arthur: At BP, we have a database that Harry Pavlidis and Dan Brooks maintain (the Pitch Info data, with Harry's special classifications). In my pre-BP days, I used the R package pitchRx (by Carson Sievert). Jeff Zimmerman also maintains a database in SQL format at his website (baseballheatmaps.com).
Patrick (Boston): What's your take on tanking to win?
Rob Arthur: I don't enjoy it as a fan, but I understand why teams do it. They are taking advantage of the incentives available to build championship quality teams, and so I don't begrudge them the tanking. That said, it'd be lovely if there was a way to discourage it (without causing a host of side effects, that is).
Jonas (Portland): I was really interested in your column about projection systems being tightly clustered and not all that accurate, all things considered. Do you think this invalidates the projection industry? What do we get out of them, and what do you think we should get out of them in a perfect world?
Rob Arthur: To your first question, no, absolutely not. I don't think it invalidates projections at all. Even though they aren't super-accurate and they are all tightly clustered, they are MUCH, MUCH better than guessing league average for every player (or some ad hoc guesses that one might do). In retrospect, I should have been more careful to emphasize this point. I really meant to say that we have a long way to go in terms of improving the projections.
We get out of them decent guesses, seasoned with historical accuracy and a lot of regression to the mean. What we should get out of them are better guesses, and the ability to more quickly determine when a player's ability has truly changed. I am definitely in the camp that player abilities do change on a year-to-year basis, but our systems are only crudely capable of telling when. Albert Pujols is an example of this, as is someone like Michael Brantley.
Humorous (Everywhere): Is there any hope for Jon Singleton??
Rob Arthur: Definitely hope for Jon Singleton. People are too quick to write off prospects after they struggle a little in their first exposure to the major leagues. In Singleton's case, he shows good fundamentals: pitchers throw him fewer strikes, and he has decent plate discipline. He was stung last year by terrible BABIP (in the .230s I believe), and that should go back upwards. In short, I am bullish on Singleton: although he may never be a great player, he should be at least a good hitter for a few years.
Maholmie (Denver): What do you make of the Rockies ... i.e. any chance at all they compete?
Rob Arthur: I am sad to say that I think there is little chance that they will compete (I'd put it below 5%, but you can't predict ball). Part of the problem is that their division is quite good. Part of it is that I think their front office hasn't shown a coherent plan for competing this year, and I doubt that they would go all-in to get the pieces necessary to match up with teams like LA and even now SD. But I would love to be proven wrong--they are fun to watch when doing well.
caseyj15 (Medford, OR): What role do you see Mookie Betts in this year? Now that Shane Victorino has been declared the starting RF, it limits his starting options.
Rob Arthur: I think he will be a starter before long, if not opening day. A lot of spring training position battles fizzle out after one player gets hurt or proves ineffective, so even though it looks like a jam in Boston's outfield, I wouldn't be surprised if either Victorino or Hanley misses significant time.
allangustafson (san diego): How do you go about analyzing Billy Hamilton? He obviously is working the staff to increase his walks.
Rob Arthur: In terms of how to analyze him, I think he will put together a good year; despite his second-half struggles, his underlying plate discipline and swing probability numbers didn't change. In fact, if anything, pitchers were avoiding him a little more. He's about league-average plate discipline-wise, by my numbers (https://twitter.com/No_Little_Plans/status/572818089580089344).
P.S. 'Working the staff' is a great phrase.
Cal Guy (Cal): Hi Rob, When analyzing bat exit speed, is any consideration given to the pitch? It stands to reason that a higher velocity pitch would contribute more to exit speed than a slower pitch just due to the rebound off the bat.
Rob Arthur: Yes indeed. The pitch of the resulting bat crack sound is a function of the velocity at which the air is expelled from the space between the bat and the ball (in other words, the relative difference in velocity between bat and ball). So fastballs produce higher-sounding cracks than curveballs. That's an effect I've noticed in my data.
As to whether fastballs would produce actually higher exit velocities, I think they do, although Alan Nathan would be the ideal person to check on this with.
Alex (Anaheim): Have you learned anything new about park effects recently?
Rob Arthur: No, actually, and it's bugging me. I would love to see someone do an analysis of the ways parks influence more of the basic, underlying statistics (as opposed to just batted ball outcomes and such). It's on my list of topics, though, so hopefully I'll get a chance to work on it some time.
brentdaily (colorado): The Cubs and Astros have taken on a high volatility approach to run scoring. Of the 'four' (we could sub-divide further) team philosophies, which do you think is best? Has there been any research done on volatility? Given the importance of sequencing it seems like there needs to be an answer here.
High Vol Run Scoring + Low Vol Run Prevention / High Vol RS + High Vol RP / Low Vol RS + Low Vol RP / Low Vol RS + High Vol RP
Rob Arthur: I haven't seen much research done on volatility, and I think that's because it is a second-order effect, i.e. not terribly important. Which doesn't mean that it's unimportant, just not a major concern.
In terms of the four philosophies you outlined, an important consideration is the skill of the offense/pitching. Bad offenses want to be more volatile; good offenses want to be less volatile, and the same logic applies to the other side of the game. Beyond that, I don't really know which one is best.
I suspect, but have no evidence, that the Cubs and Astros pursued their offensive strategies on the basis of the availability of players, not because volatility is inherently what they were aiming for. The economics of acquiring undervalued players is probably a much more significant concern than the volatility of the offense. But I'd love to be proven wrong on that evidence-less statement.
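The point above about bad offenses wanting more volatility and good offenses wanting less can be illustrated with a minimal sketch. It assumes, purely for illustration, that a team's runs per game are normally distributed and that the opponent scores a fixed 4.5 runs; the function name `win_prob` and every number here are hypothetical, not taken from real data or from any research mentioned in the chat.

```r
# Chance of outscoring a fixed opponent total, modeling a team's runs per game
# as Normal(mean_runs, sd_runs). The normal model and all numbers are
# illustrative assumptions, not real data.
win_prob <- function(mean_runs, sd_runs, opp_runs = 4.5) {
  # P(runs > opp_runs) under the assumed normal model
  pnorm(opp_runs, mean = mean_runs, sd = sd_runs, lower.tail = FALSE)
}

bad_low   <- win_prob(4.0, sd_runs = 2)  # below-average offense, low volatility
bad_high  <- win_prob(4.0, sd_runs = 4)  # same offense, high volatility
good_low  <- win_prob(5.0, sd_runs = 2)  # above-average offense, low volatility
good_high <- win_prob(5.0, sd_runs = 4)  # same offense, high volatility

print(round(c(bad_low = bad_low, bad_high = bad_high,
              good_low = good_low, good_high = good_high), 3))
```

Under these assumptions the weaker offense wins more often with the larger spread, while the stronger offense wins more often with the smaller one. A team whose average falls short of the target needs variance to get over it, and a team whose average clears the target only risks falling under it.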
RelDel (Miami): Is this the year Stephen Strasburg will challenge for a Cy Young award?
Rob Arthur: Sure, this year, next year, the year after... as long as he can throw as hard as he can and muster up a few decent secondary pitches, he has a shot at the Cy Young. I have no special knowledge that would predispose me to believe that this year will be any different.
The thing is, he's a really good pitcher. Beyond having that one characteristic, there's no other special talent that allows one to win a Cy Young, besides a healthy dose of luck.
oxpaulo (WV): Everyone seems to be down on Arismendy Alcantara, where do you stand with regards to his future outlook? Thanks.
Rob Arthur: This is rapidly emerging as a theme of the chat, but: people are too quick to dismiss good young players when they don't perform immediately. Alcantara is projected by PECOTA for half a WARP in half a season's worth of at-bats, and I think that's about right. I think he's got a shot at being an average-ish player in the long term (i.e. two or three years from now), despite his sub-replacement level initial campaign.
Julien (Montreal): Hi Rob, I was wondering how and where you created the first graph in your latest article, "Building the strike zone"?
Rob Arthur: Ah, that's easy. R code goes like this:
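# keep pitches with wcount between 0.2 and 0.4, split by count
hit <- subset(test, count == "3-0" & wcount > .2 & wcount < .4)  # hitter's count
pit <- subset(test, count == "0-2" & wcount > .2 & wcount < .4)  # pitcher's count
#neut <- subset(test, count=="0-0"&wcount>.2&wcount<.4)
png("Effect_of_Count.png", height=1080, width=1920)
# 3-0 pitch locations in blue
plot(hit$px, hit$pz, pch=16,
col=rgb(0,0,1,.8), xlim=c(-1.5,1.5), ylim=c(1,4),
xlab="Horizontal Location", ylab="Vertical Location")
# overlay the 0-2 pitches (the snippet is cut off after pch=16 in the transcript;
# the red color and the dev.off() call below are assumed completions)
points(pit$px, pit$pz, pch=16,
col=rgb(1,0,0,.8))
dev.off()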
Whatissidgraphsanyway (Your Chat): Best player of the bunch for the next 5 yrs: Nick Franklin, Brad Miller, or Chris Taylor?
Rob Arthur: Cautiously I will say Chris Taylor, based on PECOTA leavened with my own intuition. But I will be honest, I have no idea.
Whatissidgraphsanyway (Your Chat): Any picks for Surprise team?
Rob Arthur: Depends on who is being surprised! I think the mainstream folks, and even some saber-savvy fans, will be surprised that the Red Sox are going to be quite good again this year (I think). We were all misled by last year's terrible results and ensuing sell-off, and we forgot that the Red Sox are usually very good, and easily capable of buying at the trade deadline to ensure that they make the playoffs.
Whatissidgraphsanyway (Your Chat): Inspired by Ben and Sam's book .... if you could implement some different strategies for a team what would they be?
Rob Arthur: Yes, I am so excited for this book. Important thing though: Ben and Sam are managing an independent league team, so they have a lot more latitude to try crazy strategies.
The problem is that a lot of strategies just wouldn't fly in the majors, because highly-paid players will refuse to go along with you (and have substantial leverage). That's all kind of skirting your question though.
To answer your question, I would definitely mess around with the way rotations are constructed. I think I would try piggybacking the way some teams do in the minors. I suspect that there's a real chance it could be effective in MLB, but *if and only if* you could get buy-in from the players. That's never going to happen though.
NightmareRec0n (Boston): Isn't bat speed all that matters regarding power? We have to assume pitch speed is a relative constant and that bat weight isn't radically different, so doesn't it all come down to bat speed in the end?
Rob Arthur: No, bat speed isn't the only thing. Quality of contact--where on the bat the collision happens--is a really important aspect of power as well, and can be affected by everything from plate discipline to the hitter's neurological attributes (hand-eye coordination, etc.). Powerful hitters who get under the ball will get flies, which are more likely to become home runs and extra base hits. Powerful hitters who get too far under the ball will suffer low BABIP (high launch angle, infield flies and such). Any kind of hitter who gets over the ball will get grounders, which aren't likely to go for extra bases.
Ronson (On the Scratches): Your take on Steven Souza?
Rob Arthur: I think PECOTA is overoptimistic on Souza. I've got some research in the works on prospects, and I find that age is (as usual) a very important determinant of how well they do in the long term. It's not impossible that PECOTA is right and I am wrong, though--he could be the exception and not the rule.
William (New York): It was interesting to see how happy each team's fans were, but it would be really interesting to see how much fluctuation there is for each team's fans throughout the season. Have you thought about doing weekly happiness updates so we have something to track, a la playoff odds?
Rob Arthur: I am concerned that weekly might be overkill, but I will be doing updates throughout the year (probably one at midseason, one at the end of the season?). One unanticipated problem with doing the updates too often is that folks on the subreddits are already trying to influence my metrics by using more positive words (I kid you not, and I have no idea why). If I made it a weekly thing, I fear that I would remind them and cause people to start gaming the system even more than they already have.
Brian (Fort Worth): What is the trade value of a player like Trevor Bauer?
Rob Arthur: Low right now, and I think it would be a bad idea to trade him any time soon. He's clearly got exceptional potential, and could put it together to become a great pitcher, but we haven't seen that yet (at least, not more than flashes of it). Far better to wait until he puts up a superlative year or two, then trade him while the value is higher.
John (CT): Who do you think has the best chance to raise his game in 2015 and impress us all?
Rob Arthur: Since you guys have me thinking about it, probably one of the underperforming recent prospects from last year: Bogaerts, Alcantara, Baez, Singleton, someone along those lines. If you pressed me to pick one, I think I'd pick Bogaerts. Recency bias is real, pernicious, and unhealthy; don't let it affect you.
Greg43 (Pittsburgh): As a non analytics guy, I keep reading on Starling Marte and how he consistently miffs BABIP experts. After he cut down on his k's and increased his walks, he had one of the highest 2nd half averages last year. What do you make of him moving forward?
Rob Arthur: He's great. Like, really, really good. I picked him as a breakout candidate early last year and he did not disappoint. Eventually, his BABIP will fall back down to earth, but it's not unprecedented for a player to have a multi-year run of good BABIPs like he's having. Moving forward, I think next year he's relatively safe, but like I said, eventually the BABIP will come down. Hopefully by that point he can make up for declining BABIP with increased power.
NightmareRec0n (Boston): I've thrown this out before, but the whole souring-on-prospects talk reminds me of it. I can totally see Kris Bryant pulling a Chris Davis and struggling to make contact this season, and everyone souring on him. Then in 2016, he'll go bananas and hit .270 with 35 homers and everyone will be shocked.
Rob Arthur: Very possible. We've been spoiled by Harper/Trout-type trajectories for young players, and forgotten that many must suffer through a miserable down year or two. The track that Anthony Rizzo followed is, historically speaking, I think more common.
Matt (Cambridge): You are trying to determine what makes a good player. Out of 100%, how do you divide these factors:
Rob Arthur: That's a tough one. Totally making these numbers up: 50% physical, 25% neurological, 20% mental, and 5% effort, because I think almost all of them are giving full effort (not that effort is unimportant, it's just not something that varies much between players).
Cris E (St Paul, MN): Talk about the huge increase in shifting and how it affects your data and analysis. Have you thought about identifying guys that are being exposed by it? For example Mauer is getting killed by defensive positioning. And how do the defensive numbers change for infielders being sent all over the diamond? Are those "out of range" plays when a grounder is hit right at a SS playing 2b?
Rob Arthur: It's a great unknown in modern sabermetric research, especially with some of the more subtle shifts that have been happening recently. Because we don't have shift/non-shift data for individual plays easily available, we are mostly left guessing and looking at players who tend to get shifted a lot. Since those players are already exceptional for a variety of reasons, it is very difficult to study. So we are kind of screwed; at least until Statcast arrives and can give us shift status on every play (hopefully, fingers crossed). Personally, in the absence of any ability to study it, I tend to not think about it, which is perhaps not ideal.
Dylan (Canada): Is the lack of depth a big problem for the Jays? I see them as being very similar to the Red Sox and maybe more upside, but without the depth. If Joey Bats or Encarnacion gets hurt, they are going to be trotting out some bad players.
Rob Arthur: Yes, it definitely is a big problem. I think it makes their projections more variable, and also reduces the most likely outcome. Almost all teams lose someone of value for a substantial period of time at some point, and so having depth pieces is much more useful than it appears on paper (or in PECOTA's mind). The Jays have some older and more injury prone players, so that goes double for them.
Cris E (St Paul, MN): So what's the status on that cool fieldingFX thingy they were starting to flash at the all-star game last summer? It seemed very "proof of concept" at the time. Does it have an ETA? What's the funnest looking aspect of it from your perspective?
Rob Arthur: Statcast, it is called. It was a proof of concept then; it is now ready and will be deployed in all ballparks for the coming season. They haven't yet said when we will get our hands on the data; I'm hoping sometime this year (but I don't have any idea).
The most exciting thing for me isn't the fielding, actually. I'm really curious what this can tell us about other aspects of player performance. For example: Does a player's reaction time when they field have anything to do with their batting ability? Can we tell when a player is less than 100% by slightly reduced speed in the field? This sort of question intrigues me, partially because I suspect the fielding results will be anticlimactic.
Reader (BP): Is the business side of baseball a relatively unexplored frontier by the public? I remember reading that dynamic ticket pricing was just as revolutionary for baseball as sabermetrics. We still have no idea how much value a star provides off the field or long term effects of winning/losing. For example: If the Mets spent more, just how much market share could they take from the Yankees if they were good and the Yankees were bad? Why is the Dodger TV deal worth more than the team which controls their TV rights?
Rob Arthur: Definitely, it is unexplored (and interesting). The problem is that it's also incredibly difficult to get data on; teams are reluctant to share their in-depth knowledge, and without knowing the various kinds of revenue, we are generally in the dark. I'm not sure how we ever get data on this side of things, either; maybe we just have to wait and some generous team will open up their books, but I doubt it.
Charles (NYC): Marte's BABIP has been brought up, so I feel I have to bring up Puig. He has posted a .383 and .356 BABIP the past two seasons, but doesn't seem like a high-BABIP guy. He doesn't hit a lot of line drives, and it's not as if he never pops out. He is fast, but is not a blazer. He still pulls the ball a fair deal as well. Do you see him regressing in the BABIP department?
Rob Arthur: Regressing, yes. Falling back to average, no. He's very strong, and I could be argued into believing that he just makes exceptionally high-quality contact with the ball. Beyond that, I'm really not sure. I would recommend this article: http://www.fangraphs.com/blogs/examining-yasiel-puigs-babip/
John (CT): PECOTA is always a negative for Buck and the Birds this time of year.
Where do you see Baltimore finishing this year?
Rob Arthur: Yeah, and I still haven't figured out why it's wrong. There's no attractive explanation for it, and so in the absence of that explanation, I am compelled to go with PECOTA, and put them near the bottom of the division. 4th place, maybe with a few more wins added to account for Orioles magic (82 wins, maybe?).
Cris E (St Paul, MN): Danny Santana in MN is another that looked great due to BABIP. He's a waterbug. Does that type of player have a better or worse chance of full regression?
Rob Arthur: He's going to regress hard. A thing about some of the high BABIP guys is that they have low strike probability, indicative of the pitcher being afraid of them. Santana doesn't have that; he has one of the highest average strike probabilities in the league. He's a mirage, I think.
justarobert (Santa Clara): The storyline for hitters who suddenly gain a lot of game power is often something mechanical, and we know (?) that a reported change in swing mechanics is often a portent of, well, nothing. Can we learn anything from the toolsy prospects like Reyes and Kemp who suddenly learn plate discipline?
Rob Arthur: "we know (?) that a reported change in swing mechanics is often a portent of, well, nothing."
Do we know that? I'd love to get a better test of that hypothesis, but I don't know how. I have some pipe dreams, long term, of automated video analysis of hitter mechanics, but that is really difficult.
To your other question: I think, but have not tested, that gaining plate discipline is a great sign for young hitters. It means that they are adaptable, which is important because in my experience, pitchers are always, constantly, and perpetually changing the approach with which they attack hitters. Good hitters change subtly to respond, and that's how they stay good, as opposed to having one flash in the pan type season (or half season, in some cases).
Data Whore (Looking at Pitch F/X data): Given the NHL's recent support for advanced statistics, do you think MLBAM will provide an official outlet for Statcast data, or leave it up to the more established baseball community to deal with the data? How long before MLB officially endorses things like WAR or even wOBA to a lesser extent?
Rob Arthur: I believe that they will provide processed Statcast data--not the raw feed, which is said to take up terabytes. They have been incredible at providing pitchf/x for the community, and I hope that they will take that as a model for how to deliver on Statcast. As I tweeted at them, I hope that they will get us as close to the raw data as is feasible.
I don't think they will ever endorse one stat or the other, and that's a good thing. All statistics are flawed in some way or another, even WAR, so I don't want one to become the "official" metric of MLB.
Matt (Cambridge): I agree. I think Statcast is going to be HUGE in predicting injury.
Rob Arthur: Indeed, I am really looking forward to this. We have a default assumption that everyone is healthy until they go on the disabled list, and I think that this might be one of our precepts that is destroyed by Statcast. I wonder if there aren't constant local performance fluctuations as players suffer bruises, sprains, strains, etc., which slightly impair their overall performance, but not enough to be reported to the trainer.
Rob Arthur: Alright, guys, it's been a blast. I'm going to wrap it up now. You asked some great, insightful questions. I will be looking forward to the next chat.