Baseball Prospectus

Archives

Lies, Damned Lies column archives.

September 3, 2003

Lies, Damned Lies: Moneymaker (or, Everything I Need to Know about Baseball I Learned From Watching the World Series of Poker)

by Nate Silver

As I sat in the upper deck at Jacobs Field last Saturday, taking in the Indians-Blue Jays tilt and shivering in the Lake Erie breeze with our Cleveland Pizza Feeders, the conversation turned to Texas Hold 'Em. Poker is a natural fit for baseball fans, especially the sort that are likely to attend our events. Like baseball (or at least the 'game' of baseball management), poker is a game that's grounded in mathematics, and in optimizing the use of limited information. Like baseball, it's also a lot of fun--at least when you're winning. Just a bit of background here, which will be unnecessary for some readers and inadequate for others (if you've never played poker at all, this probably isn't your column). Texas Hold 'Em is a variant of poker in which each player is dealt two 'hole' cards face down, and makes the best five-card hand he can between his own cards and five common cards that are dealt to the entire table. The 'face down' part is the key: a player's hole cards are never revealed until the last round of betting has been completed. In fact, in a tight game, the hands are often not revealed at all--every player but one will have folded before the showdown occurs. I've always found that last bit fascinating: players are willing to risk (sometimes large) sums of money on hands that they're never able to see. While a good player can pick up plenty of information between observing the table's betting patterns, running and rerunning the odds of particular hands occurring, and observing the other players' "tells," there's always the lingering possibility of a bluff, which, as a game theorist can tell you, will occur just often enough to keep a bettor on his toes. Lest you think this is a Bill Simmons-style off-topic diversion, there are lessons that can be drawn from Hold 'Em and applied to baseball. Let's take a break from the usual dose of number crunching and look at those this week.

August 27, 2003

Lies, Damned Lies: The Value of Speed

by Nate Silver

As you are all unfortunately aware, Bobby Bonds died this past Saturday after a long battle with cancer. Bobby came before my time, and I'm not fit to eulogize him. But perhaps I can honor his memory in some way by looking at players of the sort that Bobby exemplified: power-speed sluggers. A lot of analysts are fond of disparaging the value of speed (this Web site has been no exception). Speed is perceived as a scouty thing, a tool that looks impressive, but has little practical value on a baseball diamond. The one definitive advantage that speed would seem to provide--the stolen base--is rightly considered an overrated tool. Even within mainstream circles, speed seems to be losing currency. As ballplayers bulk up, and deeper lineups grow ever more capable of scoring runs with the bat alone, stolen base attempts become less frequent. Entire teams are willing to put together their rosters without so much as giving speed the once-over. Well, I think speed has gotten a raw deal. Certainly, speed isn't as important for a position player as the Big Three skills--hitting for contact, hitting for power, and controlling the strike zone--and to list it alongside those three, implying that it is of equal significance, is confusing. But speed is still plenty important for a number of reasons...

August 19, 2003

Lies, Damned Lies: Streakin'

by Nate Silver

Having played the first half of his career before the Second World War, Joe DiMaggio is not eligible to be on Albert Pujols' PECOTA comparables list. However, there's little doubt that the Yankee Clipper would place high atop the table if he had been born just 10 years later. The similarity scores at baseball-reference.com listed the pair as the best age-based likenesses for one another entering the season, and the events of this year are only likely to enhance the comparison. DiMaggio won his first batting title and his first MVP award in 1939--at age 24, he was one year older than Pujols is currently listed. DiMaggio, unlike Pujols, had been heralded as a top prospect from the time he was a teenager playing in the PCL, and was coming off of a fine triplet of seasons in the big leagues. But 1939 was his coming out party, much like 2003 has been for Pujols. Conveniently enough, DiMaggio, limited by a foot injury that he suffered in April, played in just 120 games that season, almost exactly the total that Pujols has accumulated up until now. Compare DiMaggio's '39 against Pujols' current campaign, and the similarities are striking.

August 13, 2003

Lies, Damned Lies: A Roll of the Dice

by Nate Silver

The Red Sox ended Tuesday night four games behind the Yankees in the AL East. What are the odds that they can make up that deficit to take the division? And, failing that, what are their chances to edge out the A's for the wild card? Seriously. Grab a pencil and a piece of paper, come up with your best guesstimate, and write it down. Harder than you thought, huh? Keep reading, and we'll have an answer for you in a bit. Whether they realize it or not, major league teams are making calculations like this all the time. Implicitly or explicitly, they can determine the direction that a team chooses to take: whether to move prospects for veterans at the trade deadline, whether to shut a young pitcher down for the season, or try (injury risk be damned) to get as much work out of him as they can. Wins are the currency that baseball transacts in, but for many purposes, they're only as good as the pennants and postseason appearances that they can be redeemed for. Much as some pundits like to talk about Mystique, Aura, and Veteran Leadership, the postseason is a lottery of sorts. Winning 11 playoff games is often a lot easier than winning 90 or 95 in the regular season, and many teams consider their season a success if their postseason ticket is punched, and they get to take their chance in the playoffs.
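For readers who did grab a pencil, here is one rough way to put a number on the opening question. It is only a sketch under loud assumptions, not the method the column goes on to use: every remaining game is treated as an independent coin flip, the two teams are assumed to have played the same number of games (so "four games behind" means four fewer wins), head-to-head games are ignored, and the function name, the 44 games remaining, and the .550 win probabilities are all hypothetical placeholders.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k wins in n independent games won with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def catch_up_odds(deficit, games_left, p_trailer, p_leader):
    """Return (P(trailer finishes ahead), P(teams finish tied)).

    Hypothetical helper: assumes independent games, equal games played,
    and no head-to-head meetings between the two clubs.
    """
    ahead = 0.0
    tie = 0.0
    for t in range(games_left + 1):          # wins by the trailing team
        p_t = binom_pmf(games_left, t, p_trailer)
        for l in range(games_left + 1):      # wins by the leading team
            margin = t - l - deficit
            if margin < 0:
                continue
            prob = p_t * binom_pmf(games_left, l, p_leader)
            if margin > 0:
                ahead += prob
            else:
                tie += prob
    return ahead, tie


# Placeholder inputs: four games back, 44 to play, both clubs true .550 teams.
ahead, tie = catch_up_odds(deficit=4, games_left=44, p_trailer=0.55, p_leader=0.55)
print(f"finish ahead: {ahead:.1%}, finish tied: {tie:.1%}")
```

Changing the two win probabilities is the quickest way to see how much the answer depends on how good you believe each team really is.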

August 6, 2003

Lies, Damned Lies: Quantum Leap

by Nate Silver

Up until this season, my clearest memory of Jose Guillen is as the object of some very unflattering jeering in the right field bleachers at Wrigley Field. The bleacher bums are never kind to opposing outfielders, but Guillen, being young, bad, and foreign, was a particularly vulnerable target. Guillen reacted to the taunts by alternately appearing hopelessly dejected and demonstratively angry, only making matters worse. Though he got his revenge that day--hitting a home run off crowd-favorite/headcase Turk Wendell--I've always had trouble watching him play without the phrase Jo-se-do-you-suck! running warbled, drunken, Francis Scott Off-Key through my head. However cruel, the taunting had proved prescient. Back in 1997, Guillen had time and an abundance of raw talent on his side. Bouncing between four organizations and failing to demonstrate any development, Guillen had regressed to the level of benchwarmer; his career .239 EqA entering the season was below replacement level for a corner outfielder. If not for his powerful right arm (an impressive tool, but overrated in its importance) and his much-tarnished Topps All-Rookie Team trophy, Guillen might have been riding shuttles between Louisville and Chattanooga or selling real estate instead of holding down a fourth outfielder job in the bigs. This season, of course, Guillen has had the last laugh. Easily the most productive hitter on the Reds this year, Guillen filled in admirably for Ken Griffey Jr. Now traded to the A's, he's been charged with the Herculean task of trying to make up for an entire outfield's worth of mediocrity, salvaging Billy Beane's reputation as a deadline dealer nonpareil in the process. But what if Guillen turns back into a pumpkin?

July 30, 2003

Lies, Damned Lies: Leading Off

by Nate Silver

One of the perks of traveling for work--I've been doing a lot of that lately--is the USA Today planted in front of your hotel room door. Sure, for the most part, McPaper's articles are about as substantive as the "continental breakfast" you're likely to eat while reading it--but now and then, in its own glossy, Technicolor way, USA Today stumbles across something significant. Last Wednesday's sports page featured a headline on leadoff hitters--it seems that there aren't very many good ones these days. As the article pointed out, none of the league's leadoff hitters are among the top 30 players in OBP. Among qualified players, the highest-ranking leadoff hitter is Ichiro Suzuki, 39th as of this writing (Jason Kendall, who has occupied the leadoff spot in Pittsburgh since the departure of Kenny Lofton, ranks 31st). And it's not as if Suzuki or Kendall are walking machines in the mold of Rickey Henderson--Ichiro is a fine player who can hit .340 consistently, but his walk rate is well below league average, while Kendall's OBP is boosted in part by his fearless desire to lean into pitches. Then again, players of the Rickey/Tim Raines profile have never been terribly common. It also doesn't help when teams insist on placing mediocrities like Eric Young or Endy Chavez in the one-hole. Is anything going on here, apart from a one-year fluke?

July 24, 2003

Lies, Damned Lies: Hitting the Wall

by Nate Silver

OK, so it might not have been the most controversial thing he's said this month--even our intrepid Derek Zumsteg didn't dare sweat out this Dusty Baker gem. But the Cubbie manager also made the claim that older players fare better in the second half. Dusty's claim has at least some grounding in his own experience--under his management, the veteran-laden Giants were markedly better in the second half in both 2002 and 2000, and marginally better in 2001. (Over the course of his entire tenure, the record is far more ambiguous: in Dusty's 10 seasons at the helm, the Giants played .535 ball before the first of July, and .546 after it). While the Cubs' second half didn't get off to a great start with the injuries to Corey Patterson and Mark Prior, it'd sure be nice to see them still in the race come September. The acquisitions of Aramis Ramirez and Kenny Lofton have the Wrigley faithful in a frenzy; will Baker prove to be a sage or a charlatan? Not to ruin the fun or anything, but this is a testable claim. By comparing the first and second half performances of players of various ages, we can see which ones really perform best down the stretch.

July 16, 2003

Lies, Damned Lies: PECOTA Mid-Season Review

by Nate Silver

Watch SportsCenter this time of year, or read the Sunday baseball page--that's the one with the long list of players sorted by their batting averages--and you're sure to see plenty of stories about what a wonderful, surprising baseball season this has been. Why, who would have thought that Dontrelle Willis would have been drawing Mark Fidrych comparisons, that the Royals would be 10 games over sea level at the Break, that Melvin Mora would be an MVP candidate, that Esteban Loaiza would be the best arm in the American League? Perhaps there's some Joe Namath among you, some Nostradamus, some Miss Cleo, but we certainly didn't.

July 9, 2003

Lies, Damned Lies: Digging in the Backyard

by Nate Silver

Nate Silver plays cartographer in this edition of Lies, Damned Lies, in search of untapped sources of amateur talent in the U.S.

July 2, 2003

Lies, Damned Lies: A Whole Different Ballgame

by Nate Silver

Through Sunday night's game in Anaheim, the Dodgers had scored an average of 3.46 runs per game, the lowest total in the league. Thing is, they're allowing even fewer runs--only 3.03 per game. It's an odd formula, as if concocted from the lovechild of Whitey Herzog and Hal Lanier, but for the most part, it's been working. Has the Dodgers' performance thus far been historically significant? You bet your Lasorda. Since the end of the deadball era, no team has turned in a performance so out of line with the rest of the league. In the table below, I've listed those teams since 1920 whose runs scored plus runs allowed represented the lowest percentage of league average...

June 25, 2003

Lies, Damned Lies: Redefining Replacement Level

by Nate Silver

Nate Silver takes a closer look at replacement level in search of a better, zestier approach.

June 18, 2003

Lies, Damned Lies: Bounces

by Nate Silver

Baseball is full of bounces, and not just the path of a Jacque Jones double as it skips across the Metrodome turf (or a Carlos Martinez homer as it skips off Jose Canseco's head). Rather, teams can expect a bounce in attendance when they move into a new facility, facilitating a higher payroll, a more competitive club, and ultimately, it is hoped, a couple of pennants to hang on the outfield wall. Or at least, once upon a time, they could have. The standing-room-only precedent established in places like Toronto and Baltimore and Cleveland no longer seems to hold. Attendance in Detroit, Milwaukee, and Pittsburgh has already regressed to the levels those teams had grown accustomed to prior to the opening of their new stadiums. Attendance in Cincinnati is up, but only barely--and this with reasonable ticket prices and a fun team on the field. Nobody expects the honeymoon to last forever, but the reinvigorated relationships between ballpark and city that the new stadiums were supposed to engender have lasted shorter than a Liz Taylor nuptial. Since the debut of SkyDome in 1989, 13 of the 26 teams in existence at that time have opened new parks. Two more will open new facilities next year. It has been the longest sustained period of new stadium construction in baseball history. Call them mallparks or, as I prefer, Retroplexes. Either way, there's plenty of evidence that the ball isn't bouncing quite as high these days.

June 11, 2003

Lies, Damned Lies: Batter vs. Pitcher Matchups

by Nate Silver

Don't tell anyone, but I really enjoy watching Randall Simon hit. The loose, goofy motion in his stance as the ball approaches the plate; the flyswatter swing; the big-stepping follow-through, his blubber, after half a second in gelatin-like suspension, mimicking the motion of his bat. It's a lot of fun to watch, especially when Simon manages to make contact, which happens more often than you'd ever expect. I've had the occasion, however, to watch Simon against Kerry Wood a couple of times this year, and from Randall's point of view, the results have been disastrous: zero-for-six with four strikeouts. Not just any kind of strikeouts, mind you, but ugly, pirouetting, breeze-generating, no-chance-in-hell strikeouts, the sort that make you think that Simon could face Wood 500 times and go oh-fer. I didn't mind this, really; Wood is one of my favorite pitchers. But this particular matchup was interesting to watch because Simon and Wood are such an odd couple: Simon swings at everything, and never draws any walks, but by virtue of his superior hand-eye coordination, manages to keep his strikeout rate very low. Wood, on the other hand, is one of the toughest pitchers in the league to make contact against--though sometimes that's because he isn't throwing the ball anywhere near the strike zone. In any event, Simon's performance against Wood looked so bad that I began to wonder whether the batter isn't at some sort of systematic disadvantage in pairings of these types of players. To study the question, I'll borrow a technique that Gary Huckabay and I introduced last month in a 6-4-3 column, comparing the actual performance observed when certain types of batter-pitcher pairings occur against the results predicted by Bill James' log5 formula. Instead of dividing players up based on groundball and flyball rates, this time we'll look at a quick-and-dirty index of plate discipline.
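For anyone who hasn't run into log5 before, the sketch below shows the usual three-rate form of the formula: the batter's rate, the rate the pitcher allows, and the league-wide rate for the same event. The function name and the sample strikeout rates are hypothetical placeholders, not figures from the column or from the 6-4-3 study.

```python
def log5(batter_rate, pitcher_rate, league_rate):
    """Expected rate of an event in a batter-pitcher matchup under log5.

    batter_rate:  how often the batter produces the event against the league
    pitcher_rate: how often the pitcher allows the event against the league
    league_rate:  how often the event happens league-wide
    All three rates must be strictly between 0 and 1.
    """
    # Odds-style weight for the event happening...
    happens = batter_rate * pitcher_rate / league_rate
    # ...and for it not happening.
    does_not = (1 - batter_rate) * (1 - pitcher_rate) / (1 - league_rate)
    return happens / (happens + does_not)


# Placeholder strikeout rates: a contact hitter facing a strikeout pitcher.
expected_k_rate = log5(batter_rate=0.10, pitcher_rate=0.30, league_rate=0.17)
print(f"expected strikeout rate: {expected_k_rate:.3f}")
```

A handy sanity check is that a league-average pitcher leaves the batter's own rate untouched. Comparing observed matchup results against this prediction is the kind of test described above.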

June 5, 2003

Lies, Damned Lies: Solving a Ninth Inning Quandary

by Nate Silver

Statheads...often lament the intentional walk with an argument that usually goes like this: With a runner on third and one out, the expected runs scored for the inning are X. With the bases loaded and one out, that number is Y (higher than X). This argument normally makes sense, but in a situation where one run is all that matters, the manager should instead try to maximize the probability that no runs will score...Does walking the bases loaded with one out make sense on this basis? As D.H. points out, the only thing that each manager need concern himself with is whether that one essential run scores. All the strategic elements of the game--hitting, baserunning, pitching, defense--are profoundly different under these conditions. What's a manager to do?

May 28, 2003

Lies, Damned Lies: Pitcher vs. Batter Matchups (Holes Part Deux)

by Nate Silver

In last week's Lies, Damned Lies, I reviewed Adam Dunn's major league career one plate appearance at a time, in order to determine how his performance changed when facing the same pitcher multiple times. For those of you who, like me, did some damage to your short-term memory over the long weekend, the idea was to discover whether, per Michael Lewis' discussion in Moneyball, Dunn is a hitter with a hole in his swing that gets continually more exploited in repeated trials. In Dunn's case, the answer was a tentative "no", but a lot of people mailed me to ask that I broaden the scope of the analysis. As D.H. writes: "I like your research, but my problem is that you've presented no baseline. It reminded me of a STATS Baseball Scoreboard article on whether Greg Maddux did better the more times he faced a particular batter because he's so 'smart.' The data showed that the hitters improved as time went on. But, like in your study, there was no baseline to compare against. Adam Dunn may show a drop-off the more he faces a particular pitcher, but maybe all players exhibit identical drops. Or, maybe all players exhibit more precipitous drops, and only the good ones (like Dunn) stick around because they only lose 20% of their value." In other words, is there any systematic advantage to the pitcher or the hitter given repeated trials? Doesn't seem likely, I wrote back, not if the league is going to remain at some kind of equilibrium for very long. But D.H. is correct that it's a question that deserves further study, much like why on Earth I didn't wear sunscreen to the ballgame on Sunday. As I mentioned in the Dunn piece, there is publicly available play-by-play data for each season from 2000-2002. In order to make sure that the players we're working with formed a closed system, I limited the analysis to players who made their major league debuts in 2000 or later.
It was then possible to look at all possible 'pairings' of the batters and pitchers within this group--what happens when Billy Batter faces Pete Pitcher for the first time? For the fifth time? For the 20th time, after Bill Batter has dropped the -y from his name and grown a mustache, and Pete Pitcher is discovered to be three years older than listed and actually named Pedro Pichardo?

May 21, 2003

Lies, Damned Lies: Holes

by Nate Silver

Holes isn't just the movie you see begrudgingly upon discovering that The Matrix Reloaded is sold out on all 17 screens at the Springfield GooglePlex. No, "holes" are also one of the big concepts in Michael Lewis' Moneyball, and not just as a part of Billy Beane's vernacular. Rather, Lewis contends that every hitter (excepting Scott Hatteberg, Pickin' Machine) has a hole in his swing, and that the hole will inevitably be discovered and exploited in repeated trials. Unless the hitter is able to make adaptations of his own--retooling his swing, standing in a different place in the batter's box, taking more pitches--the hitter will not be able to survive in the big leagues for long, and will join Kevin Maas and Joe Charboneau in baseball purgatory. It's a nice concept. Game theory hasn't been this sexy since Russell Crowe played the genius/lunatic somewhat resembling Princeton scholar John Nash in A Beautiful Mind. But is it real? Can it be tested? Does it hold its sabermetric water? Let's use Reds slugger Adam Dunn as a test case.

May 14, 2003

Lies, Damned Lies: Randomness: Catch the Fever!

by Nate Silver

Your favorite player hit .360 last season. If you know nothing else, what can you expect him to hit this season? This isn't meant to be a trick question; let's assume the guy had at least 500 at bats in the previous season. Gates Brown and Shane Spencer need not apply. What's your best guess? .350? .340? Not likely. The evidence is overwhelming. Let's look at all hitters since WWII who hit .350 or better in at least 500 at bats; the only other requirement is that they had at least 250 at bats in the year following.

May 7, 2003

Lies, Damned Lies: Binomial Distribution (or What the Heck is Up with Miguel Tejada and Alex Gonzalez?)

by Nate Silver

The baseball season has reached its adolescence. Oh sure, there are still the occasional temper tantrums, the delusions of grandeur, the fashion faux pas. But the season has been around for long enough that we can't totally dismiss it, even when it mouths off without reason or, convinced of its own invincibility, it pushes its limits a bit too far. The PECOTA system wasn't originally designed to update its forecasts in real time, but through some creative mathematics we can adapt it to that purpose. In particular, we can evaluate its projections by means of something called a binomial distribution (geek alert: if you're uninterested in the math here, the proper sequence of keystrokes is Alt+E+F+"Blalock"). The binomial distribution is a way to compute the probability that an event will occur a particular number of times in a given number of trials when we know the underlying probability of that event. For example, the probability of a "true" .300 hitter getting six or more hits in a sequence of 15 at bats is around 27.8 percent. (The binomial distribution's cousin, the Poisson distribution, has a cooler name but is less mathematically robust). A couple of important objections are going to be raised here. First, the binomial distribution is designed to test outcomes in cases in which there are mutually exclusive definitions of success and failure--for example, "hit" and "out," or "Emmy Nomination" and "WB Network." The measures of offensive performance that we tend to favor don't readily meet that criterion. Second, the binomial distribution assumes that we know the intrinsic probability of an event occurring, as we would with a dice roll or coin flip. But we never really know what a baseball player's underlying ability is--we're left to make a best guess based on his results, presumably coming closer to the mark as the sample size increases.
The first problem has an intriguing, if mathematically sketchy solution in the form of Equivalent Average, which is scaled to take on roughly the same distribution as batting average, even though it accounts for all major components of offensive performance. So, we could test the probability of a "true" .300 EqA hitter putting up an EqA of .400 in 15 plate appearances by assuming that this is equivalent to six successes (40%) in 15 trials. Since I haven't heard any objections, let's roll with it.
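The 27.8 percent figure quoted above is easy to check by hand. The short sketch below sums the binomial probabilities for six through fifteen hits in 15 at bats for a hitter whose true average is .300; the function name is just a label chosen for this illustration, and every at bat is assumed to be independent.

```python
from math import comb


def binomial_tail(trials, min_successes, p):
    """Probability of at least min_successes in a number of independent trials,
    each succeeding with probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(min_successes, trials + 1)
    )


# A "true" .300 hitter collecting six or more hits in 15 at bats.
tail = binomial_tail(trials=15, min_successes=6, p=0.300)
print(f"{tail:.1%}")  # prints 27.8%
```

Under the EqA shortcut described above, the same call would stand in for a "true" .300 EqA hitter posting a .400 EqA or better over 15 plate appearances, with all the mathematical sketchiness that implies.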

April 30, 2003

Lies, Damned Lies: Ticket Prices vs. Player Salaries

by Nate Silver

You've been hanging 'round these parts long enough. You've heard the party line, once or twice or 20 times: Higher payrolls don't result in higher ticket prices. Correlation is not causation. Salaries don't shift demand curves. It's Economics 101. Simple, textbook stuff. The problem with this line of argument--the problem with a lot of economically based arguments--is that it's easy to let the theory get ahead of the data. Well, I should state that more precisely: It's easy to let an oversimplified theory get ahead of the data. A lot of what you learn in Economics 102, and Economics 201, and graduate-level classes that I was too busy drinking Boone's Farm to take advantage of, is that much of the theory you master in an intro-level class is based on a particular set of assumptions that can prove to be quite robust in certain cases, and utterly misleading in others. A lot of people shun economics for this very reason--we've all had coffee shop conversations with the scruffy, Skynyrd-mangling philosophy major who is fond of spewing out faux-profundities about the irrationality of human nature. He's missing the point, of course, but so too is the Ayn Rand-spouting prepster from down the residence hall who conflates assumptions with hard rules. In either case, a little bit of knowledge is a dangerous thing. Economics, though it sometimes harbors pretensions to the contrary, is above all else a behavioral science, and an empirical science. If the theory doesn't match the data--well, it's not the data's fault. This is especially important to keep in mind when evaluating something like ticket prices to baseball games, a commodity that is unusual in many ways. As we've stressed frequently, ticket prices ought to be almost wholly determined by demand-side behavior--the marginal cost of allowing another butt in the seats is negligible.
But baseball tickets are unusual in other ways, too: They're very much a luxury good, and their prices are determined by a finite number of decision-makers who may be subject to conflicts of interest. It's certainly worth evaluating the available data to see whether we can put our money where our mouth is.

April 23, 2003

Lies, Damned Lies: Estimating Pitch Counts

by Nate Silver

Silicone. Margarine. O'Doul's. Why fool around with watered-down imitations when you've got the real thing ready and available? Rightly or wrongly, a lot of attention has been focused on pitch counts in the past several years. That's partly because of the efforts of people like Rob Neyer, Keith Woolner, and Will Carroll, not to mention those coaches, executives and agents who understand the importance of protecting their golden-armed investments. Pitch counts have become easy to take for granted because pitch count data is more readily available now than it ever was in the past. These days, just about any self-respecting box score lists pitch counts alongside the rest of a pitcher's line, a far cry from the dirty newsprint days of yore, when pitch count references were about as common as mentions of Reality TV or the Information Superhighway. But what about when you don't have pitch count information available? Like, say, you're at a ballgame, and wondering whether Dusty Baker should send Kerry Wood out for another inning? Or you're perusing minor league stats? Or you're looking at old boxes on Retrosheet, which, wonderful as they might be (this, folks, was the first game I ever attended), don't contain any information on pitch counts? Well, it turns out that it's not that difficult to make a reasonable guess at pitch counts based on other information that's much easier to come by. Looking at a complete set of data from the 2001 and 2002 seasons as provided by Keith Woolner, I ran a simple linear regression of pitches thrown against various other characteristics of a pitcher's stat line. Here was the formula that I came up with: