
July 10, 2013

Manufactured Runs

The Mystery of the Missing .500 Teams, Part Two

by Colin Wyers


A couple of weeks ago, I wrote about the distribution of team wins, and the discovery that the distribution may in fact be bimodal, not normal as one might expect.

One of the predictions that came from this theory was that teams right at .500 would, counterintuitively, tend to regress away from the mean. So one thing we can do is check whether the real world behaves the way we expect it to. I took all teams from 1969 on with even numbers of games and split each season into “halves” at even-numbered game counts. I use scare quotes for halves because, in order to boost the sample size, I split in increments of two games and kept any pair where the two “halves” were within 20 games of each other. Then I looked at teams that were exactly .500 in the “before” sample (716 teams total) and saw what they did afterward:
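The split-halves procedure can be sketched in code. Everything below is a hypothetical stand-in (a season is just a list of 1s and 0s); the real study runs over all teams from 1969 on:

```python
# Sketch of the split-halves procedure: split a season at even game
# counts, keep pairs whose "halves" are within 20 games of each other,
# and collect the after-sample win% for teams at exactly .500 before.

def split_halves(results, split):
    """Split a season (list of 1=win, 0=loss) at `split` games; return
    (before_pct, after_pct), or None if the split is invalid."""
    before, after = results[:split], results[split:]
    # splits advance in increments of two games
    if split % 2 != 0 or len(after) == 0:
        return None
    # keep only pairs where the "halves" are within 20 games of each other
    if abs(len(before) - len(after)) > 20:
        return None
    return (sum(before) / len(before), sum(after) / len(after))

def exactly_500_before(results):
    """Yield the after-sample win% for every valid split where the team
    was exactly .500 in the before sample."""
    for split in range(2, len(results), 2):
        pair = split_halves(results, split)
        if pair and pair[0] == 0.5:
            yield pair[1]

# a hypothetical 162-game season: alternating W/L, then a 20-game win streak
season = [1, 0] * 71 + [1] * 20
out = list(exactly_500_before(season))
print(out)
```

Each valid split of the hypothetical season contributes one before/after pair, which is how a single real season can appear in the sample many times.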

Again, we see a pronounced bimodal pattern in the data. What’s interesting is that we don’t even see it coming out to .500 in the aggregate; the average is .497, close enough to .500 that we could chalk it up to a sampling issue, but the median is .489. While 323 teams have records greater than .500, 359 teams have records under .500 (with 34 teams exactly at that mark). Looking at the most common records (after prorating to wins in a 162-game schedule, to control for the uneven number of games in the “after” samples):

Win%     Num
0.525    55
0.475    46
0.488    44
0.512    42
0.500    34
0.444    24
0.537    23
0.568    21
0.451    21
0.463    20
0.420    20

Which looks rather similar to the chart in the last article.
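The prorating behind the table can be sketched as follows. The exact rounding rule isn't specified in the piece, so round-to-nearest is an assumption:

```python
# Prorate an "after" record to a 162-game schedule: scale the win total
# to 162 games, round to a whole number of wins, and recompute the
# percentage. This is what makes records from uneven "after" samples
# comparable in the table above.

def prorate_to_162(wins, games):
    wins_162 = round(wins / games * 162)
    return wins_162 / 162

# e.g. a hypothetical 44-31 after-sample record
print(round(prorate_to_162(44, 84), 3))
```

Rounding to whole wins is why the prorated percentages cluster at a discrete set of values (.525, .512, .500, and so on) rather than spreading continuously.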

So what we have is a weird little case of teams actually fleeing from the mean, rather than regressing toward it. We have a theory as to why this might be, if you’ll recall what we said last time. There’s very little glory in finishing right at .500. It’s hard to make the playoffs at that record, and if you do you’re at a disadvantage compared to the other playoff teams in both seeding and talent. The incentives are lined up for teams to either finish above .500 and contend, or retool and finish below .500.

So I took my split-halves sample and looked at all teams from 1985 on (1985 being the first year I have salary data for all 30 teams), not just teams at .500 in the “before” sample. And I looked at five variables:

  • A team’s actual win percentage in the before sample,
  • A team’s third-order win percentage in the before sample (not the whole season),
  • How many games back of the division leader a team was as of the last day of the before sample,
  • A team’s TV market size, as defined by this Nate Silver study, and
  • A team’s payroll for that season, divided by the average team’s payroll that year (which I termed the “payroll index”).

And I looked at how well they predict rest-of-season win percentage, using an ordinary least squares regression:

                    Coefficient    Standard err.    p-value
Constant            0.2242         0.0121069        1.08E-74
Win Percent         0.127259       0.0295882        1.72E-05
Third-Order WPCT    0.385595       0.0245993        2.09E-54
Games Back          -0.000741      0.00025036       0.0031
TV Market           9.39E-10       2.78E-10         0.0007
Salary Index        0.0159546      0.00331814       1.56E-06

“Constant,” also known as the intercept, is the predicted value when all the inputs are zero. The p-value is a test of statistical significance; the common rule of thumb (bear in mind that’s all it is, though) is that values above .05 are not significant. All of our values are statistically significant. This could still be the result of overfitting, so we can check three model selection criteria: the Bayesian information criterion, the Akaike information criterion, and the Hannan–Quinn information criterion. None of them improves when the games-back, TV-market-size, and salary-index terms are omitted. That suggests that although the differences between a regression omitting those terms and one including them are small (the adjusted R-squared goes from 0.27 to 0.28, and the standard error goes from .0722 to .0717), the improvement is not a product of overfitting.
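A regression of this shape, plus the information-criterion comparison, can be sketched with NumPy. The data here are synthetic stand-ins (the predictor distributions, noise level, and sample size are all hypothetical); only the mechanics of fitting the full and reduced models follow the text:

```python
# Ordinary least squares on synthetic team-half data, comparing a full
# model against one that omits games back, TV market, and salary index.
import numpy as np

rng = np.random.default_rng(42)
n = 800  # hypothetical number of team-half pairs

win_pct      = rng.normal(0.500, 0.070, n)
third_order  = win_pct + rng.normal(0.0, 0.030, n)
games_back   = np.abs(rng.normal(8.0, 6.7, n))
tv_market    = rng.lognormal(15.0, 0.8, n)       # e.g. TV households
salary_index = np.abs(rng.normal(1.0, 0.38, n))  # payroll / league average

# Response built on the article's coefficient scale, plus Gaussian noise
y = (0.224 + 0.127 * win_pct + 0.386 * third_order
     - 0.000741 * games_back + 9.39e-10 * tv_market
     + 0.0160 * salary_index + rng.normal(0.0, 0.072, n))

X_full = np.column_stack([np.ones(n), win_pct, third_order,
                          games_back, tv_market, salary_index])
X_red = X_full[:, :3]  # drop games back, TV market, salary index

beta_full, rss_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
beta_red,  rss_red,  *_ = np.linalg.lstsq(X_red,  y, rcond=None)
rss_full, rss_red = float(rss_full[0]), float(rss_red[0])

def aic(rss, n, k):
    """Akaike information criterion under Gaussian errors."""
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1.0)
    return 2 * k - 2 * loglik

aic_full = aic(rss_full, n, X_full.shape[1])
aic_red = aic(rss_red, n, X_red.shape[1])
print(f"AIC full: {aic_full:.1f}  AIC reduced: {aic_red:.1f}")
```

The reduced model can never have a lower residual sum of squares than the full one; the information criteria exist precisely to penalize the extra parameters and ask whether the RSS improvement earns its keep.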

What’s the practical use of all of this? A one-standard-deviation change in games back results in a change of .005 in predicted rest-of-season win percentage; a one-SD change in TV market size means a change of .004; and a one-SD change in salary index is worth .006. Because TV market size and salary are substantially correlated (at .61), the observed results are likely to be somewhat more pronounced than this suggests.
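These effect sizes are just coefficient times one standard deviation of the predictor. The standard deviations below are hypothetical round values chosen so the products reproduce the quoted figures, since the article doesn't list them:

```python
# Back-of-envelope effect sizes: |coefficient| x one SD of the predictor.
# Coefficients are from the regression table; the SDs are hypothetical.
coefs = {"games_back": -0.000741,
         "tv_market": 9.39e-10,
         "salary_index": 0.0159546}
sds = {"games_back": 6.7,        # games (hypothetical)
       "tv_market": 4.3e6,       # TV households (hypothetical)
       "salary_index": 0.38}     # payroll / league average (hypothetical)

effects = {k: abs(coefs[k]) * sds[k] for k in coefs}
for k, v in effects.items():
    print(f"{k}: one-SD change moves predicted win% by about {v:.3f}")
```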

What this seems to tell us is that there is a small but real targeting effect in rest-of-season wins, akin to the notion of a “self-aware coin.” Teams that are closer to the division leaders are going to perform better in the after sample than teams further behind, given the same expected performance otherwise. Moreover, “large-market teams” (broadly speaking) are going to do better than small-market teams, all else being equal. Here, there’s a bit of a mystery as to the root cause: it could be that such teams have more resources to invest in improving the team mid-season, that they have greater financial incentives to do so, or that they are protecting a larger previous investment in the club by doubling down. Or it could be some combination of these causes. (It could even be that a high-payroll .500 team is more likely to be underperforming its true potential, while a low-payroll .500 team is overperforming its own.)

In terms of what we do around here, this means that our playoff odds report might be underestimating the rest-of-season performance of teams close to the division leader (although it suggests that we’re underestimating the rest-of-season performance of division leaders as well, so it’s possible that the net result is no significant change in playoff odds). It also means that our assessments, both now and in the preseason, are slightly underrating large-market clubs and overrating small-market clubs. (The Dodgers’ recent acquisition of Ricky Nolasco and his salary from the Marlins in exchange for some magic beans is an illustration of the sort of thing our naïve model is likely not capturing.) It’s something we’ll look at including in our simulations in the future.

In the larger picture, this is a reminder that MLB teams are not simply random number generators, nor are the players on them. They’re run by and composed of real people who respond to incentives, and they can change what they’re doing in response to results. This doesn’t invalidate the use of tools that treat teams and players as random-number generators, mind you—they can and often do produce useful results. But it does suggest that there are other approaches to analyzing baseball that can produce new and surprising conclusions, ones that can deepen our understanding of the game and the people playing it.

Colin Wyers is an author of Baseball Prospectus. 


