Baseball Prospectus home
  
  

July 9, 2003

Lies, Damned Lies

Digging in the Backyard

by Nate Silver


Baseball is the National Game, but at the amateur level, it's also a regional one. The frozen tundra of the Upper Midwest and the rolling hills of the Appalachians do not afford the same opportunity to play the sport year-round as the marshes of Florida, or the sun-drenched ballfields of California.

Major league teams, which collectively are responsible for drafting nearly 1,500 players every year--a far bigger burden than their counterparts in other sports face--are keenly aware of the differences. It simply isn't possible, or at least not economically feasible, to develop an accurate scouting report for every amateur prospect in the country. While the top national prospects will be scouted by everyone, teams go regional as the draft moves into its later rounds, focusing on players from their home territories (as the Braves do) or on players from regions in which the level of competition is perceived to be the highest--California, Florida, and the Southwest.

Most of you, I suspect, are aware of these disparities. When a player from a cold-weather state is selected high--like Rocco Baldelli, pride of Woonsocket, Rhode Island, or Joe Mauer, pride of St. Paul, Minnesota--their hometown is mentioned early and often, precisely because such selections are unusual. But it's striking just how profound the differences are. A high school senior from Texas is three times more likely to be drafted into professional ball than a high school senior from New York. A high school senior from Arizona is six times more likely to be drafted than a high school senior from Minnesota. A high school senior from Florida is nine times more likely to be drafted than a high school senior from Illinois.

Let's back up a second. We can come up with a pretty good estimate of the intensity with which a given state is scouted if we know two things: the number of players that are drafted, and the number of players that potentially could be.

The first part is easy--we'll use data from the 2003 amateur draft, focusing on players selected out of high school only, based on their respective home states. College and Juco selections would confuse matters here--institutions of higher learning are not evenly distributed among the population, and the allotment of colleges with good baseball programs is even more skewed. Players flock to where the good programs are, not the other way around.

There isn't, so far as I know, data on just how many high school baseball players there are in each state, but it's easy to come up with a reasonable proxy. The good folks at the Census Bureau have prepared a wealth of data organized by different demographic categories. One of these categories is age; the census estimated the number of 15- to 19-year-olds in each state as of 2000 (you can find the data here, but be warned--the PDF file linked is massive).

If we further assume that: i) people are evenly distributed within this age group--that is, there are as many 16-year-olds as 18-year-olds, and ii) half of the people in this group are male (sorry, gals), we can come up with a good idea of the number of potential pro ballplayers there are in a given region. For example, according to the census, there were roughly 540,000 people between the ages of 15 and 19 in North Carolina as of April 2000. We assume that one-fifth of these people (108,000) were high school seniors this year (insert dropout joke here), and that one-half of these (54,000) are male.
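The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration of the two assumptions above, not the worksheet actually used for the column; the function name `estimate_male_seniors` and its parameter names are invented for the example.

```python
def estimate_male_seniors(pop_15_to_19, age_years=5, male_share=0.5):
    """Estimate male high school seniors from a census 15-19 age-group count.

    Assumes, as the text does, that the group is spread evenly across its
    five single-year ages and that half of each age is male.
    """
    one_age_cohort = pop_15_to_19 / age_years  # e.g. this year's seniors
    return one_age_cohort * male_share


# North Carolina, using the rounded census figure quoted in the text
nc_seniors = estimate_male_seniors(540_000)
print(f"{nc_seniors:,.0f}")  # prints 54,000
```

Feeding in the exact census count rather than the rounded 540,000 would presumably give a figure closer to the 53,993 listed for North Carolina in the table.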

Using these two numbers, we can come up with an estimate of the likelihood that a player from a given state is drafted. In the table below, I've created an index of the number of MLB draftees per 10,000 male 12th graders.


                                    Male HS        Draftees
                 2003 HS            Seniors      per 10,000
State           Draftees        (estimated)       Eligibles

Florida              82            101,407              8.1
Nevada               10             12,720              7.9
Puerto Rico          22             31,344              7.0
Arizona              18             36,772              4.9
Colorado             14             30,724              4.6
Washington           18             42,797              4.2
California           98            245,089              4.0
Alaska                2              5,009              4.0
Hawaii                3              8,100              3.7
Oklahoma              9             26,937              3.3
Louisiana            12             36,595              3.3
Georgia              19             59,628              3.2
Indiana              13             45,348              2.9
Texas                46            163,623              2.8
Alabama               9             32,458              2.8
North Carolina       14             53,993              2.6
Tennessee            10             39,518              2.5
Virginia             11             48,407              2.3
Missouri              9             41,330              2.2
Oregon                5             24,443              2.0
Massachusetts         8             41,574              1.9
North Dakota          1              5,362              1.9
Utah                  4             21,628              1.8
Iowa                  4             22,642              1.8
Maryland              5             35,612              1.4
Montana               1              7,131              1.4
Mississippi           3             23,319              1.3
Wisconsin             5             40,720              1.2
New Jersey            6             52,522              1.1
Arkansas              2             19,877              1.0
Ohio                  8             81,687              1.0
Kansas                2             21,012              1.0
Pennsylvania          8             85,099              0.9
Idaho                 1             11,086              0.9
Illinois              8             89,400              0.9
New York             11            128,754              0.9
Minnesota             3             37,436              0.8
West Virginia         1             12,558              0.8
Kentucky              2             28,900              0.7
New Mexico            1             14,575              0.7
Connecticut           1             21,663              0.5
Michigan              2             71,987              0.3
Washington DC         0              3,787              0.0
Wyoming               0              4,190              0.0
Vermont               0              4,577              0.0
Delaware              0              5,563              0.0
South Dakota          0              6,246              0.0
Rhode Island          0              7,545              0.0
New Hampshire         0              8,669              0.0
Maine                 0              8,949              0.0
Nebraska              0             13,491              0.0
South Carolina        0             29,538              0.0
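The index in the last column is simple enough to recompute. Here is a rough Python sketch using a handful of rows copied from the table; the function name `draftees_per_10k` and the dictionary layout are my own choices for illustration, not anything from the original analysis.

```python
def draftees_per_10k(draftees, male_seniors):
    """Draft picks per 10,000 estimated male high school seniors."""
    return 10_000 * draftees / male_seniors


# (2003 HS draftees, estimated male HS seniors), copied from the table
states = {
    "Florida": (82, 101_407),
    "Texas": (46, 163_623),
    "Arizona": (18, 36_772),
    "Illinois": (8, 89_400),
    "New York": (11, 128_754),
    "Minnesota": (3, 37_436),
}

index = {name: draftees_per_10k(d, n) for name, (d, n) in states.items()}

# Print the sampled states from most to least heavily drafted
for name, rate in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {rate:4.1f}")

# Ratios behind the "three times", "six times" and "nine times" comparisons
print(f"Texas / New York:    {index['Texas'] / index['New York']:.1f}")
print(f"Arizona / Minnesota: {index['Arizona'] / index['Minnesota']:.1f}")
print(f"Florida / Illinois:  {index['Florida'] / index['Illinois']:.1f}")
```

The ratios come out at a bit more than three, six, and nine respectively, which is where the comparisons quoted earlier come from.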

There's quite a range of values here, from 8.1 (Florida, though Nevada makes a surprisingly strong showing) to 0.0 (several states, the largest of which are Nebraska and South Carolina). The map below reflects the regional pattern that I described earlier:

The Sunbelt is heavily scouted, the Bible Belt less so, and the Rust Belt all but abandoned. Being from the Midwest, I'm sensitive about this. The South might have superior weather, lower cigarette taxes, and prettier women, but does it have better ballplayers, too?

Before we attribute these differences to some sort of "bias," it's best to check up on what sort of return teams can expect to have on players from different regions. It would be dubious to claim that, say, Florida is overscouted if the concentration of good players there is commensurate with the state's heavy load of draft picks.

One metric that should work pretty well is the geographic distribution of major league players. California, for example, is home to about 12% of the high school seniors in the United States; by comparison, it was responsible for about 19% of the high school players selected in the June draft. But that figure looks pretty reasonable when you compare it to the number of major league players who hail from California. Between 1993 and 2002, Californians accumulated approximately 25% of major league plate appearances, and 18% of major league innings pitched (I'm not counting players who hailed from outside the U.S. and Puerto Rico). California is scouted very heavily, but deservedly so.

Florida, however, doesn't hold up so well. It accounted for 16% of this year's high school draftees, but only 8% of the major league PA in our sample, and under 5% of the major league IP. Sure, the state's population has grown a little bit in the past two decades, producing a lag effect that these numbers won't capture quite right, but the discrepancy here is huge; it seems likely that major league teams are getting a lower return on players from Florida than those from other parts of the country.

Other states that come up as overscouted include Arizona, Colorado, and Texas, though the last is something of an exceptional case. Turns out that there are a ton of pitchers from the Lone Star State--Texans accounted for 8.4% of the major league IP in our sample--but relatively few position players (just 2.9% of the major league PA). Whether that's due to the predominance of football in the state--if you can't play quarterback, pitch--the historical legacy of Nolan Ryan and Roger Clemens, or something else entirely, I don't know, but the split is dramatic.

The complete set of data is provided below; I've listed a series of percentages indicating the following:

  1. The state's share of high school seniors, as estimated from the census data;

  2. The state's percentage of 2003 high school draftees;

  3. The state's share of MLB plate appearances over 1993-2002;

  4. Same thing for innings pitched;

  5. The average of the PA and IP figures;

  6. A "Delta" figure indicating the difference between MLB playing time (#5) and 2003 draft pick percentage (#2). Negative figures indicate that a state is overscouted, and positive figures that it's underscouted.


State         Est HS       2003   MLB PA   MLB IP
          population   Draftees  (93-02)  (93-02)  Average    Delta
Florida          4.9%     16.0%     8.2%    4.6%      6.4%    -9.6%
Texas            8.0%      9.0%     2.9%    8.4%      5.7%    -3.3%
Arizona          1.8%      3.5%     0.6%    0.8%      0.7%    -2.9%
Colorado         1.5%      2.7%     0.2%    0.7%      0.4%    -2.3%
Washington       2.1%      3.5%     1.6%    1.6%      1.6%    -2.0%
Nevada           0.6%      2.0%     0.3%    0.4%      0.3%    -1.6%
Tennessee        1.9%      2.0%     0.6%    1.0%      0.8%    -1.2%
North Carolina   2.6%      2.7%     1.8%    1.9%      1.8%    -0.9%
Utah             1.1%      0.8%     0.0%    0.2%      0.1%    -0.6%
Indiana          2.2%      2.5%     1.9%    1.9%      1.9%    -0.6%
Oklahoma         1.3%      1.8%     0.9%    1.5%      1.2%    -0.6%
Georgia          2.9%      3.7%     3.5%    3.1%      3.3%    -0.4%
Virginia         2.4%      2.2%     1.2%    2.3%      1.7%    -0.4%
Hawaii           0.4%      0.6%     0.1%    0.6%      0.3%    -0.2%
Alabama          1.6%      1.8%     1.4%    1.6%      1.5%    -0.2%
Idaho            0.5%      0.2%     0.0%    0.0%      0.0%    -0.2%
Montana          0.3%      0.2%     0.0%    0.1%      0.0%    -0.2%
New Mexico       0.7%      0.2%     0.0%    0.1%      0.1%    -0.1%
Arkansas         1.0%      0.4%     0.3%    0.3%      0.3%    -0.1%
Alaska           0.2%      0.4%     0.0%    0.7%      0.4%    -0.0%
Missouri         2.0%      1.8%     1.0%    2.5%      1.8%    -0.0%
Vermont          0.2%      0.0%     0.0%    0.0%      0.0%    +0.0%
Oregon           1.2%      1.0%     1.4%    0.7%      1.1%    +0.1%
Washington DC    0.2%      0.0%     0.1%    0.1%      0.1%    +0.1%
West Virginia    0.6%      0.2%     0.2%    0.4%      0.3%    +0.1%
South Dakota     0.3%      0.0%     0.0%    0.2%      0.1%    +0.1%
Maine            0.4%      0.0%     0.0%    0.2%      0.1%    +0.1%
Iowa             1.1%      0.8%     0.1%    1.7%      0.9%    +0.1%
North Dakota     0.3%      0.2%     0.3%    0.4%      0.3%    +0.2%
Rhode Island     0.4%      0.0%     0.2%    0.1%      0.2%    +0.2%
Wyoming          0.2%      0.0%     0.4%    0.0%      0.2%    +0.2%
Minnesota        1.8%      0.6%     0.9%    1.0%      0.9%    +0.3%
Wisconsin        2.0%      1.0%     1.0%    1.6%      1.3%    +0.3%
Nebraska         0.7%      0.0%     0.4%    0.3%      0.3%    +0.3%
New Hampshire    0.4%      0.0%     0.1%    0.6%      0.4%    +0.4%
Delaware         0.3%      0.0%     0.7%    0.0%      0.4%    +0.4%
Kansas           1.0%      0.4%     1.3%    0.5%      0.9%    +0.6%
Mississippi      1.1%      0.6%     1.6%    0.8%      1.2%    +0.6%
Louisiana        1.8%      2.3%     2.1%    3.9%      3.0%    +0.7%
Connecticut      1.1%      0.2%     1.1%    0.8%      1.0%    +0.8%
Maryland         1.7%      1.0%     1.7%    1.8%      1.8%    +0.8%
Puerto Rico      1.5%      4.3%     7.8%    2.4%      5.1%    +0.8%
Massachusetts    2.0%      1.6%     1.9%    3.0%      2.4%    +0.9%
South Carolina   1.4%      0.0%     1.2%    0.6%      0.9%    +0.9%
Kentucky         1.4%      0.4%     1.6%    1.2%      1.4%    +1.0%
New Jersey       2.6%      1.2%     2.5%    2.5%      2.5%    +1.4%
Pennsylvania     4.1%      1.6%     2.5%    4.8%      3.7%    +2.1%
Michigan         3.5%      0.4%     1.7%    3.4%      2.6%    +2.2%
California      11.9%     19.2%    24.9%   18.2%     21.5%    +2.4%
Ohio             4.0%      1.6%     4.2%    4.9%      4.6%    +3.0%
New York         6.3%      2.2%     6.4%    4.4%      5.4%    +3.3%
Illinois         4.4%      1.6%     4.9%    5.0%      4.9%    +3.4%
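For anyone who wants to check the Delta column, the calculation can be sketched as follows. This is a minimal Python illustration built from the rounded percentages printed in the table; the function name `delta` and the `rows` layout are assumptions made for the example.

```python
def delta(draft_share, pa_share, ip_share):
    """MLB playing-time share minus draft share, in percentage points.

    Playing time is the simple average of the plate-appearance and
    innings-pitched shares. Negative means overscouted, positive means
    underscouted.
    """
    playing_time = (pa_share + ip_share) / 2
    return playing_time - draft_share


# (2003 draftee %, MLB PA %, MLB IP %), rounded values from the table
rows = {
    "Florida": (16.0, 8.2, 4.6),
    "Texas": (9.0, 2.9, 8.4),
    "Illinois": (1.6, 4.9, 5.0),
}

for state, (draft, pa, ip) in rows.items():
    d = delta(draft, pa, ip)
    label = "overscouted" if d < 0 else "underscouted"
    print(f"{state:<10} {d:+.2f}  {label}")
```

Because the inputs here are already rounded to one decimal place, a recomputed Delta can differ from the printed one by about a tenth of a point (Texas works out to -3.35 against the table's -3.3).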

For the visual learners among us, the most overscouted and underscouted states are mapped out below:

Any resemblance to the early returns from the 2000 presidential election is unintentional.

Although I think these data make a convincing case, there are a number of things to remember here:

  1. A lot of the overscouted states are in high-growth regions, and a lot of the underscouted states are in low-growth regions, so I've probably overstated the results a little bit;

  2. Since the warm-weather bias is stronger in later rounds of the draft, not all draftees may be on an equal footing in terms of their likelihood of reaching the majors;

  3. I suspect that a relatively higher proportion of players who reach the major leagues from cold-weather states do so as college or Juco players, so it's more a matter of an opportunity delayed than an opportunity denied.

All that said, I think there are opportunities here for an enterprising organization. Warmer weather is conducive to playing baseball instead of another sport or another activity, and players from those states are no doubt more advanced at the same age than their counterparts from colder climes. But the longer seasons that this weather provides also present more opportunities for major league teams to see a given player multiple times. Especially in the lower rounds of the draft, mere familiarity may trump other considerations. Increasing the intensity of the scouting effort during the shorter seasons in the Northeast and Midwest won't come without cost, but it's a strategy that might pay for itself and then some.

Moreover, while there's no doubt that there are meaningful differences in the quality of high school competition between different regions, major league teams may exaggerate their importance in determining who is going to make the superior professional. Much as we emphasize drafting on results instead of potential when it comes to college players, the case is somewhat reversed when it comes to high schoolers. The ages of 18-21 are a time of tremendous growth for most baseball players--far more so than the ages of 21-24. A high school player from a cold-weather state might not have had the same chance to refine his skills that a player from Florida or Texas would, but in the right organization, he'll have plenty of time to do so in the lower minors.

The strategy of shifting scouting resources toward colder-weather states might work especially well for an organization that is primarily focused on collegiate players. If it's true that high school players are still overrated as a group--and I think they are--it's sensible to focus on those high schoolers that are relatively underrepresented, in areas where the saturation of scouts from other organizations will be less fierce. Major league teams are constantly looking for new sources of untapped talent, whether it's the Far East or Asia or even Europe. One of the most fruitful such areas may be right in their backyards.

Nate Silver is an author of Baseball Prospectus. 