
Analytics Magazine

Viewpoint: Did Nate Silver beat the tortoise?

January/February 2013

By Arnold Barnett

In making election forecasts for the FiveThirtyEight blog (538) at the New York Times, Nate Silver uses a statistical model that is subtle, sophisticated and comprehensive. Real Clear Politics uses a shallow approach to forecasting that could have been devised by a statistical Forrest Gump. But which forecaster better predicted the results in the 2012 presidential election? Did the intellectual tortoise hold its own against the hare?

From a conceptual standpoint, it should have been no contest. In an approach that would make statisticians shudder, Real Clear Politics (RCP) estimated the Obama/Romney difference in a given state by the simple average of differences in recent polls. Differences in sample sizes were ignored, the word “recent” was defined differently in different states, undecided voters were simply excluded, and evidence that some polls skew toward Republicans and others toward Democrats got no weight. The 538 model, in contrast, avoided all these limitations, and took account of correlations among outcomes in similar states and the demographic makeup of each.

From Theory to Practice

But how did the final state-by-state predictions under the two approaches compare in accuracy? RCP only made forecasts in 30 of the 51 states (including the District of Columbia), but these included all swing states and all large states. And, at first blush, it might appear that the race between the two methodologies in the 30 states was (in the familiar phrase) too close to call.

The most obvious dimension for comparison is the bottom line: Did the forecast in a given state correctly identify the winner there? By that standard, both methods did very well: In 29 of the 30 states, they agreed on who the winner would be, and that candidate actually won. (Complete data tables will appear in a longer version of this article in the February 2013 issue of OR/MS Today.) In Florida, neither method got it right: RCP erroneously projected a narrow Romney victory (1.5 percentage points), while 538 projected an exact tie (and thus abstained from forecasting). Obama carried Florida by 0.9 percentage points. We can say, therefore, that 538 scored a partial victory over RCP in one of the 30 states, but that is hardly a decisive advantage.

As for the absolute forecast errors in the various states, the results were once again similar. The mean absolute error over the 30 states was 2.87 percentage points for RCP and 2.25 for 538. However, there is a “blue state bias” among the 30 states: Romney carried only 27 percent of them (eight out of 30), while he captured 47 percent (24 out of 51) in the entire nation. When an adjustment is made for this bias, the mean absolute error becomes 2.57 points for RCP and 2.33 for 538. This revised difference of one-quarter of one percentage point is hardly decisive.
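
To make the arithmetic concrete, here is a minimal sketch of this kind of comparison in Python. The margins below are placeholders rather than the actual 2012 figures, and the reweighting scheme (scaling the average errors in Romney-won and Obama-won states to Romney's national share of 24 states out of 51) is one plausible reading of the "blue state bias" adjustment, not necessarily the calculation behind the numbers above.

```python
# Illustrative only: forecast and actual Obama-minus-Romney margins,
# in percentage points, for a few made-up "states."
# Fields: (rcp_forecast, fte_forecast, actual_margin, romney_won)
states = {
    "A": (1.0, 2.5, 3.0, False),
    "B": (-2.0, -1.0, -0.5, True),
    "C": (4.0, 5.0, 5.5, False),
    "D": (-6.0, -5.0, -7.0, True),
}

def mean_abs_error(col):
    """Unweighted mean absolute error for forecast column col (0 = RCP, 1 = 538)."""
    errs = [abs(v[col] - v[2]) for v in states.values()]
    return sum(errs) / len(errs)

def adjusted_mae(col, red_share=24 / 51):
    """Reweight so Romney-won states carry their national share (24 of 51)."""
    red = [abs(v[col] - v[2]) for v in states.values() if v[3]]
    blue = [abs(v[col] - v[2]) for v in states.values() if not v[3]]
    return red_share * sum(red) / len(red) + (1 - red_share) * sum(blue) / len(blue)

print(f"Unadjusted MAE: RCP {mean_abs_error(0):.2f}, 538 {mean_abs_error(1):.2f}")
print(f"Adjusted MAE:   RCP {adjusted_mae(0):.2f}, 538 {adjusted_mae(1):.2f}")
```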

On the Other Hand

Yet this aggregate analysis is oblivious to the central dynamic of the 2012 election. Given the realities of the Electoral College, the candidates and everyone else recognized that the outcome would be determined by what happened in about a dozen “swing states” that either candidate could plausibly win. In the other states, the winner was a foregone conclusion so there was little campaigning and little interest in polling results.

Under the circumstances, a comparison between RCP and 538 should focus primarily if not exclusively on their accuracy in swing states. RCP identified 11 states as “toss up” just before the election: Colorado, Florida, Iowa, Michigan, New Hampshire, Nevada, North Carolina, Ohio, Pennsylvania, Virginia and Wisconsin.

Within these states, the two approaches differed markedly in performance. 538 outperformed RCP in absolute forecast accuracy in all but one of the 11 swing states (Ohio). Both forecasters were on average more favorable to Romney than the actual voters, but the net “bias” over the 11 states was only 0.76 percentage points for 538 as opposed to 2.44 points for RCP. That difference of 1.68 (2.44 - 0.76) points is especially noteworthy because regression analysis makes clear that 538’s estimates of Obama’s performance were consistently about 1.5 points higher than RCP’s. Again and again, this adjustment was vindicated by the swing-state results: RCP underestimated Obama’s actual vote share, while 538 eliminated roughly 75 percent of the underestimation.
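
The swing-state bias and the roughly 1.5-point offset can be exhibited with equally simple calculations. The sketch below uses invented swing-state margins, not the actual 2012 numbers, and the ordinary-least-squares fit of the 538 margin on the RCP margin is just one way to display a roughly constant shift of the kind described above; it should not be read as the author's own regression.

```python
# Hypothetical Obama-minus-Romney margins (percentage points) for 11 swing states.
rcp_forecast = [1.5, -1.5, 2.0, 3.5, 2.0, 2.8, -3.0, 2.9, 3.8, 0.3, 4.2]
fte_forecast = [3.0, 0.0, 3.6, 5.1, 3.4, 4.3, -1.6, 4.4, 5.3, 1.8, 5.7]
actual = [4.0, 0.9, 5.8, 9.5, 5.6, 6.7, -2.0, 3.0, 5.4, 3.9, 6.9]

n = len(actual)

# Net "bias": average amount by which each forecast understated Obama's margin.
bias_rcp = sum(a - f for f, a in zip(rcp_forecast, actual)) / n
bias_fte = sum(a - f for f, a in zip(fte_forecast, actual)) / n

# Ordinary least squares of the 538 margin on the RCP margin; a slope near 1
# with a positive intercept corresponds to a roughly constant upward shift.
mx = sum(rcp_forecast) / n
my = sum(fte_forecast) / n
slope = sum((x - mx) * (y - my) for x, y in zip(rcp_forecast, fte_forecast)) / sum(
    (x - mx) ** 2 for x in rcp_forecast
)
intercept = my - slope * mx

print(f"Net bias toward Romney: RCP {bias_rcp:.2f} points, 538 {bias_fte:.2f} points")
print(f"538 margin = {intercept:.2f} + {slope:.2f} x RCP margin")
```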

In the 19 of the 30 compared states that were not swing states, 538 and RCP performed about equally well, which is why statistics based on all 30 states yielded less disparity between the two approaches than the swing states alone. It could be that Obama outperformed the swing-state polls on which RCP relied because of the major voter-turnout drives his campaign undertook in those states, drives that brought to the voting booths many people whom pollsters had not counted among “likely” voters. In the other states, the Obama campaign may not have waged such efforts, so no comparable “surge” occurred.

Nate Silver would be the first to agree that his state-by-state forecasts were correlated, and that circumstance stymies assessments of whether his swing-state victory over RCP was statistically significant. In effect, he made an all-or-nothing bet on the premise that the polls underestimated Obama’s strength in swing states: Had this premise been wrong, his 11-1 victory over RCP could easily have been a 12-0 defeat. Yet uncertainties about how to define statistical significance cannot obscure the fundamental point: 538 did extremely well in 2012 in those states where accuracy was most important.

Final Remarks

So how does it all add up? Under the Occam’s Razor principle, there is a clear starting preference for simple models over more complicated formulations. A complex model must justify its intricacy by offering more accurate information than a simpler counterpart; moreover, this added information should arise in places where it is most needed. In the present setting, the question is whether Nate Silver’s 538 model outperformed the straightforward RCP method to an extent that makes 538 the wiser choice, even if the less transparent one.

Readers can reach their own judgments, but because of the results in the swing states, the author believes that 538 met the test for superiority just posed. While the tortoise catches up with the hare in the nursery stories, it seems here that the hare won hands down. But the outcome does not contradict Aesop’s fable because, far from being lazy, the 538 hare ran the race as hard as it could. And, if the evidence is any guide, it is very much a world-class runner.


Arnold Barnett (abarnett@mit.edu) is the George Eastman Professor of Management Science at the MIT Sloan School of Management. His research specialty is applied mathematical modeling with a focus on problems of health and safety. Barnett is a senior member of INFORMS.
