
Analytics Magazine

ANALYZE THIS! A ‘Silver’ lining for election blues

By Vijay Mehrotra

For the past several months, I have spent hours staring at my screen, reading anything I can get my hands on that might help me get a sense of what might happen during the elections on Nov. 8. Since I live in Oakland, Calif., the heart of the uber-liberal bubble that is the San Francisco Bay Area, I am constantly searching for truly fair and balanced perspectives about what is really going on across the rest of the country, especially with regards to this year’s presidential election.

For analytics professionals, in fact, the world of presidential elections is familiar territory. Since George Gallup first applied statistical random sampling to draw conclusions about the results of the 1936 U.S. presidential election based on survey data, there has been a steady (and recently explosive) growth in the number of people gathering, analyzing, visualizing and interpreting data to try to understand, explain, predict and/or influence what might happen in our elections. Nowadays, during our seemingly endless presidential election campaigns, it feels as though there are new state or national poll results being announced every day for months. This almost nonstop and often contradictory stream of numbers often leaves me bewildered.
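The power of Gallup's insight is easy to quantify: under simple random sampling, even a modest sample pins down a population proportion surprisingly tightly. A minimal sketch, using the standard normal-approximation formula with illustrative numbers:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a sample proportion under simple random sampling."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A poll of 1,000 respondents showing 52% support carries roughly
# a +/-3.1-point margin of error at 95% confidence.
print(f"{margin_of_error(0.52, 1000):.3f}")  # → 0.031
```

This is why a well-designed survey of a thousand voters can speak, within a few points, for an electorate of millions.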

Which is why I spend a huge amount of time visiting FiveThirtyEight, the political website created some years ago by Nate Silver, a “number-crunching prodigy” [1] who just might be the big data world’s first mainstream rock star. Since bursting onto the political scene in 2008 with predictions that (a) were somewhat contrary to the punditry’s consensus and (b) often proved to be surprisingly accurate, Silver’s website has sought to use both historical voter data and information from political polls to make predictions about elections (the site, now owned by ESPN, also includes data-driven stories about sports, science, economics and culture). During this presidential election cycle, Silver and his team used one set of models for forecasting state-by-state primary results and another set for the general election.

Quite a bit of methodological detail about these forecasts is publicly available at [2,3]. After reading Silver’s book “The Signal and the Noise” [4] and scrutinizing these forecasting process descriptions, my sense is that Silver and his team are seeking to understand the same elusive truths about the electorate that I am, and as an engaged citizen I am grateful for their efforts. The blizzard of polling data is systematically examined, with some polls banned due to ethical and/or methodological concerns. Careful weighting is done to account for factors such as sample sizes, recency, frequency and past performance. A wide variety of adjustments are made to address factors such as post-convention bounces, third-party candidates, registered vs. likely voters and historical biases associated with particular polls. Moreover, over the past eight years Silver and his team have continued to tweak their models in different ways, and there appears to be a basic humility about the limits of prediction underlying their observations and claims, as well as a deep-seated wonky desire to tell an unbiased story.
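The flavor of such an aggregation is easy to sketch. The weighting scheme below is purely illustrative – the sample-size, recency and pollster-rating weights are my own stand-ins, not the actual formulas Silver’s team uses:

```python
from dataclasses import dataclass

@dataclass
class Poll:
    candidate_share: float  # candidate's share of the two-party vote, 0..1
    sample_size: int
    days_old: int           # days since the poll left the field
    pollster_rating: float  # 0..1, higher = better historical accuracy

def weighted_average(polls: list[Poll]) -> float:
    """Illustrative poll aggregation: weight each poll by sample size,
    recency and pollster track record, then take the weighted mean."""
    def weight(p: Poll) -> float:
        size_w = p.sample_size ** 0.5   # diminishing returns on sample size
        recency_w = 0.95 ** p.days_old  # exponential decay for stale polls
        return size_w * recency_w * p.pollster_rating
    total = sum(weight(p) for p in polls)
    return sum(weight(p) * p.candidate_share for p in polls) / total

polls = [
    Poll(0.51, sample_size=900, days_old=2, pollster_rating=0.9),
    Poll(0.47, sample_size=400, days_old=10, pollster_rating=0.6),
    Poll(0.49, sample_size=1500, days_old=5, pollster_rating=0.8),
]
print(round(weighted_average(polls), 3))  # → 0.496
```

Note how the large, recent, well-rated polls dominate the estimate, while the small, stale one barely moves it – the same intuition, if not the same machinery, behind the adjustments described above.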

Which brings us back to Oakland. In a recent article in Significance [5], Kristian Lum (lead statistician at the Human Rights Data Analysis Group) and William Isaac (doctoral candidate in the Department of Political Science at Michigan State University) examine the controversial topic of “predictive policing [6],” a term that refers to the use of data and models to make forecasts about where crime is most likely to take place, usually within urban population centers. While predictive policing software has been commercially available for some time, its efficacy has been hotly debated in crime-fighting circles [7].

Lum and Isaac’s main thesis is that the historical data used by these predictive policing systems is inherently biased, and that this bias is in turn propagated by the machine-learning algorithms that are embedded in the software. Citing a great deal of previous research, these authors are inherently suspicious about this historical data, concluding that “police records do not measure crime. They measure some complex interaction between criminality, policing strategy, and community-police relations.”

Using data about Oakland, the authors tell a thoughtful and illuminating data-driven story. Rather than accepting police data as their proxy for actual drug crime, they instead use data from the 2011 National Survey on Drug Use and Health (NSDUH). After making a solid case for why the NSDUH data is likely to provide a more accurate snapshot of drug use in Oakland than past arrest data, the paper then contrasts the NSDUH data with the actual police arrest data for drug crimes for 2010, pointing out that “while drug crimes exist everywhere, drug arrests tend to only occur in very specific locations – the police data appear to disproportionately represent crimes committed in areas with higher populations of non-white and low-income residents.”

The authors then make an important observation about how biases can propagate even more quickly than the machine-learning models alone would suggest:

“But what if police officers have incentives to increase their productivity as a result of either internal or external demands? If true, they might seek additional opportunities to make arrests during patrols. It is then plausible that the more time police spend in a location, the more crime they will find in that location.”

Putting all of this together, the overarching premise here is as follows: (a) algorithms based on biased input data suggest that crime will be found in certain areas, which leads to (b) more policing in those areas, which causes (c) dramatically larger numbers of arrests in those areas relative to other less policed areas, which leads to (d) increased bias in the input data for the algorithms. The paper concludes by presenting the results of a simulation that vividly illustrates this insidious cycle.
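A toy, deterministic version of this cycle makes the premise concrete. The numbers and dynamics below are my own illustrative stand-ins, not the paper’s actual simulation: two districts with identical true crime rates, a historical arrest record skewed toward one of them, and patrol time allocated in proportion to past arrests.

```python
true_rate = [0.1, 0.1]  # identical underlying crime in both districts
arrests = [7.0, 3.0]    # biased historical record the algorithm trains on

for day in range(365):
    total = arrests[0] + arrests[1]
    # (a)-(b) The "predictive" model sends patrols where past arrests were.
    patrols = [arrests[0] / total, arrests[1] / total]
    for d in (0, 1):
        # (c) Arrests require police presence, and productivity pressure makes
        # discovery superlinear in patrol time (the incentive effect quoted
        # above); the exponent 2 is an arbitrary illustrative choice.
        arrests[d] += true_rate[d] * patrols[d] ** 2
    # (d) Today's arrests feed straight back into tomorrow's training data.

share = arrests[0] / (arrests[0] + arrests[1])
print(round(share, 3))  # starts at 0.70 and only grows
```

Even though the two districts are statistically identical, district 0’s share of recorded arrests ratchets upward: the data ratifies the patrol pattern that produced it.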

In the big data age, our understanding of the world and the future – and our decisions about what actions to take based on that understanding – are increasingly dependent on algorithms. As analytics professionals, most of us have some inherent bias toward data-driven methods, but that bias should often be tempered with a healthy dose of skepticism about such models and about the data that drives them. Political polling and predictive policing are just two examples of how biased data can perniciously distort our beliefs and behaviors – but as a dark-skinned foreigner living in America during this year’s presidential election and a proud resident of Oakland, I find that both hit particularly close to home.

Vijay Mehrotra is a professor in the Department of Business Analytics and Information Systems at the University of San Francisco’s School of Management and a longtime member of INFORMS.


  5. Significance is a joint publication of the Royal Statistical Society and the American Statistical Association.
  7. See for example


