
Analytics Magazine

ANALYZE THIS! A ‘Silver’ lining for election blues

By Vijay Mehrotra

For the past several months, I have spent hours staring at my screen, reading anything I can get my hands on that might help me get a sense of what might happen during the elections on Nov. 8. Since I live in Oakland, Calif., the heart of the uber-liberal bubble that is the San Francisco Bay Area, I am constantly searching for truly fair and balanced perspectives about what is really going on across the rest of the country, especially with regards to this year’s presidential election.

For analytics professionals, in fact, the world of presidential elections is familiar territory. Since George Gallup first applied statistical random sampling to draw conclusions about the results of the 1936 U.S. presidential election based on survey data, there has been a steady (and recently explosive) growth in the number of people gathering, analyzing, visualizing and interpreting data to try to understand, explain, predict and/or influence what might happen in our elections. Nowadays, during our seemingly endless presidential election campaigns, it feels as though there are new state or national poll results being announced every day for months. This almost nonstop and often contradictory stream of numbers often leaves me bewildered.

Which is why I spend a huge amount of time visiting fivethirtyeight.com, the political website created some years ago by Nate Silver, a “number-crunching prodigy” [1] who just might be the big data world’s first mainstream rock star. Since bursting onto the political scene in 2008 with predictions that: (a) were somewhat contrary to the punditry’s consensus and (b) often proved to be surprisingly accurate, Silver’s website has sought to use both historical voter data and information from political polls to make predictions about elections (the site, now owned by ESPN, also includes data-driven stories about sports, science, economics and culture). During this presidential election cycle, Silver and his team used one set of models for forecasting state-by-state primary results and another set for the general election.

Quite a bit of methodological detail about these forecasts is publicly available at fivethirtyeight.com [2,3]. After reading Silver’s book “The Signal and the Noise” [4] and scrutinizing these forecasting process descriptions, my sense is that Silver and his team are seeking to understand the same elusive truths about the electorate that I am, and as an engaged citizen I am grateful for their efforts. The blizzard of polling data is systematically examined, with some polls banned due to ethical and/or methodological concerns. Careful weighting is done to account for factors such as sample sizes, recency, frequency and past performance. A wide variety of adjustments are made to address factors such as post-convention bounces, third-party candidates, registered vs. likely voters and historical biases associated with particular polls. Moreover, over the past eight years Silver and his team have continued to tweak their models in different ways, and there appears to be a basic humility about the limits of prediction underlying their observations and claims, as well as a deep-seated wonky desire to tell an unbiased story.
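The weighting-and-adjustment process described above can be sketched as a simple weighted average. The specific weighting scheme below (exponential recency decay, square-root sample-size weight, a pollster quality rating) is my own hypothetical illustration, not FiveThirtyEight's actual model:

```python
import math

def weighted_poll_average(polls):
    """Combine polls into one estimate. Each poll is a dict with
    'share' (candidate %), 'n' (sample size), 'days_old' and
    'quality' (a 0-1 pollster rating). Weights are illustrative."""
    num, den = 0.0, 0.0
    for p in polls:
        w_recency = math.exp(-p["days_old"] / 14)  # older polls count less
        w_size = math.sqrt(p["n"])                 # larger samples count more
        w = w_recency * w_size * p["quality"]      # down-weight weak pollsters
        num += w * p["share"]
        den += w
    return num / den

polls = [
    {"share": 48.0, "n": 900,  "days_old": 2,  "quality": 0.9},
    {"share": 45.0, "n": 400,  "days_old": 10, "quality": 0.6},
    {"share": 51.0, "n": 1600, "days_old": 25, "quality": 0.8},
]
print(round(weighted_poll_average(polls), 1))
```

Even this toy version shows the key idea: a recent, large, high-quality poll dominates the average, so the estimate is pulled toward the most trustworthy data rather than treating every poll equally.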

Which brings us back to Oakland. In a recent article in Significance [5], Kristian Lum (lead statistician at the Human Rights Data Analysis Group) and William Isaac (doctoral candidate in the Department of Political Science at Michigan State University) examine the controversial topic of “predictive policing [6],” a term that refers to the use of data and models to make forecasts about where crime is most likely to take place, usually within urban population centers. While predictive policing software has been commercially available for some time, its efficacy has been hotly debated in crime-fighting circles [7].

Lum and Isaac’s main thesis is that the historical data used by these predictive policing systems is inherently biased, and that this bias is in turn propagated by the machine-learning algorithms that are embedded in the software. Citing a great deal of previous research, these authors are deeply suspicious of this historical data, concluding that “police records do not measure crime. They measure some complex interaction between criminality, policing strategy, and community-police relations.”

Using data about Oakland, the authors tell a thoughtful and illuminating data-driven story. Rather than accepting police data as their proxy for actual drug crime, they instead use data from the 2011 National Survey on Drug Use and Health (NSDUH). After making a solid case for why the NSDUH data is likely to provide a more accurate snapshot of drug use in Oakland than past arrest data, the paper then contrasts the NSDUH data with the actual police arrest data for drug crimes for 2010, pointing out that “while drug crimes exist everywhere, drug arrests tend to only occur in very specific locations – the police data appear to disproportionately represent crimes committed in areas with higher populations of non-white and low-income residents.”

The authors then make an important observation about how biases can propagate even more quickly than the machine-learning models alone would suggest:

“But what if police officers have incentives to increase their productivity as a result of either internal or external demands? If true, they might seek additional opportunities to make arrests during patrols. It is then plausible that the more time police spend in a location, the more crime they will find in that location.”

Putting all of this together, the overarching premise here is as follows: (a) algorithms based on biased input data suggest that crime will be found in certain areas, which leads to (b) more policing in those areas, which causes (c) dramatically larger numbers of arrests in those areas relative to other less policed areas, which leads to (d) increased bias in the input data for the algorithms. The paper concludes by presenting the results of a simulation that vividly illustrates this insidious cycle.
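The four-step cycle above can be made concrete with a toy simulation. The sketch below is my own illustration, not the authors' model: two neighborhoods have identical true crime rates, but patrols start slightly skewed toward one of them, and a mild "productivity" incentive (the exponent of 1.1) means more patrol time yields disproportionately more discretionary arrests. The algorithm then sends next period's patrols wherever the arrests were:

```python
TRUE_RATE = [0.05, 0.05]   # identical underlying crime in both neighborhoods
patrols = [60.0, 40.0]     # initial allocation is slightly biased

for period in range(10):
    # arrests grow slightly faster than linearly in patrol time,
    # reflecting the productivity incentive Lum and Isaac describe
    arrests = [(p ** 1.1) * r for p, r in zip(patrols, TRUE_RATE)]
    total = sum(arrests)
    # "predictive" reallocation: patrol where the arrests were made
    patrols = [100 * a / total for a in arrests]

# despite identical true crime rates, the initial bias amplifies
print([round(p, 1) for p in patrols])
```

After ten rounds the patrol split has drifted well past the original 60/40, even though the underlying crime rates never differed, which is exactly the insidious cycle the paper's simulation illustrates.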

In the big data age, our understanding of the world and the future – and our decisions about what actions to take based on that understanding – are increasingly dependent on algorithms. As analytics professionals, most of us have some inherent bias toward data-driven methods, but this bias should often be tempered with a healthy dose of skepticism about such models and about the data that drives them. Political polling and predictive policing are just two examples of how biased data can distort our beliefs and behaviors – but as a dark-skinned foreigner living in America during this year’s presidential election and a proud resident of Oakland, they hit particularly close to home for me.


Vijay Mehrotra (vmehrotra@usfca.edu) is a professor in the Department of Business Analytics and Information Systems at the University of San Francisco’s School of Management and a longtime member of INFORMS.

REFERENCES & NOTES

  1. http://nymag.com/news/features/51170/
  2. http://fivethirtyeight.com/features/how-we-are-forecasting-the-2016-presidential-primary-election/
  3. http://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/
  4. https://www.amazon.com/Signal-Noise-Many-Predictions-Fail-but/dp/0143125087
  5. Significance is a joint publication of the Royal Statistical Society and the American Statistical Association, available online at www.rss.org.uk/significance
  6. http://onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2016.00960.x/full
  7. See for example http://www.sciencemag.org/news/2016/09/can-predictive-policing-prevent-crime-it-happens
