Analytics Magazine

ANALYZE THIS! Carefully considering some uncomfortable scenarios


Throughout history, data sets have been constructed and utilized for many different purposes, not all of them positive.

By Vijay Mehrotra

I’ve spent the last few months working with the Human Rights Data Analysis Group. HRDAG has historically focused on unstable regions around the world, scientifically examining data captured by local institutions and grassroots activists to try to discern the truth about the volume and patterns of human rights violations [1]. My work there has been focused on data from Colombia. Over the past several years, HRDAG has done several projects on Colombia, examining deaths, disappearances, torture, sexual violence and violence against union members during the longstanding conflict between armed rebels, criminal organizations and the Colombian government.

My primary project has been to look at output from a previous HRDAG analysis of homicides in Colombia in order to develop insightful visualizations and explanations about the pattern of killings. My results are intended for a relatively non-technical audience, and that is where I have bumped into a major communications challenge: Because HRDAG’s predictions are based on fairly sophisticated Bayesian models [2], the results that I am describing are probability distributions rather than point estimates.
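
To make the distinction between a distribution and a point estimate concrete, here is a minimal Python sketch that summarizes a set of posterior draws as a median plus an equal-tailed 95 percent credible interval. It is an illustration only, not HRDAG’s methodology: the function name summarize_posterior, the 95 percent level, and the simulated draws (taken from a normal distribution with a hypothetical center of 1,200 and spread of 150) are all assumptions.

```python
import random
import statistics


def summarize_posterior(draws, level=0.95):
    """Summarize posterior draws as (median, lower, upper).

    lower/upper bound an equal-tailed credible interval that contains
    roughly `level` of the draws, using nearest-rank positions in the
    sorted sample.
    """
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * (n - 1))]
    upper = ordered[round((1.0 - tail) * (n - 1))]
    return statistics.median(ordered), lower, upper


# Hypothetical posterior draws for a single count (e.g., homicides in one
# region and year). Simulated purely for illustration; NOT output from
# HRDAG's models.
rng = random.Random(42)
draws = [rng.gauss(1200, 150) for _ in range(10_000)]

median, lower, upper = summarize_posterior(draws)
print(f"median ~{median:.0f}, 95% interval ~{lower:.0f} to {upper:.0f}")
```

Reporting the interval alongside the median, rather than the median alone, is one way to keep the model’s uncertainty visible to a non-technical reader.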

As it happens, this is the same problem recently faced by Nate Silver. Like virtually everyone else in the punditry, Silver had predicted that Hillary Clinton was likely to win the election (the final models at fivethirtyeight.com gave her roughly a 70 percent chance of victory). However, as Silver described in a blog post [3] just after the election, Clinton was leading in the vast majority of national and swing state polls going into Election Day on Nov. 8, but by very slim margins. Accordingly, Silver’s pre-election commentary on fivethirtyeight.com described a variety of possible outcomes, ranging from a Clinton landslide to a narrow Trump victory, citing both the variability inherent in his group’s models and potential biases in the polling data [4]. In particular, the scenario in which Clinton captured the popular vote while losing the Electoral College vote was mentioned repeatedly prior to Election Day. In the probabilistic sense, the Donald Trump victory that we observed was clearly “predicted.”
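
The point that an outcome given roughly a 30 percent chance is far from a long shot can be illustrated with a tiny Monte Carlo simulation. This is a minimal Python sketch assuming a single hypothetical contest in which the favorite wins with probability 0.70 (approximating the final fivethirtyeight.com figure). It is not Silver’s model, which aggregates polls across states, and the function name upset_frequency and its default parameters are hypothetical.

```python
import random


def upset_frequency(p_favorite=0.70, trials=100_000, seed=2016):
    """Share of simulated contests won by the underdog.

    Each trial is an independent draw: the favorite wins when a uniform
    random number in [0, 1) falls below p_favorite; otherwise the
    underdog wins.
    """
    rng = random.Random(seed)
    upsets = sum(1 for _ in range(trials) if rng.random() >= p_favorite)
    return upsets / trials


# A forecast giving the favorite roughly a 70 percent chance still leaves
# the underdog winning about 3 times in 10.
print(f"underdog wins in {upset_frequency():.1%} of simulated contests")
```

Running it prints a share close to 30 percent: under such a forecast, the underdog winning is an ordinary outcome, not a statistical anomaly.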

Back here at HRDAG, I keep busy prepping and merging tables in Python, using Hadley Wickham’s ggplot2 package to graph multiple time series in R, and working to understand and explain what is driving the variability in the model estimates. I am intellectually engaged and am learning a great deal. But I am always aware that each of these data sources is a dehumanized digital catalog containing information about thousands of people’s lives.

Throughout history, these types of data sets, euphemistically referred to as “administrative lists” in the statistics literature, have been constructed and utilized for many different purposes, not all of them positive.

For this reason, President-elect Trump’s call for a registry of Muslims disturbs me greatly, and not only because this has already been shown to be an ineffective strategy for fighting terrorism [5]. In the digital world that we live in, so much personal data about so many people is captured and stored across so many databases large and small. The social network data in Facebook and LinkedIn is, literally, a roadmap of how we are organized and connected. Organizations that host public or private clouds have extraordinary troves of intelligence about individual citizens living on their servers. The idea of any U.S. president with an army of data scientists seizing control of all of this data and using it against his or her perceived enemies is truly frightening – and is also contrary to everything this country stands for.

As technology journalist Salvador Rodriguez has noted, “The establishment of such a registry would require engineering prowess and troves of data – both the sort of thing big Silicon Valley companies boast in quantity” [6]. Given this context, I cringe when I see photos of tech industry CEOs assembled in a conference room at Trump Tower. I cheer at the growing list of companies that publicly state they will not participate in the creation of such a database (as of this writing, this list has grown to include Apple, Google, Facebook, Salesforce.com and many more). And I shudder when it is reported that Oracle co-CEO Safra Catz has joined the Trump transition team.

At the individual level, many professionals in the technology industry have also responded by taking a personal stand. To date, several thousand have signed a public online pledge committing “to stand in solidarity with Muslim Americans, immigrants, and all people whose lives and livelihoods are threatened by the incoming administration’s proposed data collection policies. We refuse to build a database of people based on their constitutionally protected religious beliefs. We refuse to facilitate mass deportations of people the government believes to be undesirable. We have educated ourselves on the history of threats like these, and on the roles that technology and technologists played in carrying them out.” You can learn more about this pledge at http://neveragain.tech.

History suggests that in the days and years ahead, we may be called upon to use our analytics skills in ways that could be unconstitutional, immoral or both. These invitations will often come with opportunities for intellectual challenges, financial rewards and/or personal glory, and there will almost surely be other pressures to acquiesce. But we must all be aware that there are real people behind the abstract digital representations in our data sets, that both our data and our models are almost always imperfect, and that our work can reveal information and insights that can have serious human consequences. In a turbulent context, analytics are rarely ethically neutral.

I do not pretend to know what the future holds. But it seems appropriate to develop a probabilistic forecast with significant variability, and to carefully consider some uncomfortable scenarios. Don’t fool yourself: It could indeed happen here. And our individual choices will definitely matter. History does not write itself.

Vijay Mehrotra (vmehrotra@usfca.edu) is a professor in the Department of Business Analytics and Information Systems at the University of San Francisco’s School of Management and a longtime member of INFORMS.

Editor’s note:

The views expressed in this column are those of the author and do not necessarily reflect the views of INFORMS or Analytics magazine.

References & Notes

  1. http://analytics-magazine.org/analyze-this-human-rights-group-confronts-abuses-with-data-driven-evidence/
  2. For more on HRDAG’s estimation methodology, see https://hrdag.org/coreconcepts/.
  3. http://fivethirtyeight.com/features/why-fivethirtyeight-gave-trump-a-better-chance-than-almost-anyone-else/
  4. See for example http://fivethirtyeight.com/features/pollsters-probably-didnt-talk-to-enough-white-voters-without-college-degrees/.
  5. For more on the ineffectiveness of a similar Bush-era registry, see http://www.cnn.com/2016/11/18/politics/nseers-muslim-database-qa-trnd/.
  6. http://www.inc.com/salvador-rodriguez/tech-muslim-registry-pledge.html
