Viewpoint: Did Nate Silver beat the tortoise?
O.R. vs. analytics … and now data science?
By Brian Keller
In a 2010 survey, members of the Institute for Operations Research and the Management Sciences (INFORMS) were asked to compare operations research (O.R.) and analytics. Thirty percent of the respondents stated, “O.R. is a subset of analytics,” 29 percent stated, “analytics is a subset of O.R.,” and 28 percent stated, “advanced analytics is the intersection of O.R. and analytics.” The remaining 13 percent were split between “analytics and O.R. are separate fields” (7 percent) and “analytics is the same as O.R.” (6 percent).
The emergence of data science only adds to the confusion. Is data science just another clever marketing term popularized by the math illuminati?
INFORMS has developed working definitions of both O.R. and analytics through surveys of INFORMS members and Analytics magazine readers. O.R. is the “application of advanced analytical methods to help make better decisions.” Analytics is the “scientific process of transforming data into insight for better decision-making.”
Data Science: An Emerging Field
Data science is an emerging field with no standard definition yet. An early description can be found in “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics.” I think of data science as an interdisciplinary field combining mathematics, statistics and computer science to create products based on data. The delivery of data products is the key idea. More on that later.
Indeed, the definitions for each sound similar. Differences begin to emerge when looking at O.R., analytics and data science in terms of the focus of the discipline and types of techniques applied.
Operations research tends to focus on the solution of a specific problem using a defined set of methods and techniques. Classic examples of O.R. include facility location problems, scheduling and deciding how many lines should be opened at a service center, which are all problem-solution focused. Techniques tend to be model-driven: analysts select a reasonable model, fit the model parameters to the data and analyze the results. Based on survey data in “ASP: The Art and Science of Practice,” the top O.R. quantitative skills are optimization, decision analysis and simulation.
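To make the model-driven approach concrete, here is a minimal sketch of the service center example in Python. It assumes an M/M/c queueing model (Poisson arrivals, exponential service times), which is a common textbook choice rather than anything prescribed by the surveys cited here. The function names and the example rates are hypothetical, and the arrival and service rates are assumed to have already been estimated from data.

```python
from math import factorial


def erlang_c_wait_probability(arrival_rate, service_rate, servers):
    """Probability that an arriving customer has to wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate   # a = lambda / mu
    utilization = offered_load / servers         # rho = a / c
    if utilization >= 1:
        return 1.0  # unstable queue: the line grows without bound
    # Erlang C formula
    top = (offered_load ** servers / factorial(servers)) / (1 - utilization)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom


def lines_needed(arrival_rate, service_rate, max_wait_probability):
    """Smallest number of open lines that keeps P(wait) at or below the target."""
    servers = 1
    while erlang_c_wait_probability(arrival_rate, service_rate, servers) > max_wait_probability:
        servers += 1
    return servers


# Hypothetical rates: 1 customer arrives per minute, each line serves 1 per minute.
print(lines_needed(arrival_rate=1.0, service_rate=1.0, max_wait_probability=0.2))
```

The analyst's judgment sits in the choice of model; once the model is fixed, the data only supplies its two parameters.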
Analytics tends to go beyond solving a single problem and focuses on overall business impact. Classic examples of analytics include business intelligence to summarize operations and customer segmentation for improved marketing and sales. The same survey identified the top analytics quantitative skills as statistics, data visualization, data management and data mining.
Data science tends to focus on data as a product. For example, Amazon records your searches, correlates them with other users and offers you suggestions on what you might like to buy. Those suggestions are data products that personalize the world’s biggest market, which drives sales. Google Now presents the results of your search before you even think to search for the information. Google Now is a data product that increases use of Google services, which delivers added revenue to Google.
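As a rough illustration of how "correlating with other users" can turn into suggestions, the sketch below counts which items appear together in users' histories and recommends the most frequent companions. This is a simple co-occurrence approach chosen for brevity, not a description of Amazon's actual system; the function names and the sample histories are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each ordered pair of items shares a user's history."""
    counts = defaultdict(Counter)
    for items in histories.values():
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    return counts


def suggest(counts, user_items, top_n=3):
    """Rank items the user has not seen by co-occurrence with items they have."""
    scores = Counter()
    for item in user_items:
        for other, n in counts[item].items():
            if other not in user_items:
                scores[other] += n
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical search and purchase histories.
histories = {
    "ann": ["tent", "stove", "lantern"],
    "bob": ["tent", "stove"],
    "cat": ["tent", "novel"],
}
counts = build_cooccurrence(histories)
print(suggest(counts, {"tent"}, top_n=1))
```

Nothing here models why people buy stoves with tents; the suggestion falls out of the recorded behavior alone.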
Amazon product recommendations and Google Now may sound like analytics, which focuses quantitative effort on a broader business impact. However, the results of data science are not just competitive advantages; they are the products of the company. The data is the product.
Creating data products requires a strong sense of creativity and diverse perspectives of thought. As such, data scientists hail from a variety of academic backgrounds including O.R., statistics, computer science, engineering, biology and physics. The common threads among data scientists are creativity, the curiosity to ask bigger questions, as well as skills in data analysis and programming.
Data science often relies on combining multiple types of data for analysis. Some data may be company proprietary; other data comes from the many public data sets on the Web. These data sets often are too large to analyze using desktop tools, have missing or erroneous data, vary in structure across data sets, and may be lacking structure entirely (e.g., free-form text in maintenance repair logs). The combination of data size and structure adds an additional challenge on top of data analysis – the data itself becomes part of the problem.
Leveraging Diverse Skills
Because so much of the effort in data science goes into parsing, cleaning and managing the data, data scientists often must leverage diverse software development skills. One project may use Python for data acquisition and parsing, R for exploratory analysis, Hadoop for data storage and MapReduce via Java for production analytics, with results delivered through Ruby on Rails. Analytics practitioners share in many of the data management challenges of data scientists, although usually at a smaller scale. In contrast, O.R. applications tend to focus on problem solution, and O.R. analysts usually use fewer tools during a project.
Visualization is key to the success of data science projects since the information must be consumable by users. Who would want to use Google Now if it presented results in a table with p-values? Similarly, analytics practitioners value data visualization, whereas visualization is much less important to O.R. practitioners.
Analysis techniques may also differ with the large amounts of data collected. O.R. and analytics approaches generally assume a model and then fit the model to the data. The large amounts of data collected in many data science projects enable an alternative, model-free, data-driven approach. For example, automated language translation relied predominantly on manual, rule-driven approaches until increases in storage and compute power made it possible to store and process large bilingual text corpora, from which statistical models could infer the translation rules.
DuoLingo, a free language learning website, has created a data product based on a data-driven approach. As users progress through lessons, they help translate websites and documents. In other lessons, users vote on correctness of translations. Statistical models based on user skill choose the best translations of documents, which others have submitted to be translated for a fee.
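One simple way to picture "statistical models based on user skill" is a weighted vote, in which each user's vote counts in proportion to an estimated skill score. The sketch below is only an illustration under that assumption; it is not DuoLingo's actual model, and the function name, skill scores and sample votes are all hypothetical.

```python
from collections import defaultdict


def pick_translation(votes, skill):
    """Return the candidate translation with the highest skill-weighted support.

    votes: list of (user, translation) pairs
    skill: dict mapping user to a weight between 0 and 1 (unknown users count as 0)
    """
    totals = defaultdict(float)
    for user, translation in votes:
        totals[translation] += skill.get(user, 0.0)
    return max(totals, key=totals.get)


# Hypothetical votes: one advanced learner outweighs two beginners.
votes = [("ana", "the cat sleeps"), ("ben", "a cat sleep"), ("cy", "a cat sleep")]
skill = {"ana": 0.9, "ben": 0.3, "cy": 0.4}
print(pick_translation(votes, skill))
```

A plain majority vote would pick the beginners' answer here; weighting by skill is what lets the data from many users produce a translation worth paying for.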
O.R., analytics and data science are closely related – all apply math to gain insights – and the fuzzy descriptions of the three disciplines above have boundaries as porous as the borders of countries in the European Union. However, just as a person in Germany is most likely a German (although he or she could be French or Italian), an O.R./analytics/data science practitioner will most likely fit the description outlined in this article.
Brian Keller (firstname.lastname@example.org), Ph.D., is a data science practitioner and lead associate at Booz Allen Hamilton. He is a member of INFORMS.
- Matthew Liberatore, Wenhong Luo, “INFORMS and the Analytics Movement: The View of the Membership,” Interfaces, Vol. 41, No. 6, November-December 2011, pp. 578–589.
- W. S. Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,” ISI Review, Vol. 69, 2001, pp. 21–26.
- Matthew Liberatore, Wenhong Luo, “ASP: The Art and Science of Practice,” Interfaces, Vol. 43, No. 2, March-April 2013, pp. 194–197.