
Analytics Magazine

Executive Edge: The times they are a changin’ for advanced analytics

May/June 2012

Statistical modelers urged to embrace machine learning, open-source tools for the road ahead.

By Sameer Chopra

My thesis below addresses the following points:

  1. While statistical modeling is not going away, analytics groups are advised to leverage machine-learning approaches as well.
  2. While traditional statistical modeling software packages are not going away, analytics groups need to actively embrace new skill-sets in emerging software such as open-source tools (e.g., R, MongoDB) and Big Data tools (e.g., Hadoop). Big Data is just getting bigger, and new tools are emerging that round out the tool suite of analytics groups.

Statistical Modeling vs. Machine Learning

Since the mid-1990s I have used statistical modeling tools such as SAS as the primary tool for advanced analytics. I would place myself squarely in the camp of “statistical modelers” (vs. my machine learning friends – though I realize some might quibble with this distinction). Over the years I have led teams of statistical analysts who have primarily used such statistical packages as SAS/SPSS/S-Plus, etc. as their go-to analysis tool.

In my current capacity, I am responsible for advanced analytics at Orbitz Worldwide. Advanced analytics is a strategic lever at Orbitz and has the good fortune of executive support at the highest levels. Competing on analytics is feasible only if there is buy-in at the highest levels.

I lead the traditional statistical modelers as well as the chief scientist and the machine learning (ML) crew. At Orbitz, we have found value in incorporating both types of data mining professionals (machine learners and statistical modelers) because many problems are well-suited for both camps. For example, the statistical modelers effectively address areas such as marketing mix analysis, predictive models across online marketing channels, customer lifetime value models, churn models, credit card fraud models, etc. Similarly, the machine learning staff deploys their algorithms in areas leveraging Big Data, where system feedback is leveraged to quickly learn from patterns in order to self-improve – areas such as the Hotel Recommendation Engine and Hotel Sort on the Orbitz Web site.

Conceptually, both camps are “data mining” professionals, so there is a lot of overlap. For instance, both fields do work with some common methods such as decision trees and clustering algorithms. I also find that the camps often use different jargon for the same basic concepts (“weights” vs. “parameters,” “learning” vs. “fitting,” etc.).

However, I find the machine learning area to clearly be cut from a different cloth – the contrast in tools and approaches between ML and statistical modelers is rather stark. The following are but a few examples to illustrate some differences between the two sides:

  • Apart from cosmetic differences in the labels used, statistical modeling takes a probabilistic approach with a strong emphasis on parametric assumptions, regression diagnostics, inference, hypothesis testing, model interpretability and so on – areas that receive far less emphasis in the ML world.
  • On the flip side, ML practitioners regularly use tools such as support vector machines (SVM), tools that are not commonly used by statistical modelers. ML focuses on predictive accuracy and not much on interpretation of models. Note that ML has its roots in artificial intelligence (AI), and practitioners of machine learning usually tend to have a strong computer science background – another key difference.
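
To make the contrast in the two bullets above concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any model used at Orbitz: the data set is synthetic, and the helper names (`ols_fit`, `knn_predict`, `mse`) are hypothetical. The first half asks the statistical modeler's question (what is the slope, and how certain are we about it?); the second half asks the machine learner's question (how well does a model predict held-out data?), using a simple nearest-neighbor predictor as a stand-in for a black-box method.

```python
import math
import random


def ols_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope under the usual linear-model assumptions.
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b


def knn_predict(train_x, train_y, x, k=5):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(preds, actuals):
    """Mean squared error between predictions and actual values."""
    return sum((p - y) ** 2 for p, y in zip(preds, actuals)) / len(actuals)


# Synthetic data, made up for illustration: y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 1) for x in xs]
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Statistical-modeling view: estimate a parameter and test its significance.
a, b, se_b = ols_fit(train_x, train_y)
t_stat = b / se_b
print(f"slope={b:.3f}, std. error={se_b:.3f}, t-statistic={t_stat:.1f}")

# Machine-learning view: judge each model purely by held-out accuracy.
ols_mse = mse([a + b * x for x in test_x], test_y)
knn_mse = mse([knn_predict(train_x, train_y, x) for x in test_x], test_y)
print(f"held-out MSE: OLS={ols_mse:.3f}, 5-NN={knn_mse:.3f}")
```

The regression yields a coefficient that can be interpreted and tested, while the nearest-neighbor predictor offers no parameters to interpret and is evaluated on its error alone.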

The comparison sparked the following question: “Which side of this analytics fence lends itself better to the road ahead?” My (likely controversial) response: “At this point in time, machine learning!” In fact, never before has the need for machine learning been as forceful and urgent as it is today. I am not implying that statistical modeling is going away, but I am stating that machine learning is rapidly increasing in relevance and prominence. It makes sense for analytical teams to complement their skill sets by incorporating machine-learning approaches in order to be better positioned for the road ahead.

Not surprisingly, general interest in machine learning has exploded in the past year. Late last year, Stanford University offered a free online course in ML/AI that went viral to the point of having well over 100,000 students register from around the world in a matter of weeks! (This speaks both to the growing interest in ML and to a fundamental paradigm shift in the making vis-à-vis the educational method/framework.)

Big Data & Open Source Analytics

Machine learning lends itself well to situations where algorithms must be designed and developed against high-dimensional data and where computational issues are very important – and the Big Data paradigm shift, along with open source tools, is ideally suited for ML to leverage.

The open source language R has become the data-mining tool of choice for machine learners for the following reasons:

  • R has very good integration with Hadoop, an area where established commercial statistical tools have frankly been playing catch-up over the past year. (Note: At the time of this writing, some established statistical solution providers were announcing an access interface to Hadoop.)
  • Many startups and smaller firms do not have deep pockets and are embracing open source tools such as the R programming language and NoSQL database systems such as MongoDB.
  • R is a leading language for developing new statistical methods, and it is a platform for statistical innovation and collaboration across both the corporate world and academia. In my opinion, for the first time in years, the stronghold of established commercial players seems to be potentially threatened; open source tools are better suited for Big Data and will slowly but surely continue to take share away from commercialized statistical packages. In fact, traditional statistical vendors have recognized that R is a force to be reckoned with. In response, many of these vendors have developed hooks into R so users can interface with the R language.
  • Based on the resumes I’ve been reading, the next generation of data miners is flocking to R as their go-to tool. Professors in general are comfortable with R; they tend to use R and Excel as part of their curriculum.
  • In short, open-source analytics tools and platforms have arrived.

R hasn’t been widely adopted in the corporate world because it used to be considered (and still is to a large extent) not quite “enterprise ready,” but even that is changing as firms such as Revolution Analytics focus on the enterprise capabilities for R.

Despite some hype associated with the topic of Big Data, it is generally acknowledged that Big Data and Distributed Computing are rapidly changing the analytics landscape. Leveraging Hadoop and being well-versed in MapReduce jobs is quickly transitioning from a “nice to know” to a “must do” skill. Here again, machine learning practitioners seamlessly tend to adapt, whereas many traditional statistical modelers seem to face a “who moved my cheese” syndrome. Prerequisites such as being well-versed in Python or Java tend to be second nature to those in the ML camp.
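
For readers in the statistical camp who have not yet written a MapReduce job, the programming model itself is small. The sketch below imitates its three phases (map, shuffle, reduce) in plain Python within a single process. It does not use Hadoop or any Hadoop API, and the function names and the sample search-log lines are hypothetical, chosen only for illustration. In a real cluster, the framework would run the map and reduce functions in parallel across many machines and handle the shuffle itself.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical search-log lines, invented for this example.
logs = ["chicago hotel", "hotel deals chicago", "flight chicago"]
counts = run_job(logs)
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, the same two functions can be spread over as many machines as the data requires, which is what makes the model a natural fit for Big Data.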

Conclusion

What does this mean for today’s traditional statistical modelers?

Gone are the days when a statistical analyst might have been complacent about a relatively slowly changing world (relative to, say, a computer science or IT professional who had to strive more to stay current with changing languages and new tools). In order to stay competitive, it would behoove traditional statistical modelers to proactively plunge into professional development mode and take a page from the book of our machine-learning friends.

Specifically, the best-in-class analytical organizations of the future will be those that embrace traditional statistical modeling and machine learning approaches along with established and emerging tools and technology associated with Big Data analytics, including R, Hadoop/HDFS, MapReduce, Java/Python, Pig, Hive, etc.

The times they are a changin’….

Sameer Chopra (Sameer.Chopra@orbitz.com) is vice president of Advanced Analytics at Orbitz Worldwide, Inc., a leading global online travel company. He has more than 15 years of experience in applying data mining and predictive analytics across various business domains at both Fortune 500 firms and startups. Before joining Orbitz, Chopra led the Marketing Analytics and Web testing team at Intuit’s Small Business Group and served as director of analytics at eBay. He holds a master’s degree in Operations Research from the Massachusetts Institute of Technology.
