

Analytics Magazine

Online Marketing: Recommendation engines at work

March/April 2013

Barriers to their deployment are coming down while opportunities for their deployment are improving.

By Christopher Berry

If you’ve streamed a movie on Netflix, bought something from Amazon or connected with “people you may know” on LinkedIn or Facebook, then you’ve used a recommendation engine. And chances are, you’ve watched more movies, bought more stuff and are less likely to churn because of them. These engines work.

Recommendation engines match features about you with things that you might be interested in.

For instance, a movie has a release year, a genre, actors and box office results. You have features. You have preferences, an age, and you may have completed a survey expressing some of your attitudes toward certain movies. You may have rated some of the movies you watched. By figuring out which sets of movies to show you, and your response to those recommendations, the machine learns over time to make better suggestions. If you watched a few science fiction movies, and you rated them highly, then the engine will learn to show you more science fiction movies, and, for variety, movies that other people like you, who like science fiction movies, might also enjoy.
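The matching described above can be sketched as a simple content-based filter: build a profile from the features of movies a user rated highly, then rank unseen movies by similarity to that profile. A minimal sketch in Python; the movies and one-hot genre features (sci-fi, drama, comedy) are invented for illustration:

```python
import math

# Hypothetical one-hot genre features: [sci-fi, drama, comedy]
movies = {
    "Blade Runner": [1, 1, 0],
    "The Matrix":   [1, 0, 0],
    "Alien":        [1, 1, 0],
    "Annie Hall":   [0, 0, 1],
}

# Build a user profile by averaging the features of highly rated movies
rated_highly = ["Blade Runner", "The Matrix"]
profile = [sum(f) / len(rated_highly)
           for f in zip(*(movies[m] for m in rated_highly))]

def cosine(a, b):
    # Cosine similarity between two feature vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Rank unseen movies by similarity to the user's profile
candidates = [m for m in movies if m not in rated_highly]
ranked = sorted(candidates, key=lambda m: cosine(profile, movies[m]), reverse=True)
```

A user who rated two science fiction titles highly ends up with a profile that places another sci-fi title at the top of the ranking, which is exactly the learning behavior the paragraph describes.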

More and more companies are using recommendation engines. Apple has its own engine to help consumers find apps they are likely to enjoy from its large inventory. Microsoft’s Xbox 360 Live has an engine that suggests new games based on what you’ve previously shown interest in.

Many of the algorithms that are used in recommendation engines and machine learning aren’t all that new. Regression, decision trees, K-nearest neighbor (KNN), support vector machines (SVMs), neural networks and naive Bayes are established methods with well-known constraints and appropriate uses. Many of these methods have been used to support data-driven business decision-making for a long time. So, if the benefits of recommendation engines have been long known, what’s different now? What’s causing more companies to implement recommendation engines to support customer decision-making?
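As one illustration of how simple these established methods can be, K-nearest neighbor fits in a few lines: to predict whether a user will like a title, find the k most similar titles they have already rated and take a majority vote. A toy sketch with invented feature vectors (normalized release decade, sci-fi flag):

```python
from collections import Counter

# Toy training data: (normalized release decade, sci-fi flag) -> user's verdict
train = [
    ((0.9, 1.0), "liked"),
    ((0.8, 1.0), "liked"),
    ((0.3, 0.0), "disliked"),
    ((0.2, 0.0), "disliked"),
]

def knn_predict(x, k=3):
    # Rank training points by squared Euclidean distance to x
    nearest = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    # Majority vote among the k nearest neighbors
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In production the same idea runs against millions of items with approximate nearest-neighbor indexes, but the core logic is no more exotic than this.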

Three Cost Trends

Three major trends are driving the shift from possible to scalable, enabling recommendation engines to succeed technologically, economically and effectively.

1. The cost of data storage has come down. $500 buys a large volume of space on Amazon’s cloud. It’s the equivalent of what would have cost hundreds of thousands of dollars just 10 years ago. In short, big data also means cheap storage.

2. The cost of software has also come down dramatically. The software that enables companies to manage large amounts of data used to, and still can, run into the millions of dollars. However, thanks to decisions by several companies to open source their software, programs like Hadoop and Druid are monetarily free. While they certainly take expert time to set up and maintain, the overall cost of ownership has fallen, enabling smaller teams to tackle much bigger opportunities, like recommendation engines.

3. The cost of data has also come down. People themselves, in part driven by smartphone adoption, emit large volumes of storable data. Some of this data is very unstructured and dirty, comparable to call center data. Some is clean, like GPS location data. Moreover, it has become popular for start-ups to offer application programming interfaces (APIs). So not only is more data generated in more places, but it’s more available to more people in more places.

Falling barriers invite bold experimentation. Together, these three trends are producing a Cambrian explosion of experimentation and commercialization attempts.

Introducing the Data Scientist

At the center of this is the data scientist. Data scientists turn data into product. A recommendation engine is certainly a product.

Data scientists combine a number of skills. They have to know how to write code. They know statistics and the algorithms used to extract patterns from nature. And they understand business. The combination of the three skills increases the likelihood that their solutions will scale successfully.

Data scientists come from any number of backgrounds. Some are highly accomplished computer scientists who got deeper into business and statistics. Some are from biomedical informatics. Many are from sectors with particularly high numeracy, such as physics. Others have roots in the management sciences. There are a number of ways a person can level up to become a data scientist, but it generally ends with a data scientist possessing competence in all three skill sets.

Data science begins with data. Nothing gets built without data. Data science continues with science. Accurate, persuasive and effective prediction requires patterns. The process of discovering that pattern is science. Any product worth building requires a reliable pattern to exist in the data.

The process of exploiting that pattern, especially for commercial gain, is engineering. Data science generally ends with engineering.

Many people who work with data scientists, responsible for various aspects of product building, are engineers. They may have data science in their titles. While the ambiguity is likely to confuse human resources departments and leadership alike, it is well worth the cost: these engineers are indispensable for translating the patterns found in nature directly into business outcomes.

The output of data science is product. As a result, product management is a major concern. Because their stance is based firmly in science and in iteration, data scientists frequently choose methods and tools that emphasize iteration and experimentation. Ideas such as fast failure and continuous deployment are particularly well suited to this type of product development. When data scientists maintain their own product management and development teams, they choose continuous deployment, agile methods and rapid iteration.

The difference in stances is likely to cause cultural tension within organizations. The tension may generate positive spin-offs so long as it does not generate regrettable churn.

How Data Scientists Work

Data scientists spend time understanding the metric that needs to be maximized and the business context for that maximization. They call this the optimization objective and remain focused on a single one.

Like their cousins in operations, data scientists frequently have to gather data into one place. If there is none available, or if the odds of unlocking existing data are too remote, they have to generate their own source. Those from the natural sciences will gravitate toward setting up an experiment to get some data on which to train an engine.

Data scientists write as little of their own code as possible. They reach for open source libraries in Python, Octave and R before resorting to over-optimized custom implementations. They will sooner use Amazon Mechanical Turk to obtain a larger data set than invent their own framework.

They will think about which methods are likely to scale, and they will try them out. They separate their data into a training set (in-sample set), a cross-validation set and a testing set (out-of-sample set). They will try to avoid over-fitting or under-fitting their algorithm to the data. They will seek a compromise between recall and precision.
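The splitting and precision/recall bookkeeping described above might look like the following sketch; the randomly generated data and the 60/20/20 split ratio are illustrative assumptions, not prescriptions from the article:

```python
import random

# Hypothetical labeled examples: (feature vector, did the user click?)
random.seed(0)
data = [((random.random(),), random.random() < 0.3) for _ in range(100)]

# 60/20/20 split: training (in-sample), cross-validation, test (out-of-sample)
random.shuffle(data)
train_set, cv_set, test_set = data[:60], data[60:80], data[80:]

def precision_recall(predictions, labels):
    # Precision: of the items we recommended, how many were wanted?
    # Recall: of the items that were wanted, how many did we recommend?
    tp = sum(1 for p, y in zip(predictions, labels) if p and y)
    fp = sum(1 for p, y in zip(predictions, labels) if p and not y)
    fn = sum(1 for p, y in zip(predictions, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Model parameters are fit on the training set, thresholds and model choices are tuned on the cross-validation set, and the test set is touched only once, to estimate out-of-sample performance; that discipline is what guards against over-fitting.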

They’ll expose their recommendation engine to the wild, observe how people react to it and then use that information to refine its accuracy. They will keep the engine out there as they gradually improve it. They’re rarely done optimizing both its scale and its performance, frequently seeking out additional data streams to use to improve it.
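One standard technique for this expose-observe-refine loop (a common choice, not something the article prescribes) is an epsilon-greedy bandit: mostly show the recommendation with the best observed click-through rate, occasionally explore alternatives, and keep updating the estimates from real reactions. A minimal sketch with made-up click-through rates standing in for live users:

```python
import random

random.seed(7)

# Hypothetical true click-through rates for three candidate recommendations
true_ctr = {"A": 0.10, "B": 0.05, "C": 0.20}

shows = {k: 0 for k in true_ctr}
clicks = {k: 0 for k in true_ctr}

def choose(epsilon=0.1):
    # Explore occasionally; otherwise exploit the best observed rate
    if random.random() < epsilon:
        return random.choice(list(true_ctr))
    return max(true_ctr, key=lambda k: clicks[k] / shows[k] if shows[k] else 0.0)

for _ in range(5000):
    item = choose()
    shows[item] += 1
    if random.random() < true_ctr[item]:  # simulated user reaction
        clicks[item] += 1
```

The engine stays live the whole time: every impression both serves a user and sharpens the estimates it will exploit on the next round.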


Recommendation engines put data immediately to work for the business and for consumers. The barriers to their use have come down and the opportunities for their deployment have improved. As more and more companies are discovering, they cost a small fraction of an average advertising campaign, bring in directly attributable revenue and deliver surprisingly short payback periods if done right.

Christopher Berry is the co-founder and chief science officer of Authintic, an analytics technology company based in Toronto, Canada. Prior to Authintic, Berry built the measurement science and labs groups at Syncapse, a social media technology company, and the marketing science department at Critical Mass.
