Online Marketing: Recommendation engines at work
Barriers to their deployment are coming down while opportunities for their deployment are improving.
By Christopher Berry
If you’ve streamed a movie on Netflix, bought something from Amazon or connected with “people you may know” on LinkedIn or Facebook, then you’ve used a recommendation engine. And chances are that, because of them, you’ve watched more movies, bought more stuff and become less likely to churn. These engines work.
Recommendation engines match features about you with things that you might be interested in.
For instance, a movie has features: a release year, a genre, actors and box office results. You have features, too. You have preferences and an age, and you may have completed a survey expressing some of your attitudes toward certain movies. You may have rated some of the movies you watched. By choosing which sets of movies to show you and observing your response to those recommendations, the machine learns over time to make better suggestions. If you watched a few science fiction movies and rated them highly, then the engine will learn to show you more science fiction movies and, for variety, movies that other people like you, who also like science fiction, might enjoy.
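To make the matching idea concrete, here is a minimal Python sketch of one simple approach, often called content-based filtering. It is an illustration only, not a description of how Netflix or any other service named here works. The movie titles, genre features, ratings and function names are all invented for the example.

```python
import math

# Hypothetical catalog: each movie is described by genre features (1 = present).
MOVIES = {
    "Star Voyage":   {"scifi": 1, "action": 1},
    "Robot Dawn":    {"scifi": 1, "drama": 1},
    "Love in Paris": {"romance": 1, "drama": 1},
    "Laugh Riot":    {"comedy": 1},
    "Galaxy Wars":   {"scifi": 1, "action": 1},
}


def build_profile(ratings):
    """Sum the features of rated movies, weighted by (stars - 3) on a 1-5 scale."""
    profile = {}
    for title, stars in ratings.items():
        weight = stars - 3  # above 3 counts as liked, below 3 as disliked
        for feature, value in MOVIES[title].items():
            profile[feature] = profile.get(feature, 0.0) + weight * value
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, top_n=2):
    """Rank movies the viewer has not rated by similarity to the taste profile."""
    profile = build_profile(ratings)
    unseen = [title for title in MOVIES if title not in ratings]
    ranked = sorted(unseen, key=lambda t: cosine(profile, MOVIES[t]), reverse=True)
    return ranked[:top_n]


# Assumed ratings for one viewer who likes science fiction.
viewer_ratings = {"Star Voyage": 5, "Robot Dawn": 4, "Laugh Riot": 1}
print(recommend(viewer_ratings))
```

With these made-up ratings, the unseen science fiction title ranks first. The “people like you” suggestions described above would need ratings from other viewers as well, which this sketch leaves out.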
More and more companies are using recommendation engines. Apple has its own engine to help consumers find apps they are likely to enjoy from Apple’s large inventory. Microsoft’s XBOX 360 Live has an engine to suggest new games you might be interested in based on what you’ve previously shown an interest in.
Many of the algorithms that are used in recommendation engines and machine learning aren’t all that new. Regression, decision trees, K-nearest neighbor (KNN), support vector machines (SVMs), neural networks and naive Bayes are established methods with well-known constraints and appropriate uses. Many of these methods have been used to support data-driven business decision-making for a long time. So, if the benefits of recommendation engines have been long known, what’s different now? What’s causing more companies to implement recommendation engines to support customer decision-making?
Three Cost Trends
Three major trends are moving recommendation engines from merely possible to scalable, whether technologically, economically or in terms of effectiveness.
1. The cost of data storage has come down. $500 buys a large volume of space on Amazon’s cloud. It’s the equivalent of what would have cost hundreds of thousands of dollars just 10 years ago. In short, big data also means cheap storage.
2. The cost of software has also come down dramatically. The software that enables companies to manage a large amount of data used to run into the millions of dollars, and some of it still does. However, thanks to decisions by several companies to open source their software, programs like Hadoop and Druid are monetarily free. While it certainly takes experts time to set them up and maintain them, the overall cost of ownership has fallen. These tools have enabled smaller teams to tackle much bigger opportunities, like recommendation engines.
3. The cost of data has also come down. People themselves, in part driven by smartphone adoption, emit large volumes of storable data. Some of this data is very unstructured and dirty, comparable to call center data. Some is clean, like GPS location data. Moreover, it has become popular for start-ups to offer application programming interfaces (APIs). So not only is more data generated in more places, but it’s more available to more people in more places.
Falling barriers herald bold experimentation. These three trends intersect and cause a Cambrian explosion of experimentation and commercialization attempts.
Introducing the Data Scientist
At the center of this is the data scientist. Data scientists turn data into product. A recommendation engine is certainly a product.
Data scientists combine a number of skills. They have to know how to write code. They know statistics and the algorithms used to extract patterns from nature. And they understand business. The combination of the three skills increases the likelihood that their solutions will scale successfully.
Data scientists come from any number of backgrounds. Some are highly accomplished computer scientists who got deeper into business and statistics. Some are from biomedical informatics. Many are from sectors with particularly high numeracy, such as physics. Others have roots in the management sciences. There are a number of ways a person can level up to become a data scientist, but the path generally ends with competence in all three skill sets.
Data science begins with data. Nothing gets built without data. Data science continues with science. Accurate, persuasive and effective prediction requires a pattern. The process of discovering that pattern is science. Any product worth building requires a reliable pattern to exist in the data.
The process of exploiting that pattern, especially for commercial gain, is engineering. Data science generally ends with engineering.
Many people who work with data scientists, and who are responsible for various aspects of product building, are engineers. They may have data science in their titles. Such titles are likely to confuse human resource departments and leadership alike, and to trouble those concerned solely with labels, but the ambiguity is well worth the cost. The engineers are indispensable for translating the patterns found in nature directly into business outcomes.
The output of data science is product. As a result, product management is a major concern. Because their stance is based firmly in science and in iteration, data scientists frequently choose methods and tools that emphasize iteration and experimentation. Ideas such as fast-failure and continuous deployment are particularly well suited to this type of product development. When data scientists maintain their own product management and development teams, they choose continuous deployment, agile methods and rapid iteration.
The difference in stances is likely to cause cultural tension within organizations. The tension may generate positive spin-offs so long as it does not generate regrettable churn.
How Data Scientists Work
Data scientists spend time understanding the metric that needs to be maximized and the business context for that maximization. They call this the optimization objective and remain focused on a single one.
Like their cousins in operations, data scientists frequently have to gather data into one place. If there is none available, or if the odds of unlocking existing data are too remote, they have to generate their own source. Those from the natural sciences will gravitate toward setting up an experiment to get some data on which to train an engine.
Data scientists avoid writing their own code as much as possible. They use open source libraries for Python, Octave and R before they resort to over-optimization. They will sooner use Amazon Mechanical Turk to obtain a larger data set than invent their own framework.
They will think about which methods are likely to scale, and they will try them out. They separate their data into a training set (in-sample set), a cross-validation set and a testing set (out-of-sample set). They will try to avoid over-fitting or under-fitting their algorithm to the data. They will seek a compromise between recall and precision.
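As a rough sketch of that workflow, the Python below shuffles a data set into training, cross-validation and testing sets and then scores a list of recommendations on precision and recall. The 60/20/20 proportions, the function names and the sample items are assumptions made for illustration. The article does not prescribe any of them.

```python
import random


def three_way_split(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle rows, then cut them into training, cross-validation and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],                 # in-sample: used to fit the model
        shuffled[n_train:n_train + n_val],  # used to compare methods and settings
        shuffled[n_train + n_val:],         # out-of-sample: held back for a final check
    )


def precision_recall(recommended, relevant):
    """Precision: share of recommendations that were relevant.
    Recall: share of relevant items that were recommended."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed data: 100 placeholder rows standing in for real observations.
rows = list(range(100))
train_set, cv_set, test_set = three_way_split(rows)
print(len(train_set), len(cv_set), len(test_set))

# Assumed example: four items were recommended, three items were actually relevant.
precision, recall = precision_recall(
    recommended=["A", "B", "C", "D"],
    relevant=["A", "B", "E"],
)
print(precision, recall)
```

The compromise shows up quickly in practice. Recommending more items tends to raise recall, because fewer relevant items are missed, but it usually lowers precision, because more of what is shown turns out to be irrelevant.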
They’ll expose their recommendation engine to the wild, observe how people react to it and then use that information to refine its accuracy. They will keep the engine out there as they gradually improve it. They’re rarely done optimizing both its scale and its performance, frequently seeking out additional data streams to use to improve it.
Recommendation engines put data immediately to work for the business and for consumers. The barriers to their use have come down and the opportunities for their deployment have improved. As more and more companies are discovering, they cost a small fraction of an average advertising campaign, bring in directly attributable revenue and deliver surprisingly short payback periods if done right.
Christopher Berry (firstname.lastname@example.org) is the co-founder and chief science officer of Authintic (www.authintic.com), an analytics technology company based in Toronto, Canada. Prior to Authintic, Berry built the measurement science and labs groups at Syncapse, a social media technology company, and the marketing science department at Critical Mass.