Neuro-dynamic Programming: Building human curiosity into artificial intelligence
How neuro-dynamic programming enables smart machines to think ahead.
By Scott Zoldi
The media and watercooler chatter alike increasingly focus on how advances in machine learning and artificial intelligence (AI) are boosting the ability of predictive analytics to benefit businesses’ bottom lines. Some of that talk ponders the potential for smart machines to replace humans in higher-complexity jobs. No doubt, smart machines are getting smarter. But even the smartest machines still lack fundamental human characteristics that are absolutely critical to enabling people to solve problems. One of these key capabilities is curiosity – surely a computer can’t replicate that, can it?
Well, welcome to the evolving world of neuro-dynamic programming. It’s an analytic methodology for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to attribute future rewards and penalties back to earlier steps in a decision sequence, whereas traditional supervised learning attributes reward only to the current decision. These advanced methods focus on experimentation and prediction. They mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may arrive much later – essentially, how humans seek and achieve long-term positive results.
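To make “long-term cumulative reward” concrete: reinforcement learning commonly scores a sequence of rewards r0, r1, r2 and so on by its discounted sum G = r0 + γ·r1 + γ²·r2 + ..., where the discount factor γ (between 0 and 1) makes distant rewards count for somewhat less than immediate ones. The Python below is a minimal sketch of that calculation; the reward values and γ = 0.9 are illustrative assumptions, not figures from any production model.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted cumulative reward: a reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    # Work backwards using the recursion G_t = r_t + gamma * G_(t+1)
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical per-period rewards from one customer relationship:
# two quiet periods, a costly intervention, then a large payoff
rewards = [0.0, 0.0, -5.0, 20.0]
print(discounted_return(rewards, gamma=0.9))
```

With these assumed numbers the sequence is worth about 10.53 today: the late payoff outweighs the earlier cost even after discounting, which is exactly the kind of trade-off a purely myopic model cannot see.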
Clearly, analytics that can “think” well ahead and focus on the most favorable outcomes are most welcome, since many operational decisions about customers have long-term consequences. High customer lifetime value and healthy, sustainable business cash flow are both produced by a series of interactions: the business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead, potentially making moves early in the decision chain that do not appear optimal in the short run but that represent better decisions in view of the long-term outcome.
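A tiny, hypothetical two-step decision tree shows how a move that looks worse now can be better overall. The action names, rewards and discount factor below are invented for illustration, and the customer’s reaction is assumed to be deterministic to keep the sketch short; a greedy chooser looks only at the immediate reward, while a one-step lookahead adds the discounted value of the best follow-up.

```python
# Hypothetical decision tree: first action -> (immediate reward, {follow-up action: reward})
TREE = {
    "offer_A": (10.0, {"follow_up_1": 0.0, "follow_up_2": 1.0}),
    "offer_B": (2.0, {"follow_up_3": 15.0, "follow_up_4": 5.0}),
}

def greedy_choice(tree):
    # Short-run view: pick whichever first action pays the most right now
    return max(tree, key=lambda a: tree[a][0])

def lookahead_choice(tree, gamma=0.9):
    # Long-run view: immediate reward plus the discounted best follow-up it unlocks
    def value(action):
        immediate, follow_ups = tree[action]
        return immediate + gamma * max(follow_ups.values())
    return max(tree, key=value)

print("greedy picks:   ", greedy_choice(TREE))
print("lookahead picks:", lookahead_choice(TREE))
```

Greedy selects `offer_A` (10 versus 2), but the lookahead selects `offer_B`, whose total of 2 + 0.9 × 15 = 15.5 beats 10 + 0.9 × 1 = 10.9.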
Another way to think about this concept is to consider a group of dumb software agents (like individual ants). The agents interact with their environment and are rewarded or penalized against a small set of success criteria. Gradually “genes” of successful behavior emerge as the agents begin to map out the risk of various interrelated activities. Those agents with few successful genes receive a low “fitness” score and die out, whereas those with many successful genes score high and are allowed to reproduce, mutate or combine with other high-scoring agents. In this way, the overall performance of the group increases.
Because the environment is changing, these agents not only act optimally based on their current best “map of the world,” but also experiment. Using probabilities, they make slight variations around the optimal strategy and its associated genes; as they receive rewards and penalties, they learn from these experiments and continually adjust to a changing fitness landscape.
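The agent population described above behaves like a genetic algorithm. The sketch below is one minimal way to express it, under several stated assumptions: each agent’s “genes” are a bit string, fitness is simply the number of genes matching a hidden pattern the environment currently rewards, the lower-scoring half dies out each generation, and survivors combine (uniform crossover) and mutate (random bit flips) to refill the population. The shift in the environment is simulated by flipping every fourth rewarded gene. None of this is FICO code; all names and parameters are hypothetical.

```python
import random

GENOME_LEN = 20

def fitness(genes, target):
    # Count how many genes match the behavior the environment currently rewards
    return sum(g == t for g, t in zip(genes, target))

def crossover(a, b, rng):
    # Uniform crossover: each gene is inherited from one of two high-scoring parents
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(genes, rate, rng):
    # Flip each gene with a small probability: the agents' built-in experimentation
    return [1 - g if rng.random() < rate else g for g in genes]

def evolve(population, target, generations, rng, rate=0.02):
    for _ in range(generations):
        # Rank agents by fitness; the low-scoring half dies out
        ranked = sorted(population, key=lambda g: fitness(g, target), reverse=True)
        survivors = ranked[: len(ranked) // 2]
        # High scorers combine and mutate to refill the population
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rate, rng)
            for _ in range(len(population) - len(survivors))
        ]
        population = survivors + children
    return population

def best_fitness(population, target):
    return max(fitness(g, target) for g in population)

rng = random.Random(42)
target = [rng.randint(0, 1) for _ in range(GENOME_LEN)]
population = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(40)]
population = evolve(population, target, generations=60, rng=rng)
print("best fitness, original environment:", best_fitness(population, target))

# The environment shifts: every fourth rewarded behavior flips
shifted = [1 - t if i % 4 == 0 else t for i, t in enumerate(target)]
print("best fitness right after the shift:", best_fitness(population, shifted))
population = evolve(population, shifted, generations=60, rng=rng)
print("best fitness after re-adapting:   ", best_fitness(population, shifted))
```

Because survivors are carried over unchanged, the best score never falls within a stable environment; it is the ongoing mutation that lets the group climb back after the landscape moves.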
As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess game, where a checkmate could be rooted 10 moves back – or even in the first move – the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand this dynamic.
Figure 2 depicts how these analytics learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credit or penalty, reinforcement learning distributes some share of each reward or penalty across the entire sequence of actions.
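One simple way to spread a final outcome back over the actions that led to it is geometric discounting: an action k steps before the outcome receives γ^k of the reward or penalty. The sketch below assumes that scheme, a discount of 0.8 and a hypothetical three-step collections sequence; reinforcement learning also offers other credit-assignment schemes (temporal-difference updates, for example), so treat this as an illustration of the principle rather than the method behind Figure 2.

```python
def assign_credit(actions, final_reward, gamma=0.8):
    """Spread a final reward or penalty over a sequence of actions.

    The action just before the outcome gets the full amount; each earlier
    action gets a share discounted once more per step of distance.
    """
    T = len(actions)
    # Action t sits (T - 1 - t) steps before the outcome
    return [(a, final_reward * gamma ** (T - 1 - t)) for t, a in enumerate(actions)]

# Hypothetical sequence that ended with a valuable customer leaving (penalty -100)
sequence = ["reminder_email", "late_fee", "collections_call"]
for action, credit in assign_credit(sequence, -100.0):
    print(f"{action:>16}: {credit:7.1f}")
```

Here the collections call carries the full penalty of -100, the late fee -80 and the first reminder -64: the most recent step is blamed most, but no step escapes responsibility.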
During training with historical data, the model learns to associate value (total discounted rewards and penalties) with a customer state and with each of the potential actions the business can take at that point. After training, when presented with new data on a customer indicating a given state, the model is able to predict the long-term value of taking one action over another – and to select the best next action so as to maximize the long-term value.
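The train-then-select loop above can be illustrated with tabular Q-learning applied repeatedly to a fixed batch of logged transitions. This is a deliberately small stand-in: a lookup table replaces the neural network, transitions are deterministic, and every state, action and reward below is hypothetical. Each update nudges Q(state, action) toward the immediate reward plus the discounted value of the best action available in the next state.

```python
from collections import defaultdict

# Hypothetical logged history: (state, action, reward, next_state); None = relationship ended
HISTORY = [
    ("new", "low_limit", 1.0, "active"),
    ("new", "high_limit", 3.0, "overextended"),
    ("active", "limit_increase", 2.0, "loyal"),
    ("overextended", "collections", -1.0, None),
    ("loyal", "cross_sell", 5.0, None),
]
# Actions available in each (hypothetical) customer state
ACTIONS = {
    "new": ["low_limit", "high_limit"],
    "active": ["limit_increase"],
    "overextended": ["collections"],
    "loyal": ["cross_sell"],
}

def fit_q(history, gamma=0.9, alpha=0.5, sweeps=200):
    q = defaultdict(float)  # q[(state, action)] = estimated total discounted reward
    for _ in range(sweeps):
        for s, a, r, s_next in history:
            future = max(q[(s_next, b)] for b in ACTIONS[s_next]) if s_next else 0.0
            # Move the estimate toward: reward now + discounted best value afterwards
            q[(s, a)] += alpha * (r + gamma * future - q[(s, a)])
    return q

def best_action(q, state):
    # After training, pick the action with the highest predicted long-term value
    return max(ACTIONS[state], key=lambda a: q[(state, a)])

q = fit_q(HISTORY)
print(best_action(q, "new"))
```

For a new customer the model prefers `low_limit` (about 6.85 versus 2.1 for `high_limit`), even though `high_limit` pays more immediately, because it leads toward the valuable “loyal” state.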
To improve business actions at a fast pace, analytics must have a way to learn causal relationships from data: how a change in action A causes outcome Y to change in a specific way, usually expressed in terms of expected values because Y is uncertain. To do so, the algorithm performs a controlled amount of deliberate experimentation. Under deterministic rules, customers in similar states with similar characteristics would all be targeted with the same action, which creates targeting bias and, with it, difficulty in identifying causal effects. Advanced reinforcement-learning algorithms instead assign a small fraction of similar customers to somewhat different actions. In neuro-dynamic programming, these miniature experiments are essential for helping the neural networks (models that mimic brain function to process a large number of inputs, using high-speed computers and algorithms that learn to recognize complex patterns of behavior) to understand the causal effects of actions, taken in given states, on state transition probabilities, and thus on customer value.
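A short simulation shows why a small randomized fraction matters. Everything here is an assumption made for illustration: a hypothetical rule calls high-risk customers and emails the rest, risk lowers the repayment probability, and a call truly adds 0.10 to it. With probability ε = 0.1 a customer is instead assigned an action at random. Comparing all called versus all emailed customers mixes the effect of the call with the effect of who was targeted; comparing only the randomized customers recovers the causal effect.

```python
import random

def outcome(risk, action, rng):
    # Assumed ground truth: risk lowers repayment; a call adds +0.10 to the probability
    p = 0.9 - 0.6 * risk + (0.10 if action == "call" else 0.0)
    return 1 if rng.random() < p else 0

def assign(risk, epsilon, rng):
    # With probability epsilon, experiment: pick an action at random
    if rng.random() < epsilon:
        return rng.choice(["call", "email"]), True
    # Otherwise follow the deterministic business-as-usual rule
    return ("call" if risk > 0.5 else "email"), False

def simulate(n, epsilon, seed=0):
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        risk = rng.random()
        action, explored = assign(risk, epsilon, rng)
        records.append((action, explored, outcome(risk, action, rng)))
    return records

def mean_diff(records):
    # Average repayment of called customers minus that of emailed customers
    calls = [y for a, _, y in records if a == "call"]
    emails = [y for a, _, y in records if a == "email"]
    return sum(calls) / len(calls) - sum(emails) / len(emails)

records = simulate(200_000, epsilon=0.1)
naive = mean_diff(records)                              # confounded by the targeting rule
experimental = mean_diff([r for r in records if r[1]])  # randomized customers only
print(f"naive estimate of call effect:  {naive:+.3f}")
print(f"estimate from the experiments:  {experimental:+.3f}  (assumed true effect +0.100)")
```

Under these assumptions the naive comparison even gets the sign wrong, because calls go to riskier customers, while the roughly 10% of randomized customers yield an estimate close to the true +0.10.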
One way to think of this aspect of experimentation in neuro-dynamic programming is as an analytic implementation of “curiosity.” This inquisitive algorithm likes to test new actions with a component of randomness, see how the world responds, and adjust its concept of the world accordingly. It is actively collecting data we can think of as extra-informative. In other words, like humans, analytics can learn more from deliberate experimentation than from just passively observing the world and can do so in controlled and sensible ways.
Moreover, AI can go beyond just extracting information from given, business-as-usual data. It can actively generate new information. In principle, these technologies can apply this approach even when no historical data is available at the start: the models begin from expert knowledge of strategies and decision chains around customers, then act and learn on the fly from streaming production data.
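A minimal sketch of starting from expert knowledge and learning from a stream, under stated assumptions: the expert’s guesses about each action’s success rate are treated as if they had already been observed a fixed number of times, the learner mostly uses its current best action but experiments with probability ε, and each streamed outcome updates a running average. The class name, action names and “true” rates are hypothetical, and this simple bandit ignores the multi-step sequences discussed elsewhere in the article.

```python
import random

class OnlineActionValues:
    """Running estimates of each action's success rate, seeded with expert guesses."""

    def __init__(self, expert_priors, prior_weight=10, epsilon=0.1, seed=0):
        # Each expert guess counts as if it had been observed prior_weight times
        self.value = dict(expert_priors)
        self.count = {a: prior_weight for a in expert_priors}
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self):
        # Mostly exploit the current best estimate; occasionally experiment
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.value))
        return max(self.value, key=self.value.get)

    def update(self, action, reward):
        # Incremental mean: nudge the estimate toward each newly streamed outcome
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

# The expert favors reminders; the (hypothetical) production stream disagrees
true_rate = {"reminder_sms": 0.3, "payment_plan": 0.6}  # unknown to the learner
learner = OnlineActionValues({"reminder_sms": 0.5, "payment_plan": 0.4})
env = random.Random(1)
for _ in range(20_000):
    a = learner.choose()
    learner.update(a, 1.0 if env.random() < true_rate[a] else 0.0)
print({a: round(v, 3) for a, v in learner.value.items()})
print("current best action:", max(learner.value, key=learner.value.get))
```

The expert prior gives the model a sensible starting policy on day one, and the streamed evidence overrides it once enough outcomes arrive; `prior_weight` controls how quickly that happens.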
Neuro-dynamic programming and related methods may be used to improve many areas of business operations – loan originations, customer management, marketing and collections, for example – where standard practice relies primarily on single-shot decision modeling and optimization techniques that pinpoint the best next action rather than modeling sequential decisions. By adding AI techniques, we can move beyond the immediate consequences of the next action and start reasoning about chains of actions and reactions leading to long-term results. For example, we might test these techniques to help determine which sequence of introductory rate, go-to rate, cross-sell offer and credit limit increase is optimal for maximizing a particular customer’s lifetime value, and evaluate course corrections over time.
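The “dynamic programming” half of the name refers to reasoning backwards from the end of a decision chain. The sketch below runs finite-horizon backward induction on a small, entirely hypothetical customer model: two live states, an absorbing “closed” state, two actions, and assumed rewards and transition probabilities. In neuro-dynamic programming these exact tables would be replaced by neural-network estimates learned from data; here they are written out by hand so the logic is visible.

```python
# Hypothetical model: MODEL[state][action] = (immediate reward, {next_state: probability})
MODEL = {
    "engaged": {
        "cross_sell": (4.0, {"engaged": 0.5, "dormant": 0.2, "closed": 0.3}),
        "wait": (2.0, {"engaged": 0.9, "dormant": 0.1}),
    },
    "dormant": {
        "cross_sell": (1.0, {"engaged": 0.3, "dormant": 0.3, "closed": 0.4}),
        "wait": (0.0, {"engaged": 0.2, "dormant": 0.7, "closed": 0.1}),
    },
    "closed": {},  # absorbing: no actions and no further value
}

def backward_induction(model, horizon, gamma=1.0):
    """Return policy[t][state] and value[t][state] for decision steps t = 0..horizon-1."""
    next_value = {s: 0.0 for s in model}  # value after the final decision
    policy, value = [None] * horizon, [None] * horizon
    for t in reversed(range(horizon)):
        v_t, pi_t = {}, {}
        for s, actions in model.items():
            if not actions:
                v_t[s] = 0.0
                continue
            # Expected long-term value of each action: reward now + expected value of where it leads
            q = {a: r + gamma * sum(p * next_value[s2] for s2, p in trans.items())
                 for a, (r, trans) in actions.items()}
            pi_t[s] = max(q, key=q.get)
            v_t[s] = q[pi_t[s]]
        policy[t], value[t] = pi_t, v_t
        next_value = v_t
    return policy, value

policy, value = backward_induction(MODEL, horizon=3)
for t, pi in enumerate(policy):
    print("step", t, pi)
```

With these assumed numbers the best plan for an engaged customer is to wait at the first step and cross-sell at the later ones: cross-selling immediately earns more now (a value of 7.6) but risks closing the account, while waiting first yields 7.83. The optimal action depends on where it falls in the sequence, which single-shot next-best-action models cannot capture.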
The sequence of actions being optimized could take place over years or over the duration of a phone call. When a valuable customer calls to close her account, what is the optimal sequence of responses from the customer service agent for retaining the business? We may be able to build an algorithm that tells us, and it might improve on what a human agent would do based on her experience, the demeanor of the agent and the customer, and similar factors.
The number of potential customer states becomes enormous as you move through a sequence of actions (and despite experimentation, these include many states not yet encountered or for which little historical data exists). Neural networks can be used to predict the value of new customer states for possible actions that look similar to states and actions encountered in the past. Neural networks can also help in estimating the probability of a customer in one state transitioning to another state – and how business actions will affect that probability.
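As a minimal illustration of estimating a transition probability that generalizes to new states, the sketch below fits the simplest possible neural network, a single sigmoid neuron (equivalent to logistic regression), by stochastic gradient ascent on the log-likelihood. The data are synthetic and every detail is an assumption: one state feature (scaled inactivity), one action flag (a randomized retention offer), and a known “true” model used only to generate the data. A production model would use many more features and a deeper network.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_history(n, seed=0):
    # Hypothetical logged data: state feature, action taken, and whether the account moved to "closed"
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        inactivity = rng.random()                   # scaled time since last activity
        offer = 1.0 if rng.random() < 0.5 else 0.0  # 1 = retention offer made (randomized)
        p_close = sigmoid(-2.0 + 3.0 * inactivity - 1.5 * offer)  # assumed ground truth
        rows.append(((1.0, inactivity, offer), 1.0 if rng.random() < p_close else 0.0))
    return rows

def train(rows, lr=0.1, epochs=50):
    # Single sigmoid neuron: weights for (bias, inactivity, offer), fitted sample by sample
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in rows:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def p_transition(w, inactivity, offer):
    # Predicted probability of moving to "closed" given the state and the action
    return sigmoid(w[0] + w[1] * inactivity + w[2] * offer)

w = train(make_history(2000))
# Score a new customer state with and without the retention action
print(f"P(close | no offer) = {p_transition(w, 0.83, 0.0):.2f}")
print(f"P(close | offer)    = {p_transition(w, 0.83, 1.0):.2f}")
```

Because the model is a smooth function of the state features, it can score customer states that never appeared exactly in the history, and comparing its predictions with and without the action estimates how that action shifts the transition probability.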
The combination of AI analytics and big data is exciting because it raises the possibility of greater and faster information extraction from large-scale data. As with any other analytic advancement, gains from big data AI will depend on smart analytic practices, such as high-quality, relevant data and expert analysts who guide model development and troubleshoot issues such as data bias. Still, AI analytics has the potential to open up the perspectives and hypotheses of human experts to new possibilities, as well as to overcome certain data limitations such as targeting bias.
All in all, big data AI is very good news for business. Companies should welcome these developments and move forward as opportunities arise. The organizations that will benefit the most will be those already using neural networks, self-learning models, experimental design and decision optimization. If you’re not working with these analytic techniques now, think about getting started for the big payoff ahead.
Scott Zoldi is a vice president of Analytic Science at FICO, where he is responsible for the analytic development of FICO’s transaction analytics products and solutions, including the FICO Falcon Fraud Manager product that protects about two-thirds of the world’s payment card transactions from fraud. He has recently focused on applying streaming self-learning analytics to the real-time detection of cybersecurity attacks. He blogs at http://www.fico.com/en/blogs/.