Analytics Magazine

Neuro-dynamic Programming: Building human curiosity into artificial intelligence

July/August 2015

How neuro-dynamic programming enables smart machines to think ahead.

By Scott Zoldi

The media and watercooler chatter alike increasingly focus on how advances in machine learning and artificial intelligence (AI) are boosting the ability of predictive analytics to benefit businesses’ bottom lines. Some of that talk ponders the potential for smart machines to replace humans in higher-complexity jobs. No doubt, smart machines are getting smarter. But even the smartest machines still lack fundamental human characteristics that are absolutely critical to enabling people to solve problems. One of these key capabilities is curiosity – surely a computer can’t replicate that, can it?

Well, welcome to the evolving world of neuro-dynamic programming. It’s an analytic methodology for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to attribute future reward/penalty back to earlier steps in a decision sequence, whereas traditional supervised learning attributes reward only to the current decision. These advanced methods focus on experimentation and prediction. They mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may occur later in time – essentially, how humans seek and achieve long-term positive results.

Clearly, analytics that can “think” well ahead and focus on the most favorable outcomes are welcome, since many operational decisions about customers have long-term consequences. High customer lifetime value and healthy, sustainable business cash flow are both produced by a series of interactions: the business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead – potentially making moves early in the decision chain that do not appear optimal in the short run but that, in view of the long-term outcome, represent better decisions.

Another way to think about this concept is to consider a group of dumb software agents (like individual ants). The agents interact with their environment and are rewarded or penalized around a small set of success criteria. Gradually “genes” of successful behavior emerge as the agents begin to map out the risk of various interrelated activities. Those agents with few successful genes receive a low “fitness” score and die out, whereas those with many successful genes score high and are allowed to reproduce, mutate or combine with other high-scoring agents. In this way, the overall performance of the group increases.

Because the environment is changing, these agents not only act in the optimal way based on their current best “map of the world,” they also experiment. Using probabilities, they make slight variations and mutations around the optimal strategy and its associated genes. As they receive rewards and penalties, they learn from these experiments and continually adjust to a changing fitness landscape.
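The agent analogy above can be sketched as a small evolutionary loop. This is only an illustration under stated assumptions: the bit-string “genes,” the fixed `rewarded` pattern standing in for the environment, and all parameter values are hypothetical, and the sketch keeps the environment static for simplicity even though the text describes a changing one.

```python
import random

def fitness(genes, rewarded):
    """Score an agent: one point per gene that matches the rewarded behavior."""
    return sum(g == r for g, r in zip(genes, rewarded))

def evolve(rewarded, pop_size=20, generations=40, mutation_rate=0.05, seed=0):
    """Return (best_agent, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(rewarded)
    # Start from agents with random genes (hypothetical 0/1 behaviors).
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank agents by fitness; the low-scoring half dies out.
        population.sort(key=lambda g: fitness(g, rewarded), reverse=True)
        history.append(fitness(population[0], rewarded))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # High scorers combine with each other ...
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)
            child = parent_a[:cut] + parent_b[cut:]
            # ... and the child mutates slightly (the experimentation step).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda g: fitness(g, rewarded))
    return best, history

rewarded = [1, 0, 1, 1, 0, 0, 1, 0]
best, history = evolve(rewarded)
print(fitness(best, rewarded), history[0], history[-1])
```

Because the surviving half is carried over unchanged, the best fitness in the group never drops from one generation to the next, which is one simple way to read “the overall performance of the group increases.”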

As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess game, where a checkmate could be rooted 10 moves back – or even in the first move – the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand this dynamic.

Figure 1: Current state of the customer relationship.

Figure 2 depicts how these analytics learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credits or penalties, reinforcement learning distributes some amount of rewards/penalties across the entire sequence of actions.

Figure 2: Credits and penalties distributed across the sequence of actions.
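One common way to spread an outcome back over earlier actions is to discount it step by step. The sketch below assumes a hypothetical discount factor `gamma` and a made-up reward sequence; it illustrates the idea and is not the specific scheme used in any product.

```python
def discounted_returns(rewards, gamma=0.9):
    """Credit each step with its own reward plus the discounted future rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward from the outcome so later rewards flow to earlier actions.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Four actions; only the last one is followed by a reward of 10.
print(discounted_returns([0, 0, 0, 10], gamma=0.5))  # [1.25, 2.5, 5.0, 10.0]
```

The action immediately before the outcome receives the largest share, and every earlier action still receives some credit.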

During training with historical data, the model learns to associate value (total discounted rewards and penalties) with a customer state and with each of the potential actions the business can take at that point. After training, when presented with new data on a customer indicating a given state, the model is able to predict the long-term value of taking one action over another – and to select the best next action so as to maximize the long-term value.
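A minimal sketch of this training step, using tabular Q-learning as a stand-in (the article does not name a specific algorithm, and a production model would use a neural network rather than a lookup table). The customer states, actions, rewards and the parameters `gamma`, `alpha` and `sweeps` are all hypothetical.

```python
from collections import defaultdict

def train_q(transitions, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Learn a value for each (state, action) pair from logged history."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                # The relationship ended: no future value to add.
                target = reward
            else:
                # Immediate reward plus discounted value of the best follow-up.
                target = reward + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q

def best_action(q, state, actions):
    """Select the action with the highest learned long-term value."""
    return max(actions, key=lambda a: q[(state, a)])

actions = ["fee", "offer"]
history = [
    ("at_risk", "fee", 5.0, None),        # fee collected, customer leaves
    ("at_risk", "offer", -1.0, "loyal"),  # offer costs money now, customer stays
    ("loyal", "fee", 10.0, None),         # later revenue from the retained customer
]
q = train_q(history, actions)
print(best_action(q, "at_risk", actions))  # offer
```

In this toy history the offer loses money immediately but leads to a more valuable state, so its learned long-term value (about 8) exceeds that of the fee (5).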

To improve business actions at a fast pace, analytics must have a way to learn causal relationships from data (a given change in action A causes outcome Y to change in a specific way, usually expressed as an expectation because Y is uncertain). To do so, the algorithm performs a controlled amount of deliberate experimentation. Under deterministic rules, customers in similar states with similar characteristics would normally all be targeted with the same action, which creates targeting bias and, with it, difficulty in identifying causal effects. Advanced reinforcement-learning algorithms instead assign a small fraction of similar customers to somewhat different actions. In neuro-dynamic programming, these miniature experiments are essential for helping the neural networks understand the causal effects of states and actions on state transition probabilities, and thus on customer value. (Neural networks are models that mimic brain function to process a large number of inputs, using high-speed computers and algorithms that learn to recognize complex patterns of behavior.)

One way to think of this aspect of experimentation in neuro-dynamic programming is as an analytic implementation of “curiosity.” This inquisitive algorithm likes to test new actions with a component of randomness, see how the world responds, and adjust its concept of the world accordingly. It is actively collecting data we can think of as extra-informative. In other words, like humans, analytics can learn more from deliberate experimentation than from just passively observing the world and can do so in controlled and sensible ways.
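The controlled experimentation described above is often implemented with an “epsilon-greedy” rule: most customers get the action currently believed best, and a small random fraction get something else. The sketch below assumes that rule, a hypothetical exploration rate `epsilon`, and made-up action names and values.

```python
import random

def choose_action(action_values, epsilon=0.1, rng=random):
    """Usually exploit the best-known action; occasionally explore at random."""
    if rng.random() < epsilon:
        # Deliberate experiment: pick any action, regardless of its estimate.
        return rng.choice(list(action_values))
    # Business as usual: pick the action with the highest estimated value.
    return max(action_values, key=action_values.get)

values = {"standard_offer": 1.0, "lower_rate": 0.4, "higher_limit": 0.2}
rng = random.Random(0)
picks = [choose_action(values, epsilon=0.1, rng=rng) for _ in range(10000)]
print(sum(p != "standard_offer" for p in picks) / len(picks))
```

Keeping `epsilon` small limits the cost of the experiments while still generating data on actions the deterministic rule would never try.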

Moreover, AI can go beyond just extracting information from given, business-as-usual data. It can actively generate new information. In principle, these technologies can apply this approach even when no historical data is available at the start: the models begin with expert knowledge of strategies and decision chains around customers, then act and learn on the fly from streaming production data.

Neuro-dynamic programming and related methods may be used to improve many areas of business operations – loan originations, customer management, marketing and collections, for example – where the standard practice is primarily single-shot decision modeling and optimization techniques that pinpoint the best next action, instead of modeling sequential decisions. Adding AI techniques, we can move beyond the immediate consequences of the next action and start reasoning about chains of actions and reactions leading to long-term results. For example, we might test these techniques to help determine which sequence of introductory rate, go-to rate, cross-sell offer and credit limit increase is optimal for maximizing a particular customer’s lifetime value, and evaluate course corrections over time.

The sequence of actions being optimized could take place over years or over the duration of a phone call. When a valuable customer calls to close her account, what is the optimal sequence of responses from the customer service agent for retaining the business? We may build an algorithm to tell us, and it might improve on what a human agent would do based on her own experience, the demeanor of the agent and the customer, and so on.

The number of potential customer states becomes enormous as you move through a sequence of actions (and despite experimentation, these include many states not yet encountered or for which little historical data exists). Neural networks can be used to predict the value of new customer states for possible actions that look similar to states and actions encountered in the past. Neural networks can also help in estimating the probability of a customer in one state transitioning to another state – and how business actions will affect that probability.
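To illustrate how a fitted model can value a state it has never seen, the sketch below uses a deliberately simpler stand-in for a neural network: a linear model trained by stochastic gradient descent. The two state features (say, products held and years as a customer), the example values and the training parameters are all hypothetical.

```python
def train_linear_value(examples, learning_rate=0.1, epochs=2000):
    """Fit value ~ bias + weights . features from (features, value) pairs."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, value in examples:
            prediction = bias + sum(w * x for w, x in zip(weights, features))
            error = value - prediction
            # Nudge each parameter in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def predict_value(weights, bias, features):
    """Estimate the value of any state, including one absent from training."""
    return bias + sum(w * x for w, x in zip(weights, features))

# States encountered in the past, with their observed long-term values.
seen = [((1, 0), 2.0), ((0, 1), 3.0), ((1, 1), 5.0)]
weights, bias = train_linear_value(seen)
# A state not in the history, but similar in structure to those that are.
print(round(predict_value(weights, bias, (2, 1)), 3))
```

A neural network plays the same role with far more flexibility: it learns nonlinear patterns across many inputs, so similar states and actions receive similar value estimates.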

The combination of AI analytics and big data is exciting because it raises the possibility of greater and faster information extraction from large-scale data. As with any other analytic advancement, gains from big data AI will depend on sound analytic practice: high-quality, relevant data and expert analysts who guide model development and troubleshoot issues such as data bias. Still, AI analytics has the potential to open the perspectives and hypotheses of human experts to new possibilities, as well as overcome certain data limitations such as targeting bias.

All in all, big data AI is very good news for business. Companies should welcome these developments and move forward as opportunities arise. The organizations that will benefit the most will be those already using neural networks, self-learning models, experimental design and decision optimization. If you’re not working with these analytic techniques now, think about getting started for the big payoff ahead.

Scott Zoldi is a vice president of Analytic Science at FICO, where he’s responsible for the analytic development of FICO’s transaction analytics products and solutions, including the FICO Falcon Fraud Manager product that protects about two-thirds of the world’s payment card transactions from fraud. He has recently focused on applications of streaming self-learning analytics for real-time detection of cyber security attacks. He blogs at
