Analytics Magazine

Neuro-dynamic Programming: Building human curiosity into artificial intelligence

July/August 2015

How neuro-dynamic programming enables smart machines to think ahead.

By Scott Zoldi

The media and watercooler chatter alike increasingly focus on how advances in machine learning and artificial intelligence (AI) are boosting the ability of predictive analytics to benefit businesses’ bottom lines. Some of that talk ponders the potential for smart machines to replace humans in higher-complexity jobs. No doubt, smart machines are getting smarter. But even the smartest machines still lack fundamental human characteristics that are absolutely critical to enabling people to solve problems. One of these key capabilities is curiosity – surely a computer can’t replicate that, can it?

Well, welcome to the evolving world of neuro-dynamic programming. It’s an analytic methodology for learning and anticipating how current and future actions are likely to contribute to a long-term cumulative reward. This technique is related to advanced AI reinforcement learning methods, which take inspiration from behaviorist psychology to attribute future reward/penalty back to earlier steps in a decision sequence, whereas traditional supervised learning attributes reward only to the current decision. These advanced methods focus on experimentation and prediction. They mimic the way the brain learns complex task sequences through pleasurable or painful feedback signals that may occur later in time – essentially, how humans seek and achieve long-term positive results.

Clearly, analytics that can “think” well ahead and focus on the most favorable outcomes are most welcome, since many operational decisions about customers have long-term consequences. High customer lifetime value and healthy, sustainable business cash flow are both produced by a series of interactions: the business takes an action, the customer reacts, the business responds to the new state of the relationship with another action, the customer reacts … and so on. In this way, neuro-dynamic programming enables smart machines to think ahead – potentially making moves early in the decision chain that do not appear optimal in the short run but prove to be better decisions in light of the long-term outcome.

Another way to think about this concept is to consider a group of dumb software agents (like individual ants). The agents interact with their environment and are rewarded or penalized around a small set of success criteria. Gradually “genes” of successful behavior emerge as the agents begin to map out the risk of various interrelated activities. Those agents with few successful genes receive a low “fitness” score and die out, whereas those with many successful genes score high and are allowed to reproduce, mutate or combine with other high-scoring agents. In this way, the overall performance of the group increases.

Because the environment is changing, these agents not only act in the optimal way based on their current best “map of the world,” they also experiment. Using probabilities, they make slight variations and mutations around the optimal strategy and its associated genes. As they receive rewards and penalties, they learn from these experiments and continually adjust to a changing fitness landscape.
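The agent analogy above can be illustrated with a toy genetic algorithm. This is only a minimal sketch under stated assumptions, not the method used in any particular product: the function name `evolve`, the binary “genes,” and the use of `sum` as a stand-in fitness score are all hypothetical choices made for illustration.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Toy evolutionary loop: score agents, keep the fittest, recombine, mutate."""
    rng = random.Random(seed)
    # Each agent is a list of binary "genes" (an illustrative assumption).
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Rank agents by fitness; the low-scoring half dies out.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            # Two high-scoring agents combine at a random cut point.
            parent_a, parent_b = rng.sample(survivors, 2)
            cut = rng.randrange(1, genome_len)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: each gene flips with a small probability (experimentation).
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

# Hypothetical fitness: the number of "successful" genes an agent carries.
best = evolve(fitness=sum)
```

Because survivors are carried over unchanged, the best score in the group never gets worse from one generation to the next, while mutation keeps a steady trickle of experiments going.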

As you can see in Figure 1, at any point in the sequence, the current state of the customer relationship is the result not only of the just-taken action, but also of the string of previous actions. Just as in a chess game, where a checkmate could be rooted 10 moves back – or even in the first move – the loss of a valuable customer may have started with actions taken months ago. To be successful, a business needs to understand this dynamic.

[Figure 1: The current state of the customer relationship as the result of the full sequence of previous actions.]

Figure 2 depicts how these analytics learn about long-term effects by assigning credits for successful outcomes and penalties for unsuccessful ones. Although the action immediately before the outcome may receive a larger share of the credits or penalties, reinforcement learning distributes some amount of rewards/penalties across the entire sequence of actions.

[Figure 2: Credits and penalties for an outcome distributed across the entire sequence of actions.]
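One common way to distribute credit across a sequence, as Figure 2 describes, is to discount the outcome as it is passed back to earlier steps. The sketch below assumes a simple discounted-return rule; the function name `discounted_credits` and the discount factor `gamma` are illustrative assumptions rather than details given in this article.

```python
def discounted_credits(rewards, gamma=0.9):
    """Credit for each step: its own reward plus the discounted credit of later steps."""
    credits = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step sees the already-discounted future.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        credits[t] = running
    return credits

# Three actions with no immediate payoff, then a successful outcome worth 1.0.
print(discounted_credits([0.0, 0.0, 0.0, 1.0], gamma=0.5))
# prints [0.125, 0.25, 0.5, 1.0]
```

The action immediately before the outcome receives the largest share, but every earlier action still receives some credit.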

During training with historical data, the model learns to associate value (total discounted rewards and penalties) with a customer state and with each of the potential actions the business can take at that point. After training, when presented with new data on a customer indicating a given state, the model is able to predict the long-term value of taking one action over another – and to select the best next action so as to maximize the long-term value.
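As a rough illustration of this training step, the sketch below runs tabular Q-learning over a handful of logged transitions. It is a simplified stand-in for the neural-network approach described in this article, and the states, actions, rewards and function names (`fit_q_table`, `best_action`) are all hypothetical.

```python
from collections import defaultdict

def fit_q_table(transitions, actions, gamma=0.9, alpha=0.5, sweeps=200):
    """Tabular Q-learning over logged (state, action, reward, next_state) tuples."""
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            # A next_state of None marks the end of the sequence.
            if next_state is None:
                future = 0.0
            else:
                future = max(q[(next_state, a)] for a in actions)
            target = reward + gamma * future
            # Move the current estimate part of the way toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q

def best_action(q, state, actions):
    """Select the action with the highest estimated long-term value."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical history: a discount costs margin now but leads to a loyal customer.
actions = ["discount", "no_offer"]
history = [
    ("at_risk", "discount", -1.0, "loyal"),
    ("at_risk", "no_offer", 0.0, None),
    ("loyal", "no_offer", 5.0, None),
    ("loyal", "discount", 4.0, None),
]
q = fit_q_table(history, actions)
print(best_action(q, "at_risk", actions))  # prints discount
```

In this toy data the discount looks worse in the short run (a reward of -1.0 versus 0.0), yet the learned values favor it because of the value of the state it leads to.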

To improve business actions at a fast pace, analytics must have a way to learn causal relationships from data: a given change in action A causes outcome Y to change in a specific way, usually expressed as an expectation because Y is uncertain. To do so, the algorithm performs a controlled amount of deliberate experimentation. Under deterministic rules, customers in similar states with similar characteristics would always be targeted with the same action, which creates targeting bias and makes causal effects difficult to identify. Advanced reinforcement-learning algorithms instead assign a small fraction of similar customers to somewhat different actions. In neuro-dynamic programming, these miniature experiments are essential for helping the neural networks understand the causal effects of actions on state transition probabilities, and thus on customer value. (Neural networks are models that mimic brain function to process a large number of inputs, using high-speed computers and algorithms that learn to recognize complex patterns of behavior.)

One way to think of this aspect of experimentation in neuro-dynamic programming is as an analytic implementation of “curiosity.” This inquisitive algorithm likes to test new actions with a component of randomness, see how the world responds, and adjust its concept of the world accordingly. It is actively collecting data we can think of as extra-informative. In other words, like humans, analytics can learn more from deliberate experimentation than from just passively observing the world and can do so in controlled and sensible ways.
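A simple and widely used way to add this kind of controlled randomness is an epsilon-greedy rule: most customers receive the best-known action, and a small fraction receive a randomly chosen one. The sketch below is an assumption about how such a rule might look, not a description of a specific system; `choose_action`, `epsilon` and the example values are hypothetical.

```python
import random

def choose_action(q_values, epsilon=0.1, rng=random):
    """Epsilon-greedy: usually exploit the best-known action, occasionally explore."""
    actions = list(q_values)
    if rng.random() < epsilon:
        # Deliberate experiment: pick any action at random.
        return rng.choice(actions)
    # Otherwise take the action with the highest estimated value.
    return max(actions, key=q_values.get)

# Hypothetical value estimates for one customer state.
estimates = {"discount": 3.5, "no_offer": 0.0}
rng = random.Random(0)
picks = [choose_action(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
explored = picks.count("no_offer")  # a small share of customers get the other action
```

Keeping `epsilon` small limits the cost of experimentation while still generating the varied data needed to estimate causal effects.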

Moreover, AI can go beyond just extracting information from given, business-as-usual data. It can actively generate new information. In principle, these technologies can apply this approach even when no historical data is available at the start, by directing the models to act on expert knowledge of customer strategies and decision chains and to learn on the fly from streaming production data.

Neuro-dynamic programming and related methods may be used to improve many areas of business operations – loan originations, customer management, marketing and collections, for example – where the standard practice is primarily single-shot decision modeling and optimization techniques that pinpoint the best next action, instead of modeling sequential decisions. Adding AI techniques, we can move beyond the immediate consequences of the next action and start reasoning about chains of actions and reactions leading to long-term results. For example, we might test these techniques to help determine which sequence of introductory rate, go-to rate, cross-sell offer and credit limit increase is optimal for maximizing a particular customer’s lifetime value, and evaluate course corrections over time.

The sequence of actions being optimized could take place over years or over the duration of a phone call. When a valuable customer calls to close her account, what is the optimal sequence of responses from the customer service agent for retaining the business? We may build an algorithm to tell us, and it might be able to improve on what a human agent would do based on the agent’s own experience, the demeanors of the agent and customer, etc.

The number of potential customer states becomes enormous as you move through a sequence of actions (and despite experimentation, these include many states not yet encountered or for which little historical data exists). Neural networks can be used to predict the value of new customer states for possible actions that look similar to states and actions encountered in the past. Neural networks can also help in estimating the probability of a customer in one state transitioning to another state – and how business actions will affect that probability.
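The idea of estimating value for states never seen before can be sketched with a function approximator. The example below deliberately swaps in a plain linear model trained by stochastic gradient descent in place of a neural network, to keep the sketch short; the features, data and function names are hypothetical.

```python
def fit_linear_value(samples, lr=0.05, epochs=2000):
    """Fit value ~ weights . features + bias by stochastic gradient descent."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, value in samples:
            prediction = sum(w * x for w, x in zip(weights, features)) + bias
            error = value - prediction
            # Nudge each parameter in the direction that reduces the error.
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def predict_value(model, features):
    """Estimate the value of a state from its features."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias

# Hypothetical features: (products held, years as customer) -> observed value.
history = [((0, 1), 1.0), ((1, 0), 2.0), ((1, 1), 3.0), ((2, 1), 5.0)]
model = fit_linear_value(history)
estimate = predict_value(model, (3, 2))  # a state that never appears in the history
```

Because the toy data follow value = 2 × products + years, the estimate for the unseen state should come out close to 8.0. A neural network plays the same role when the relationship between state and value is too complex for a straight line.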

The combination of AI analytics and big data is exciting because it raises the possibility of greater and faster information extraction from large-scale data. As with any other analytic advancement, gains from big data AI will depend on smart analytic practices, such as using high-quality, relevant data and relying on expert analysts who guide model development and troubleshoot issues such as data bias. Still, AI analytics has the potential to open the perspectives and hypotheses of human experts to new possibilities, as well as overcome certain data limitations such as targeting bias.

All in all, big data AI is very good news for business. Companies should welcome these developments and move forward as opportunities arise. The organizations that will benefit the most will be those already using neural networks, self-learning models, experimental design and decision optimization. If you’re not working with these analytic techniques now, think about getting started for the big payoff ahead.

Scott Zoldi is a vice president of Analytic Science at FICO, where he’s responsible for the analytic development of FICO’s transaction analytics products and solutions, including the FICO Falcon Fraud Manager product that protects about two-thirds of the world’s payment card transactions from fraud. He has recently focused on applying streaming self-learning analytics to the real-time detection of cybersecurity attacks. He blogs at
