

Analytics Magazine

Viewpoint: The importance of confidence scores

By Emma Duckworth

How confident are you about your modeled data?

If you answer honestly, the response is probably akin to sticking a finger in the air and seeing which way the wind is blowing. The problem with modeled data is its very nature: it is modeled. Errors and inaccuracies can creep in, making it useless at best and, at worst, a dangerous tool in business decision-making. That is why confidence scores are crucial to today’s modeled data attributes.

To trust and use data science and modeled data, both the science and the data need to be transparent and explainable. If brands are to make important decisions around pricing, qualification, risk and more using data science, they must be able to understand how models arrived at their scores and how accurate the models themselves are. That understanding is vital for communicating with customers and regulators alike.

Let’s take the insurance industry as an example. Confidence-scored data gives insurers the autonomy to set their own thresholds when making nuanced judgments around pricing or the customer journey. Companies can decide for themselves between a more disruptive but thorough customer journey and an automated form fill when creating policies. Specialty services can tailor models to these variables with full transparency into the quality of the data and the risk they are facing.
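To make the idea of insurer-defined thresholds concrete, here is a minimal sketch of routing a policy applicant based on the confidence score attached to a modeled attribute. The function name and the threshold values are illustrative assumptions, not part of any real insurer's workflow:

```python
# Hypothetical sketch: choose a customer journey from the confidence
# score of a modeled attribute. Thresholds are illustrative only.

def route_customer_journey(confidence: float,
                           auto_fill_threshold: float = 0.9,
                           review_threshold: float = 0.6) -> str:
    """Pick a journey for one applicant given an attribute's confidence."""
    if confidence >= auto_fill_threshold:
        return "auto_fill"       # trust the modeled value, pre-fill the form
    elif confidence >= review_threshold:
        return "ask_to_confirm"  # show the value, ask the customer to verify
    else:
        return "manual_entry"    # discard the modeled value, ask directly

print(route_customer_journey(0.95))  # auto_fill
print(route_customer_journey(0.75))  # ask_to_confirm
print(route_customer_journey(0.40))  # manual_entry
```

Raising `auto_fill_threshold` trades a smoother journey for more customer friction; the point of a confidence score is that each insurer can make that trade-off explicitly.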


Creating confidence scores can often be just as complex as creating your predictive model.
Source: ThinkStock

However, there are two main problems in creating accurate confidence scores on modeled insurance data. The first arises when there is very little training data available. The second arises when there is an abundance of training data, but it is skewed or not representative of the data to be predicted. In that case, there is a significant risk that the model will produce high confidence scores for inaccurate predictions, because the scoring population is inconsistent with the training population. It is like building a model to identify oranges and using it to predict apples: the model’s confidence in spotting oranges simply does not carry over.
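The oranges-and-apples mismatch can be checked numerically before scoring. One common approach, not named in the article but widely used for this purpose, is a population stability index (PSI) comparing the training and scoring distributions of a feature. The populations and the 0.25 alert threshold below are illustrative assumptions:

```python
# Hypothetical sketch: a population stability index (PSI) check to flag
# when the population being scored has drifted from the training one.
# The 0.25 alert level is a common rule of thumb, not a universal rule.
import numpy as np

def psi(train: np.ndarray, scoring: np.ndarray, bins: int = 10) -> float:
    """Compare two samples of one feature, binned on training quantiles."""
    edges = np.quantile(train, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf        # catch out-of-range values
    expected = np.histogram(train, edges)[0] / len(train)
    actual = np.histogram(scoring, edges)[0] / len(scoring)
    expected = np.clip(expected, 1e-6, None)     # avoid log(0)
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

rng = np.random.default_rng(0)
oranges = rng.normal(0.0, 1.0, 5000)   # training population
apples = rng.normal(1.5, 1.0, 5000)    # shifted scoring population
print(psi(oranges, oranges) < 0.05)    # same population: negligible PSI
print(psi(oranges, apples) > 0.25)     # shifted population: alert
```

A high PSI on a key feature is exactly the warning sign that the model’s confidence scores, however well calibrated on oranges, should not be trusted on apples.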

To mitigate the risk of small training data, careful use of statistical tests and distribution assumptions to select upper and lower confidence bounds that reflect volatile data is key. The solution to the second issue, however, is more complex than it might first seem. Combating it requires a process that ensures the test data is representative of the training data and vice versa. In recent times the flood of available data has tempted modelers to relax discipline around confidence scores and boundaries, but when modeling on skewed data that discipline remains imperative. Training and test data must be calibrated against each other to remove bias.
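One simple way to set upper and lower confidence bounds that honestly reflect a small, volatile sample, offered here as an illustrative sketch rather than the author's stated method, is a percentile bootstrap on a model error metric. All names and sample sizes below are assumptions for demonstration:

```python
# Hypothetical sketch: a bootstrap confidence interval for mean absolute
# error. With few observations the interval widens, which is exactly the
# honesty a confidence score should carry.
import numpy as np

def bootstrap_ci(errors: np.ndarray, level: float = 0.95,
                 n_boot: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap interval for the mean absolute error."""
    rng = np.random.default_rng(seed)
    n = len(errors)
    means = np.array([np.abs(rng.choice(errors, n, replace=True)).mean()
                      for _ in range(n_boot)])
    alpha = (1 - level) / 2
    return (float(np.quantile(means, alpha)),
            float(np.quantile(means, 1 - alpha)))

rng = np.random.default_rng(1)
small = rng.normal(0, 1, 20)     # volatile: few observations
large = rng.normal(0, 1, 2000)   # stable: plenty of observations
lo_s, hi_s = bootstrap_ci(small)
lo_l, hi_l = bootstrap_ci(large)
print(hi_s - lo_s > hi_l - lo_l)  # smaller sample, wider interval
```

Reporting the interval rather than a single accuracy number is what stops a model trained on 20 rows from sounding as confident as one trained on 2,000.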

Given these challenges, creating confidence scores can often be just as complex as creating your predictive model. It requires judgment, statistics and experience. Moreover, accurate confidence scores are vital when providing data that will underpin business processes, and they are an important part of building trust with both consumers and regulators.

Emma Duckworth is the lead data scientist at Outra.



