Analytics Magazine

Viewpoint: The importance of confidence scores

By Emma Duckworth

How confident are you about your modeled data?

If you reply honestly, the answer to the question is likely to be akin to sticking your finger in the air and seeing which way the wind is blowing. The problem with modeled data is its very nature – it’s modeled. Therefore, errors and inaccuracies can creep in, making it at best useless, and, at worst, a dangerous tool in business decision-making. That is why confidence scores are crucial to today’s modeled data attributes.

To trust and use data science and modeled data, both the science and the data need to be transparent and explainable. If brands are to make important decisions around pricing, qualification, risk and more using data science, they have to be able to understand how the models arrived at their scores and how accurate the models themselves are. This is vital for communicating with customers and regulators alike.

Let’s take the insurance industry as an example. Confidence-scored data gives insurers the autonomy to set their own thresholds when making nuanced judgments around pricing or the customer journey. Companies can decide for themselves between a more disruptive but thorough customer journey and automated form fill when creating policies. Specialty services can tailor models to these variables with full transparency into the quality of the data and the risk they are facing.
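As a rough illustration of an insurer setting its own threshold, here is a minimal sketch. The function name `route_quote`, the 0.9 default and the two route labels are assumptions made for the example; they do not come from any particular insurer or product.

```python
def route_quote(confidence: float, autofill_threshold: float = 0.9) -> str:
    """Pick a customer-journey step from a confidence score in [0, 1].

    Hypothetical helper: the threshold and the route names are illustrative.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # High confidence: pre-fill the form with the modeled value.
    # Otherwise: take the more disruptive but thorough route and ask the customer.
    return "autofill" if confidence >= autofill_threshold else "ask_customer"
```

A more cautious insurer could simply pass a higher `autofill_threshold`, so that the same score sends the customer down the thorough route instead.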

Creating confidence scores can often be just as complex as creating your predictive model. Source: ThinkStock

However, there are two main problems in creating accurate confidence scores on modeled insurance data. The first is when there isn’t very much training data available. The second is when there is an abundance of training data available, but it is skewed or not representative of the data to be predicted. If this is the case, there is a significant risk that the model will produce high confidence scores for inaccurate predictions because the scoring population is inconsistent with the training population. It’s like creating a model to identify oranges and using it to predict apples. That the model has good confidence in its ability to predict oranges simply isn’t applicable.
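One way to spot an oranges-versus-apples mismatch before trusting the scores is to compare the distribution of a feature in the training population with the same feature in the scoring population. The sketch below uses the population stability index (PSI), which is one possible check and not necessarily the one used by the author; the function name, bin count and `eps` floor are assumptions for illustration.

```python
import math

def population_stability_index(train, score, bins=10, eps=1e-6):
    """PSI between a training sample and a scoring sample of one numeric feature.

    Bins are equal-width over the training range; scoring values outside
    that range fall into the first or last bin.
    """
    lo, hi = min(train), max(train)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    p, q = shares(train), shares(score)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A PSI near zero suggests the two populations look alike on that feature. A commonly quoted rule of thumb treats values above roughly 0.25 as a large shift, in which case high confidence scores from the model deserve suspicion.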

To mitigate the risk of small training data, the key is sound use of statistical methods and tests, together with explicit distribution assumptions, to select upper and lower confidence bounds that reflect how volatile the data is. The solution to the second issue is more complex than it might seem at first. To combat it, it is crucial to create a process that ensures the test data is representative of the training data and vice versa. In recent times the flood of data has appeared to remove the need to be strict with confidence scores and boundaries; when modeling on skewed data, however, this discipline is still imperative. Training and test data must be compared and reconciled to remove bias.
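To make the small-sample point concrete, the sketch below computes upper and lower bounds on an observed accuracy using the Wilson score interval. This is one standard choice among several and is an assumption of the example, as are the function name and the default `z=1.96` (roughly a 95% interval).

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion, e.g. correct predictions out of n."""
    if n == 0:
        return (0.0, 1.0)  # no data: the proportion could be anything
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))
```

With 9 correct predictions out of 10, the bounds are far wider than with 900 out of 1,000, even though the observed accuracy is 90% in both cases. That width is the volatility a confidence score built on little data should reflect.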

Given these challenges, creating confidence scores can often be just as complex as creating your predictive model. It requires judgment, statistics and experience. Moreover, accurate confidence scores are vital when providing data that will underpin business processes, and they are an important part of building trust with both consumers and regulators.

Emma Duckworth is the lead data scientist at Outra.
