

Analytics Magazine

Health data things: monetizing IoT & health apps

By Aaron Lai

Time present and time past
Are both perhaps present in time future
And time future contained in time past.
– “Four Quartets,” T.S. Eliot

The Internet of Things (IoT) is widely considered the next revolution to touch every part of our daily lives, from restocking ice cream to warning of pollutants. Analytics professionals understand the importance of data, especially in a field as complicated as healthcare. This article offers a framework for integrating different data sources, as well as a way to unleash the full potential of data to estimate customer lifetime value (CLV). The ultimate goal – monetizing the value of “data as data” – is one of the few things that works against the Law of Diminishing Marginal Returns. We illustrate the concept with a real-time biometrics monitoring device and its associated mobile phone apps.

Big Little Data: Variety, Velocity and Volume

Big data is generally understood through the 3Vs: variety (different data nature), velocity (rapid arrival of data) and volume (massive quantity of data). Healthcare is a natural venue for big data, given the volume of data from electronic health records (EHRs) and genomics. Healthcare also has very rapid data; a pacemaker can generate heartbeat data in real time. Meanwhile, machine-generated data, user-inputted data (e.g., patient-reported outcomes) and observation data (e.g., prescription notes) add considerable variety to healthcare data.

Big data provides big opportunities for the healthcare industry. Previous efforts around health devices and mobile health apps have mostly focused on the data collected by those devices; the business model itself can also be reshaped by data from these and other sources, both internal and external.

Types of Data Elements

Data sources can be grouped by their nature.

Membership data. This includes user information, service utilization (e.g., call center), user-reported data (e.g., patient-reported outcome such as mood), system-recorded data (e.g., login IP, mobile model) and other information collected from or supplied by the users. Other than restrictions imposed by regulations, the company is free to use this data.

Physically recorded data. This refers to biometric data collected by devices: heart rate, blood pressure, blood glucose, physical movement, etc. Depending on the device, some data is highly accurate (e.g., blood glucose) and some is only “directionally correct” (e.g., sleep trackers). It is important not to over-interpret the less reliable data.

Third-party data. Many “data brokers” provide individual or household-level data for further analysis or marketing purposes. A typical data set could easily reach thousands of attributes and millions of records. This data could be used for prospect acquisition and data enrichment.

Clinical data. This data usually resides in the EHR system hosted by the payer (i.e., insurance company), the provider (i.e., hospital) or the pharmacist. With the permission of the patients, the company can combine the data to uncover previously unknown relationships or to serve the users better. For example, suppose someone has a cardiovascular disease and is taking a particular type of drug. The activity tracker could monitor and report her physical activity level, while a patient-reported outcome app could account for mood change. It would then be possible to correlate the impact of the drug on the level of her physical activities, as well as its effect on her mood.

Competitive data. Many vendors sell a wide variety of competitive or market-related data. For example, some sell anonymized prescription data while others sell market insights. The data comes from industry survey results and other means.

Public data. The most obvious such source is census data. Other government agencies release data such as hospital discharge rates, disease data (Centers for Disease Control and Prevention) and physician data (Centers for Medicare & Medicaid Services). Combining all this data provides clinical, behavioral and technical insights that can be fed into R&D, diagnosis and prescription work. Eventually, this process can act as a feedback loop to the user database.
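As a toy illustration of the clinical-data integration described above, the sketch below joins daily tracker data with patient-reported mood and an EHR prescription start date, then compares activity and mood before and after the drug start. All values, day indices and thresholds are made up for illustration; a real pipeline would join on patient IDs and timestamps under the appropriate consent and privacy controls.

```python
from statistics import mean

# Hypothetical daily data for one consented user (all values invented).
steps = {1: 4000, 2: 4200, 3: 3900, 4: 6500, 5: 7000, 6: 6800}  # day -> step count (tracker)
mood = {1: 3, 2: 3, 3: 2, 4: 4, 5: 4, 6: 5}                     # day -> self-reported mood, 1-5 (app)
rx_start_day = 4                                                # drug start date from the EHR

# Split the joined record at the prescription start.
before = [d for d in steps if d < rx_start_day]
after = [d for d in steps if d >= rx_start_day]

print("steps before:", mean(steps[d] for d in before))
print("steps after: ", mean(steps[d] for d in after))
print("mood before: ", mean(mood[d] for d in before))
print("mood after:  ", mean(mood[d] for d in after))
```

With real data, the same before/after comparison (ideally with proper statistical controls) is what lets you correlate the drug with activity level and mood change.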

Figure 1: Integrated data flow.

Circle of Levels: Multi-Level Modeling

In statistics, multi-level modeling is used to estimate the parameters of a model when the explanatory variables exist at both the individual and group levels. In the following example, we use the same concept in a slightly different context.

Suppose we have all the data directly collected from the devices and apps as described above for a particular person. At that point, we know the user, and we can add third-party data and clinical data, providing a “full” information set for that individual.

The next step is to integrate anonymized data such as third-party prescription data, third-party procedure data, etc. The true identity of the person is not known, since some demographics have been removed or masked due to privacy concerns. However, we can use a simple Bayesian approach to estimate the probability that a record is a close enough match for analytics purposes. It would be a group-based match, i.e., the likelihood that a known user shares the behavior of a given group of anonymous people. It can be written as:

P(this user shares the characteristics of this group of anonymous people) = P(this user shares the characteristics of this group | the pre-defined characteristics) × P(the pre-defined characteristics)

For example, you may have some users who take a specific type of high blood pressure drug and a specific type of diabetes drug for a certain period of time. You may then be able to extract a population from the third-party prescription database based on those two characteristics. You can then estimate the likelihood that the two groups are the same and set a subjective confidence cut-off for a match. Note that the final likelihood also depends on the probability of occurrence. Since this is a binary outcome, you can estimate it with logistic regression or other techniques such as decision trees or support vector machines. Once you are satisfied with the formula, you can assign those known users the additional information obtained from the prescription database.
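The matching calculation above can be sketched in a few lines. The probabilities and cut-off below are hypothetical placeholders: in practice, the conditional match probability would come from a fitted classifier (such as the logistic regression mentioned above) and the base rate from the observed frequency of the pre-defined characteristics.

```python
def match_probability(p_match_given_chars: float, p_chars: float) -> float:
    """Joint probability that a known user matches an anonymous group,
    following P(match) = P(match | characteristics) * P(characteristics)."""
    return p_match_given_chars * p_chars

# Hypothetical numbers: an 80% match likelihood given the two drug
# characteristics, which themselves occur in 5% of the population.
p = match_probability(0.80, 0.05)
print(round(p, 3))  # 0.04

CUTOFF = 0.03  # subjective confidence cut-off (assumption)
print(p >= CUTOFF)  # True: accept the group-based match
```

If the joint probability clears the cut-off, the known user inherits the group-level attributes from the anonymized source.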

Another type of data is aggregate data. Much government and competitive data comes in aggregated form, usually geographically based (e.g., at the county level). We can follow the aforementioned approach to estimate the likelihood that a group of known users shares the same information as a group from another data source. Of course, we are not saying that everyone in Orange County, Calif., is the same. But we could argue that users in Orange County will resemble other users in Orange County more than they resemble those in Cook County, Ill. The same Bayesian estimation can be used.

Estimating the Customer Lifetime Value

The journey of a customer can be illustrated with the diagram shown in Figure 2.

Figure 2: Customer lifecycle.

The customer lifecycle starts with a prospect, someone who will potentially use your product. Once a prospect contacts you, she becomes a lead. If she has bought (or used) your product, she is now a customer. She will then experience your product and services, and those customer experiences will affect her likelihood to continue. When the renewal moment comes up, she could follow the attrition path to leave or the renewal path to stay. A former customer could also be won back and become a customer again.

Companies often want to know how much they should spend to acquire a customer. One approach is to calculate the customer lifetime value (CLV). If the cost per acquisition (CPA) is lower than the CLV, then the acquisition is a positive investment. An equivalent check is the return on investment (ROI), defined as CLV/CPA – 1; a positive ROI means a positive investment.
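The acquisition rule above reduces to a one-line check; the dollar figures below are hypothetical.

```python
def roi(clv: float, cpa: float) -> float:
    """Return on investment for customer acquisition: CLV / CPA - 1."""
    return clv / cpa - 1.0

# Hypothetical example: a customer worth $600 over her lifetime,
# acquired for $400.
print(roi(600.0, 400.0))      # 0.5, i.e., a 50% return
print(roi(600.0, 400.0) > 0)  # True: CPA < CLV, a positive investment
```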

However, estimating CLV is not a simple task. As we can see from Figure 2, a customer can follow many paths, and many people use the average tenure as a shortcut. Here we propose a more systematic and data-driven approach to CLV estimation.

Figure 3: Sample customer lifecycle process.

If we look closer at the customer lifecycle diagram, it is obvious that each path could be considered in a probabilistic way. For example, the transition from prospect to lead is governed by the probability that a person will respond to a solicitation. In other words, this is the response rate of a marketing campaign.

We can look at this problem using a Markov chain approach. Table 1 is a sample transition matrix, i.e., the probability matrix that shows how people can move from one stage to another. For the most accurate results, it should be person-specific and time-specific. To estimate the equilibrium (or long-term) transition probabilities, we multiply the transition matrices together. Therefore, any errors will be amplified through repeated multiplication.

Table 1: Sample transition matrix for a certain person at a certain time.

The CLV is then the total revenue and cost at each stage for each person over his or her complete tenure. The value of the additional data that is collected can be estimated by how the improved accuracy increases CLV. You can also perform what-if analysis within this framework to estimate the value of a new feature of the medical device.
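The transition-matrix mechanics above can be sketched in a few lines. The stages, probabilities and per-stage values below are hypothetical placeholders for the person- and time-specific estimates the text calls for, and a fixed 40-period horizon stands in for a full equilibrium analysis.

```python
# Lifecycle stages and a hypothetical one-period transition matrix:
# P[i][j] = probability of moving from stage i to stage j in one period.
STATES = ["prospect", "lead", "customer", "former"]
P = [
    [0.90, 0.10, 0.00, 0.00],  # prospect: 10% respond and become leads
    [0.00, 0.60, 0.40, 0.00],  # lead: 40% convert to customers
    [0.00, 0.00, 0.85, 0.15],  # customer: 85% renew, 15% attrite
    [0.00, 0.00, 0.05, 0.95],  # former: 5% win-back per period
]
# Hypothetical net value (revenue minus cost) per period, by stage.
VALUE = [0.0, -5.0, 100.0, 0.0]

def step(dist, P):
    """One period of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(dist)))
            for j in range(len(P[0]))]

dist = [1.0, 0.0, 0.0, 0.0]  # everyone starts as a prospect
clv = 0.0
for _ in range(40):          # finite horizon instead of the true equilibrium
    dist = step(dist, P)
    clv += sum(d * v for d, v in zip(dist, VALUE))

print(round(clv, 2))
```

Because the long-run distribution is the product of many such steps, small errors in the per-period matrix compound quickly, which is the amplification risk noted above.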

Suppose your medical device can give a five-minute warning of an imminent heart attack. You can then calculate the value of this feature via a longer tenure (since the patient will live longer), potential revenue from higher reimbursement from insurance companies (the payers) or higher sales. The difference between this new value and the status quo is the value of the feature. The same calculation can be used to estimate a fair price in value-based contracts or value-based reimbursement.
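The feature-valuation idea above can be sketched as a CLV difference. The margin and tenure figures are hypothetical, and the CLV formula is deliberately simplified (undiscounted per-period margin times expected tenure) to keep the comparison clear.

```python
def clv(monthly_margin: float, months: int) -> float:
    """Simplified, undiscounted CLV: per-period margin times tenure."""
    return monthly_margin * months

# Hypothetical scenario: the early-warning feature extends expected
# tenure from 24 to 30 months at a $50/month margin.
baseline = clv(50.0, 24)      # status quo
with_warning = clv(50.0, 30)  # with the new feature

feature_value = with_warning - baseline
print(feature_value)  # 300.0: the value attributable to the feature
```

That difference is the most a rational buyer should pay for the feature, and a starting point for pricing a value-based contract.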

Another way to look at it is through a real options framework. The availability of the data enables us to do things: if you neither collect nor acquire the data, you cannot build the feature at all, whereas having the data gives you the chance to try, even if the feature might not ultimately work. The cost of the data is the option premium you pay for that chance. This is also related to how you can monetize the data.

For instance, probably few people will be interested in buying the demographics of your user profiles. However, many people will be interested in buying a lead list with people who are physically active and are taking a particular class of drug. As a result, your user list is suddenly worth a lot more to advertisers.

By the same token, insurance companies may not be willing to pay for your medical device if they are skeptical about the value you add. But if you can prove that your device properly monitors patients’ health and prompts them to take proper action to avoid adverse and costly events, e.g., hospital admissions, then you can build a stronger case for payment or reimbursement. This is the idea behind real-world outcomes research.

Conclusion: We Must Part Now

Data is the beginning and the end. The IoT provides unprecedented big data and big opportunities for those who can appreciate it. We have explained what data one could and should collect, illustrated the issues and techniques in integrating that data, described the customer lifetime value (CLV) calculation, and extended the framework to accommodate strategy development. The past and the future are both fixed on the present; one has to seize the opportunities at hand to unleash the value of data – including its monetary value.

Aaron Lai is the senior manager of analytics for Blue Shield of California. He also serves on the advisory board of the Business Information and Analytics Center at Purdue University. He has bachelor’s degrees in finance from City University of Hong Kong and in management studies from the University of London, an MBA from Purdue and master’s degrees from Oxford in sociology and evidence-based healthcare. All opinions expressed in this article are his own and do not necessarily reflect those of his employer or his affiliations.
