
Analytics Magazine

Health data things: monetizing IoT & health apps

By Aaron Lai

Time present and time past
Are both perhaps present in time future
And time future contained in time past.
– “Four Quartets,” T.S. Eliot

The Internet of Things (IoT) is considered the next revolution, one that touches every part of our daily lives, from restocking ice cream to warning of pollutants. Analytics professionals understand the importance of data, especially in a field as complicated as healthcare. This article offers a framework for integrating different data sources, as well as a way to unleash the full potential of data to estimate customer lifetime value (CLV). The ultimate goal, monetizing the value of “data as data,” is one of the few things that work against the law of diminishing marginal returns. We’ll illustrate the concept with a real-time biometrics monitoring device and its associated mobile phone apps.

Big Little Data: Variety, Velocity and Volume

Big data is generally understood in terms of the 3Vs: variety (different kinds of data), velocity (rapid arrival of data) and volume (massive quantity of data). Healthcare is a natural venue for big data, given the volume of data from EHRs (electronic health records) and genomic data. Healthcare data also arrives at high velocity; a pacemaker, for example, can generate heartbeat data in real time. Meanwhile, machine-generated data, user-inputted data (e.g., patient-reported outcomes) and observation data (e.g., prescription notes) add considerable variety to healthcare data.

Big data provides big opportunities for the healthcare industry. Previous efforts around health devices and health apps (for mobile phones) have mostly focused on the data collected via those devices, as well as on business models that could be adapted using data from these and other sources, both internal and external.

Types of Data Elements

Data sources can be grouped by their nature.

Membership data. This includes user information, service utilization (e.g., call center), user-reported data (e.g., patient-reported outcomes such as mood), system-recorded data (e.g., login IP, mobile model) and other information collected from or supplied by the users. Subject to regulatory restrictions, the company is free to use this data.

Physically recorded data. This refers to the data collected by devices including biometrics, heartbeat, blood pressure, blood glucose, physical movement, etc. Depending on the device, some data is highly accurate (e.g., blood glucose) and some data is just “directionally correct” (e.g., sleep tracker). It is important not to over-interpret less reliable data.

Third-party data. Many “data brokers” provide individual or household-level data for further analysis or marketing purposes. A typical data set could easily reach thousands of attributes and millions of records. This data could be used for prospect acquisition and data enrichment.

Clinical data. This data usually resides in the EHR system hosted by the payer (e.g., an insurance company), the provider (e.g., a hospital) or the pharmacy. With the patients’ permission, the company can combine this data with its own to uncover previously unknown relationships or to serve its users better. For example, suppose someone has a cardiovascular disease and is taking a particular type of drug. Her activity tracker could monitor and report her physical activity level, while a patient-reported outcome app could capture changes in her mood. It would then be possible to assess the impact of the drug on her level of physical activity, as well as its effect on her mood.

Competitive data. Many vendors sell a wide variety of competitive or market-related data. For example, some sell anonymized prescription data while others sell market insights. The data comes from industry surveys and other sources.

Public data. The most obvious such source is census data. Other government agencies release data such as hospital discharge rates, disease data (Centers for Disease Control and Prevention) and physician data (Centers for Medicare & Medicaid Services). Combining all of these sources provides clinical, behavioral and technical insights that can feed into R&D, diagnosis and prescription work. Eventually, this process can act as a feedback loop to the user database.

Figure 1: Integrated data flow.
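The cardiovascular example under “Clinical data” can be sketched as a naive before/after comparison. This is a minimal illustration, assuming one consenting user’s daily step counts (from the activity tracker) and mood scores (from a patient-reported outcome app) have already been joined to her prescription start date from the EHR; the records, the 1-5 mood scale and the `before_after` helper are all hypothetical. A before/after difference like this is only a starting point and does not by itself show that the drug caused the change.

```python
from datetime import date
from statistics import mean

# Hypothetical daily records for one consenting user:
# (day, step count from the tracker, self-reported mood on a 1-5 scale).
daily = [
    (date(2017, 3, 1), 4200, 2),
    (date(2017, 3, 2), 3900, 3),
    (date(2017, 3, 3), 4100, 2),
    (date(2017, 3, 4), 5600, 3),
    (date(2017, 3, 5), 6100, 4),
    (date(2017, 3, 6), 5900, 4),
]
# Hypothetical prescription start date taken from the clinical (EHR) record.
drug_start = date(2017, 3, 4)


def before_after(records, start):
    """Mean steps and mood before vs. on/after the prescription start date."""
    before = [r for r in records if r[0] < start]
    after = [r for r in records if r[0] >= start]
    return {
        "steps": (mean(r[1] for r in before), mean(r[1] for r in after)),
        "mood": (mean(r[2] for r in before), mean(r[2] for r in after)),
    }


print(before_after(daily, drug_start))
```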

Circle of Levels: Multi-Level Modeling

In statistics, multi-level modeling is used to estimate the parameters of a model when the explanatory variables are measured at both the individual and the group level. In the following example, we apply the same concept in a slightly different context.

Suppose we have all the data directly collected from the devices and apps as described above for a particular person. At that point, we know the user, and we can add third-party data and clinical data, providing a “full” information set for that individual.
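A minimal sketch of assembling that “full” information set, assuming every source has already been keyed to the same internal user ID; the source names, field names and the `build_profiles` helper are hypothetical. Users who have not granted permission simply have no clinical entry.

```python
# Hypothetical per-source records keyed by an internal user ID (field names are illustrative).
membership = {"u1": {"age_band": "45-54", "phone_model": "X"}, "u2": {"age_band": "30-39"}}
device = {"u1": {"avg_daily_steps": 5200, "resting_hr": 68}}
third_party = {"u1": {"household_income_band": "C"}, "u2": {"household_income_band": "B"}}
clinical = {"u1": {"dx": "hypertension"}}  # only users who granted permission appear here


def build_profiles(*sources):
    """Merge source dicts into one profile per user.

    If two sources supply the same field, the value from the earlier source is kept.
    """
    profiles = {}
    for source in sources:
        for user_id, fields in source.items():
            profile = profiles.setdefault(user_id, {})
            for key, value in fields.items():
                profile.setdefault(key, value)
    return profiles


profiles = build_profiles(membership, device, third_party, clinical)
print(profiles["u1"])
```

Keeping the first value seen for a field is just one possible precedence rule; in practice you might prefer the most reliable source for each field.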

The next step is to integrate anonymized data such as third-party prescription data, third-party procedure data, etc. The true identity of each person in these sources is not known, since some of the demographics have been removed or masked for privacy reasons. However, we could use a simple Bayesian approach to estimate the probability that a known user is a close enough match for analytics purposes. This would be a group-based match, i.e., the likelihood that this known user shares the behavior of this type of anonymous person. It can be written as:

P(match) = P(match | C) × P(C)

where “match” means that this user shares the characteristics of this group of anonymous people, and C means that the user has the pre-defined characteristics. Because a user can only match a group defined by C if she has C herself, this is also the joint probability of the two events.

For example, you may have some users who have taken a specific type of high blood pressure drug and a specific type of diabetes drug for a certain period of time. You may then be able to extract a population from the third-party prescription database based on those two characteristics. You can then estimate the likelihood that the two groups are the same and set a subjective confidence cut-off for declaring a match. Note that the final likelihood also depends on the probability that the pre-defined characteristics occur. Since a match is a binary outcome, you can estimate it with logistic regression or with other techniques such as decision trees or support vector machines. Once you are satisfied with the model, you can assign the additional information obtained from the prescription database to those known users.
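The matching calculation can be sketched as follows. This is a minimal illustration, assuming a logistic regression has already been fitted; the coefficients, feature names, the value of P(C) and the 0.6 cut-off are placeholders, not estimates from real data. Here C stands for “the user has the pre-defined characteristics” (e.g., both drug types).

```python
import math

# Placeholder coefficients: in practice these would come from fitting a logistic
# regression (or a decision tree, SVM, etc.) on cases where the match is known.
INTERCEPT = -1.0
COEFS = {"same_age_band": 1.2, "same_region": 0.8, "months_on_both_drugs": 0.05}
CUTOFF = 0.6  # subjective confidence cut-off chosen by the analyst


def p_match_given_c(features):
    """Logistic model: P(match | C) = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = INTERCEPT + sum(COEFS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def p_match(features, p_c):
    """P(match) = P(match | C) * P(C)."""
    return p_match_given_c(features) * p_c


user = {"same_age_band": 1, "same_region": 1, "months_on_both_drugs": 12}
score = p_match(user, p_c=0.9)
print(round(score, 3), "match" if score >= CUTOFF else "no match")
```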

Another type of data is aggregate data. Much government and competitive data comes in aggregated form, usually by geography (e.g., county level). We can follow the approach above to estimate the likelihood that a group of known users should share the same information as a group from another data source. Of course, we are not saying that everyone in Orange County, Calif., is the same. But we could argue that our known users in Orange County will resemble the Orange County population described in the aggregate data more than they will resemble the population of Cook County, Ill. The same Bayesian estimation could be used.
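One standard way to carry out a Bayesian estimate with aggregate data is beta-binomial updating; this is our choice of technique, not one the article specifies. In this sketch, a county-level rate from an aggregate source acts as the prior, worth an assumed number of pseudo-observations (`prior_strength`), and counts observed among our own known users in that county update it. All numbers and the `posterior_rate` helper are hypothetical.

```python
def posterior_rate(county_rate, prior_strength, successes, trials):
    """Posterior mean of a Beta-binomial model.

    The county-level rate is encoded as a Beta(alpha, beta) prior worth
    `prior_strength` pseudo-observations; `successes` out of `trials` known
    users in that county are the observed data.
    """
    alpha = county_rate * prior_strength
    beta = (1 - county_rate) * prior_strength
    return (alpha + successes) / (alpha + beta + trials)


# Hypothetical: the aggregate source reports a 12% county rate,
# and 9 of our 50 known users in that county show the same attribute.
print(round(posterior_rate(0.12, 100, 9, 50), 4))
```

With few known users the estimate stays close to the county figure; as more users are observed, their own data dominates, which mirrors the individual-versus-group trade-off in multi-level modeling.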

Estimating the Customer Lifetime Value

The journey of a customer can be illustrated with the diagram shown in Figure 2.

Figure 2: Customer lifecycle.

The customer lifecycle starts with a prospect, someone who will potentially use your product. Once a prospect contacts you, she becomes a lead. If she has bought (or used) your product, she is now a customer. She will then experience your product and services, and those customer experiences will affect her likelihood of continuing. When renewal time comes, she could follow the attrition path and leave or the renewal path and stay. A former customer could also be won back and become a customer again.

Companies often want to know how much they should spend to acquire a customer. One approach is to calculate the customer lifetime value (CLV). If the cost per acquisition (CPA) is lower than the CLV, the acquisition is a positive investment. An equivalent way is to calculate the return on investment (ROI), defined as CLV/CPA - 1. A positive ROI means a positive investment.
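The CPA/CLV comparison and the ROI definition above amount to a one-line calculation. The dollar figures in this sketch are hypothetical.

```python
def acquisition_roi(clv, cpa):
    """ROI of acquiring a customer, defined as CLV / CPA - 1."""
    if cpa <= 0:
        raise ValueError("cost per acquisition must be positive")
    return clv / cpa - 1


# Hypothetical figures: a $1,200 lifetime value bought with $800 of acquisition spend.
roi = acquisition_roi(1200, 800)
print(roi)  # 0.5, i.e., a 50% return, so this acquisition is worthwhile
```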

However, estimating CLV is not a simple task. As we can see from Figure 2, a customer can follow many paths, and many people use the average tenure as a shortcut. Here we propose a more systematic and data-driven approach to CLV estimation.

Figure 3: Sample customer lifecycle process.

If we look closer at the customer lifecycle diagram, it is obvious that each path could be considered in a probabilistic way. For example, the transition from prospect to lead is governed by the probability that a person will respond to a solicitation. In other words, this is the response rate of a marketing campaign.

We can look at this problem using a Markov chain approach. Table 1 is a sample transition matrix, i.e., a probability matrix that shows how people move from one stage to another. For the most accurate results, it should be person-specific and time-specific. To estimate the equilibrium (or long-term) transition probabilities, we multiply the transition matrices together. As a result, any errors in the estimated probabilities are amplified through the repeated multiplications.

Table 1: Sample transition matrix for a certain person at a certain time.
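Since Table 1’s values are not reproduced here, the following sketch uses a made-up transition matrix over four stages from the lifecycle in Figure 2 (prospect, lead, customer, former customer). It shows how multiplying the matrix by itself gives multi-period transition probabilities and how those probabilities settle toward a long-run equilibrium. The stage list, the probabilities and the helper names are assumptions for illustration.

```python
STATES = ["prospect", "lead", "customer", "former"]

# Hypothetical one-period transition matrix: row = current stage, column = next stage.
# Every row sums to 1. Real values would be estimated per person and per period.
P = [
    [0.95, 0.05, 0.00, 0.00],  # prospect: responds to a campaign with probability 0.05
    [0.20, 0.50, 0.30, 0.00],  # lead: buys (0.30), stays a lead (0.50) or goes cold (0.20)
    [0.00, 0.00, 0.85, 0.15],  # customer: renews (0.85) or attrites (0.15)
    [0.00, 0.00, 0.10, 0.90],  # former customer: won back with probability 0.10
]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def n_step(p, n):
    """n-step transition probabilities: p multiplied by itself n times."""
    result = [[1.0 if i == j else 0.0 for j in range(len(p))] for i in range(len(p))]
    for _ in range(n):
        result = matmul(result, p)
    return result


# Stage distribution of a brand-new prospect after 12 and after 500 periods.
for periods in (12, 500):
    row = n_step(P, periods)[0]
    print(periods, {s: round(x, 3) for s, x in zip(STATES, row)})
```

With these made-up numbers, a prospect eventually ends up split roughly 40/60 between the customer and former-customer stages, because attrition (0.15) and win-back (0.10) balance at that ratio. An error in any row of `P` propagates into every product.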

The CLV is then the total revenue net of cost at each stage for each person, summed over his or her complete tenure. The value of additional data can be estimated by how much its improved accuracy increases CLV. You can also perform what-if analysis with this framework to estimate the value of a new feature for the medical device.
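Building on the Markov chain view, CLV can be sketched as the expected discounted net value (revenue minus cost) earned in each stage, period by period, as a person moves through the chain. This is a minimal sketch, assuming a hypothetical transition matrix, hypothetical per-period net values for each stage, a 60-period horizon and a 1% per-period discount rate; none of these come from the article.

```python
STATES = ["prospect", "lead", "customer", "former"]

# Hypothetical transition matrix (row = current stage, column = next stage).
P = [
    [0.95, 0.05, 0.00, 0.00],
    [0.20, 0.50, 0.30, 0.00],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.10, 0.90],
]
# Hypothetical net value (revenue minus cost) earned in one period in each stage:
# leads cost sales effort, customers generate margin, former customers cost win-back marketing.
VALUE = [0.0, -20.0, 100.0, -5.0]


def clv(p, value, start, horizon, discount=0.01):
    """Expected discounted net value over `horizon` periods, starting in stage index `start`."""
    dist = [1.0 if i == start else 0.0 for i in range(len(p))]
    total = 0.0
    for t in range(horizon):
        # Expected net value this period, discounted back to period 0.
        total += sum(d * v for d, v in zip(dist, value)) / (1 + discount) ** t
        # Advance the stage distribution one period: dist_next = dist @ p.
        dist = [sum(dist[i] * p[i][j] for i in range(len(p))) for j in range(len(p))]
    return total


print(round(clv(P, VALUE, STATES.index("customer"), horizon=60), 2))
print(round(clv(P, VALUE, STATES.index("prospect"), horizon=60), 2))
```

Rerunning the same function with a more accurately estimated matrix, and comparing the results, is one way to put a number on what better data is worth.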

Suppose your medical device can give a five-minute warning of an imminent heart attack. You can then calculate the value of this feature through a longer tenure (since the patient will live longer) or through potential revenue from higher reimbursement by insurance companies (the payers) or higher sales. The difference between this new CLV and the status quo is the value of this feature. This can also be used to estimate a fair price in a value-based contract or value-based reimbursement.
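The what-if calculation can be sketched with a simplified two-stage (customer/former customer) version of the chain. In this hypothetical example, the warning feature is assumed to lower the per-period attrition probability (longer tenure) and raise the per-period margin (higher reimbursement); all numbers and the `two_stage_clv` helper are illustrative assumptions.

```python
def two_stage_clv(attrition, winback, margin, horizon, discount=0.01):
    """Expected discounted margin for a new customer in a customer/former-customer chain."""
    p_customer = 1.0  # probability of being an active customer in the current period
    total = 0.0
    for t in range(horizon):
        total += p_customer * margin / (1 + discount) ** t
        # Active customers renew with prob. 1 - attrition; former customers return with prob. winback.
        p_customer = p_customer * (1 - attrition) + (1 - p_customer) * winback
    return total


# Hypothetical: the warning feature cuts attrition from 15% to 10% per period
# and raises the per-period margin from 100 to 110 through higher reimbursement.
status_quo = two_stage_clv(attrition=0.15, winback=0.10, margin=100.0, horizon=60)
with_feature = two_stage_clv(attrition=0.10, winback=0.10, margin=110.0, horizon=60)
print("value of the feature per customer:", round(with_feature - status_quo, 2))
```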

Another way to look at it is through a real-options framework. The availability of data creates opportunities. If you neither collect nor acquire the data, you cannot build that feature at all; with the data, you at least have the chance, even though the feature might not work in the end. The cost of the data is the option premium you pay to have a chance to do something else. This is also related to how you can monetize the data.
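The option analogy can be made concrete with a scenario-weighted calculation. This is a deliberately simplified one-period valuation without discounting, not a full real-options model. It assumes a handful of hypothetical scenarios for what the feature turns out to be worth, a hypothetical cost to build it (the exercise cost) and a hypothetical data cost (the premium). You build the feature only in scenarios where it pays off, so each scenario contributes max(value - cost, 0).

```python
def option_value(scenarios, exercise_cost):
    """Expected payoff of holding the option to build a data-driven feature.

    scenarios: (probability, value of the feature if built) pairs.
    The feature is built only when its value exceeds the exercise cost,
    so each scenario pays max(value - exercise_cost, 0).
    """
    if abs(sum(p for p, _ in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * max(v - exercise_cost, 0.0) for p, v in scenarios)


# Hypothetical scenarios: the feature turns out worthless 70% of the time,
# modestly valuable 20% of the time and very valuable 10% of the time.
scenarios = [(0.7, 0.0), (0.2, 300_000.0), (0.1, 2_000_000.0)]
premium = 100_000.0  # hypothetical cost of collecting or acquiring the data
value = option_value(scenarios, exercise_cost=250_000.0)
print("option value:", value, "net of premium:", value - premium)
```

With these made-up numbers the feature is worthless in most scenarios, yet the right to build it is still worth more than the data costs, which is the sense in which the data premium can pay for itself.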

For instance, probably few people will be interested in buying the demographics of your user profiles. However, many people will be interested in buying a lead list with people who are physically active and are taking a particular class of drug. As a result, your user list is suddenly worth a lot more to advertisers.

By the same token, insurance companies may not be willing to pay for your medical device because they are skeptical about the value it adds. But if you can prove that your device monitors patients’ health properly and gets them to take appropriate action to avoid adverse and costly events (e.g., hospital admissions), then you can build a stronger case for payment or reimbursement. This is the idea behind real-world outcomes research.

Conclusion: We Must Part Now

Data is the beginning and the end. The IoT provides unprecedented big data and big opportunities for those who can appreciate it. We have explained what data one could and should collect, illustrated the issues and techniques in integrating it, described the customer lifetime value (CLV) calculation, and extended the framework to support strategy development. The past and the future are both fixed on the present; one has to seize the opportunities at hand to unleash the value of data, including its monetary value.

Aaron Lai is the senior manager of analytics for Blue Shield of California. He also serves on the Advisory Board of the Business Information and Analytics Center of Purdue University. He has bachelor’s degrees in finance from City University of Hong Kong and in management studies from the University of London, an MBA from Purdue, and master’s degrees from Oxford in sociology and evidence-based healthcare. All opinions expressed in this article are his own and do not necessarily reflect those of his employer or his affiliations.
