Analytics Magazine

Healthcare Analytics: Healthcare and big data: Hype or unevenly distributed future?

November/December 2015

By Rajib Ghosh

We had a quiet few months in the healthcare analytics industry, news-wise. Healthcare organizations, however, continued to build and deploy analytics solutions to optimize care, track outcomes and measure cost. Next month, the Healthcare Information and Management Systems Society (HIMSS), a not-for-profit organization dedicated to improving healthcare quality, safety, cost and access through the use of information systems, will kick off a big data and healthcare analytics forum in Boston. I expect to see some interesting developments emerge from that event.

As a precursor to the conference, an interesting discussion has started to take shape around big data in healthcare. At the end of 2014, IDC predicted that by 2018, 50 percent of big data issues will become routine operational IT procedures. Technology companies and pundits have long predicted that healthcare will be the next frontier of big data analytics. In this article I will share my thoughts on how well that assessment holds up.

Is Healthcare Data Big Data?

Four attributes of a data set make it “big data”: volume, velocity, variety and veracity, which are described in Figure 1.

Figure 1: The 4 Vs of big data.

Do digitized healthcare data sets have those attributes? In most cases the answer is no. Digitization in healthcare happened not so long ago. There were some early adopters, but until the Affordable Care Act passed in 2010 and an incentive program was created by the Office of the National Coordinator for Health IT for “meaningful use” of electronic health record (EHR) systems, adoption of EHRs, the main tool for healthcare data digitization in provider organizations, was quite abysmal. In effect, most healthcare delivery organizations started to digitize data only within the last five years or so. Some forms of unstructured data, such as imaging data, have been stored in electronic format for quite some time. However, widespread adoption lagged because of the high cost of such systems.

In the monolithic healthcare delivery model of the past, most of the digitized data was trapped in fragmented health IT systems that hardly interoperated. As a result, healthcare data volume never really approached the size of what we see in other big data domains such as social networking or consumer marketing.

Healthcare is very episodic in nature and therefore relatively low in volume and velocity. When a patient visits a doctor, a new encounter record is created. The patient’s vital signs, allergies and symptoms are recorded, and prescriptions are created. Once the episode is over, a billing record is generated, and if the patient is insured a claim is sent to the clearinghouse for submission to the patient’s insurance company. If the patient does not come back to see the doctor for the rest of the year and is not admitted to a hospital for disease exacerbation, no more data for that patient is added to any data set.

According to one report, there were about 767,000 practicing doctors in the United States in 2013. Another report shows that, on average, a doctor sees about 19 patients per day. This includes patients seen in an office, hospital or nursing home, on a house call or via an e-visit. This creates about 14.6 million encounter records per day, assuming all episodes are unique, although that is not the case: we know that 5 percent of the population utilizes 50 percent of healthcare resources.

This is nothing compared to the data generated each day on social media sites such as Facebook, Twitter, YouTube or Instagram, or by the searches users conduct on Google. Google handles more than 3 billion searches per day, based on 2012 data. On Facebook, users share 4.75 billion pieces of content per day. In other words, the healthcare data set is not, at present, growing exponentially the way other big data sets are.
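The comparison can be checked with simple arithmetic. The sketch below, in Python, uses only the figures quoted above. The constant and function names are illustrative, and treating one encounter record, one search and one shared item as comparable units is a simplifying assumption made for the sake of the estimate.

```python
# Back-of-the-envelope comparison using only the figures quoted in the article.
PRACTICING_DOCTORS = 767_000              # United States, 2013
PATIENTS_PER_DOCTOR_PER_DAY = 19          # average across all visit types
GOOGLE_SEARCHES_PER_DAY = 3_000_000_000   # "more than 3 billion", 2012 data
FACEBOOK_ITEMS_PER_DAY = 4_750_000_000    # pieces of content shared per day


def encounters_per_day(doctors: int, patients_per_doctor: int) -> int:
    """Upper-bound estimate: every visit counts as a separate encounter record."""
    return doctors * patients_per_doctor


encounters = encounters_per_day(PRACTICING_DOCTORS, PATIENTS_PER_DOCTOR_PER_DAY)

# How many searches or shared items occur for each encounter record created.
google_ratio = GOOGLE_SEARCHES_PER_DAY / encounters
facebook_ratio = FACEBOOK_ITEMS_PER_DAY / encounters

print(f"Encounter records per day: {encounters:,}")
print(f"Google searches per encounter record: {google_ratio:.0f}")
print(f"Facebook items per encounter record: {facebook_ratio:.0f}")
```

Even with the generous assumption that every visit produces a new record, the daily count of encounter records is smaller than either consumer figure by a factor in the hundreds.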

Healthcare data does have variety. It includes, but is not limited to, pharmacy data stored in the pharmacy system, structured claims data, physician text notes entered in EHR systems, and images stored in a picture archiving and communication system. Veracity is the other key attribute of healthcare data sets. In the absence of interoperability, the data dictionaries used in various electronic systems are often inconsistent, which causes ambiguity.
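To make the data dictionary problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two code tables, the record layouts and the `harmonize` helper are invented for illustration and do not describe any real EHR or pharmacy product, or any coding standard.

```python
# Hypothetical local codes for the same field in two systems. The values are
# invented for illustration and do not come from any real product or standard.
EHR_SEX_CODES = {"M": "male", "F": "female", "U": "unknown"}
PHARMACY_SEX_CODES = {"1": "male", "2": "female", "9": "unknown"}


def harmonize(code: str, dictionary: dict) -> str:
    """Map a system-specific code to a shared label, flagging unmapped codes."""
    return dictionary.get(code, "unmapped")


# The same (hypothetical) patient as seen by each system.
ehr_record = {"patient_id": "A100", "sex": "F"}
pharmacy_record = {"patient_id": "A100", "sex": "2"}

# Compared raw, the two records appear to disagree about the same patient.
raw_match = ehr_record["sex"] == pharmacy_record["sex"]

# After mapping each code through its own system's dictionary, they agree.
harmonized_match = (
    harmonize(ehr_record["sex"], EHR_SEX_CODES)
    == harmonize(pharmacy_record["sex"], PHARMACY_SEX_CODES)
)

print(raw_match, harmonized_match)
```

The point of the sketch is that the ambiguity lives in the dictionaries rather than in the patient: until someone builds and maintains a mapping like this for every shared field, combining the two data sets produces false disagreements.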

Because electronic data capture is a relatively new phenomenon in many healthcare organizations, data is unlikely to be clean or complete. One may get a few data points per patient, and only a subset of patients may have many such “little data” entries in the data set. Overall, this makes healthcare data unusual and difficult to use for training advanced algorithms that predict future events in patients.

Will Healthcare Data Become Big Data?

So, is big data in healthcare just more hype? Or is it the future that is already here but, as science fiction writer William Gibson said, not evenly distributed?

Healthcare data volume could explode if all the data from fitness and health-monitoring wearable apps is added to patients’ medical records or combined on some other platform. That would also increase data velocity substantially.

A recently published peer-reviewed article projects that human genomics data will outgrow the big data domains of YouTube, Twitter and astronomy by 2025 as the cost of genome sequencing falls rapidly and adoption increases. However, scaling genomics for real-time use at the point of care will take an exponential increase in computational power and an order-of-magnitude reduction in cost. Along with that, many federal, state, local and HIPAA privacy regulations need to be resolved or rewritten before the data becomes available for widespread analytics.

So it is safe to say that healthcare is still quite far from being a big data domain. There is potential, however: with the proliferation of the Internet of Things, or connected devices, during the next decade and the advancement of genomics, healthcare could become a big data domain. But in my opinion the future is just not here yet. We are still stuck in the present, and it is not that big.


Rajib Ghosh is an independent consultant and business advisor with 20 years of technology experience in various industry verticals, where he has held senior-level management roles in software engineering, program management, product management, and business and strategy development. Ghosh spent a decade in the U.S. healthcare industry as part of a global ecosystem of medical device manufacturers, medical software companies, and telehealth and telemedicine solution providers. He has held senior positions at Hill-Rom, Solta Medical and Bosch Healthcare. His recent work interests include public health and the field of IT-enabled sustainable healthcare delivery in the United States as well as emerging nations. Follow Ghosh on Twitter @ghosh_r.

