

Analytics Magazine

Managing, monitoring chronic diseases

January/February 2012

The role of analytics in the Canadian Primary Care Sentinel Surveillance Network.

By Basudeb Mukherjee, Karim Keshavjee and Runki Basu

Public health surveillance is an integral part of managing chronic diseases. As electronic health record (EHR) systems are widely adopted, large volumes of patient data are being gathered, stored and analyzed for a wide range of stakeholders, from policy-makers to researchers to public health administrators. Healthcare data requires careful handling because of strict legislation aimed at protecting patients' privacy. At Queen's University in Kingston, Ontario, Canada, researchers have developed a public health surveillance system that monitors chronic diseases in the Canadian population. The data collected through the system will be used to explore how such a system can improve care and help rein in the spiralling cost of managing chronic diseases in an aging population.

The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) collects data from primary care sources to conduct surveillance and to research the impact of chronic diseases on public health. In the current phase, the research covers five diseases: diabetes mellitus, hypertension, depression, osteoarthritis and chronic obstructive pulmonary disease (COPD). The system is unique in its approach to extracting, storing and analyzing national epidemiological data from electronic health records in order to improve the quality of care delivered by primary care physicians. It also seeks to build a repository of information to inform policy-making in public health and chronic disease management. Stakeholders in the study include the College of Family Physicians of Canada (CFPC), several primary care practice-based research networks (PC-PBRNs) associated with university departments of family medicine across Canada, and the Canadian Institute for Health Information (CIHI). Funding for the project is provided by the Public Health Agency of Canada.

System Overview

Healthcare data is particularly sensitive to handle for research purposes, as any breach of privacy or data security can lead to serious legal consequences and can also call the validity of the research into question. In the CPCSSN project, patient data is collected from seven provinces across Canada, each with its own data privacy legislation. To comply with these provincial regulations, the researchers require all data to be extracted and de-identified locally before any of it is uploaded to the central repository. Data from each province is loaded into a separate, dedicated server, and each server is set up in compliance with the regulations of its own province, even though all of the servers are housed in a secure facility maintained at Queen's University. Database administrators access the data through a VPN (virtual private network), and all exports of de-identified data from the regional servers to the central repository take place over a secure network.

EHR systems from many different vendors are in use across Canada. As a result, different techniques are used to extract patient data from physicians' EHRs. In one instance, the EHR is hosted by a provincial government acting as the application service provider (ASP); the government extracts and de-identifies the data and delivers it on digital media. However, the emphasis is now on encrypted electronic transfer directly from the physicians' EHRs to the regional server.

Three primary methods are used to carry out de-identification:

  1. All data fields that include any patient-identifying information are removed from the exported data set. This is done manually by data managers, who work through the data set and delete the information.
  2. Commercially available software has been procured to mitigate the risk of probabilistic re-identification of the data.
  3. Text de-identification techniques are used to remove any potentially identifying text from extracted data fields.
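Methods 1 and 3 can be illustrated with a minimal sketch. It assumes records arrive as Python dictionaries, and the field names (`name`, `health_card_number` and so on) and regular expressions are hypothetical choices for illustration, not CPCSSN's actual schema or rules. Method 2 relies on commercial software and is not reproduced here. Simple pattern matching like this misses identifiers such as names written into free text, which is one reason manual review by data managers remained necessary.

```python
import re
from typing import Any, Dict

# HYPOTHETICAL field names: a real extraction would list the direct
# identifiers from the network's own data dictionary.
IDENTIFYING_FIELDS = {"name", "health_card_number", "address", "phone", "date_of_birth"}

# Illustrative (hypothetical) patterns for identifiers that leak into free text.
TEXT_PATTERNS = [
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),  # e.g. 416-555-0199
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),           # e.g. 2011-03-04
    (re.compile(r"\b\d{10}\b"), "[ID]"),                        # bare 10-digit number
]


def remove_identifying_fields(record: Dict[str, Any]) -> Dict[str, Any]:
    """Method 1: drop every field listed as a direct identifier."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def scrub_text(text: str) -> str:
    """Method 3: replace identifier-like substrings with placeholders."""
    for pattern, placeholder in TEXT_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Remove identifying fields, then scrub the remaining string values."""
    kept = remove_identifying_fields(record)
    return {k: scrub_text(v) if isinstance(v, str) else v for k, v in kept.items()}


record = {
    "name": "Jane Doe",
    "health_card_number": "1234567890",
    "diagnosis": "COPD",
    "note": "Call 416-555-0199 re: visit on 2011-03-04",
}
print(deidentify(record))
# prints {'diagnosis': 'COPD', 'note': 'Call [PHONE] re: visit on [DATE]'}
```

Running the de-identification locally, before upload, matches the design described above: nothing identifying ever needs to leave the provincial site.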

Additionally, the Research Ethics Board at each location conducts an annual review to audit compliance with the privacy requirements.

Impediments to Successful Extraction and De-identification of Data

In the initial phase, between April and November 2008, six sites participated in the data extraction exercise, and a set of extraction tasks was defined for each site. Only one of the six sites completed all the tasks required to call the extraction successful. That site had already been operational for more than three years and had employed a data manager for three years before the initial phase of the study. Nevertheless, all six sites were able to extract their data and interpret it such that the results matched disease surveillance based on data from other sources.

Based on the findings of the first phase, a second phase was launched in 2009, in which proficient data managers qualified to handle large volumes of data were hired. The data managers were equipped with a data management toolkit that included the entity-relationship diagram of the CPCSSN database and its associated data dictionary, definitions of the five diseases under study, and an electronic tool for tracking and documenting issues.

The second phase was significant because it revealed a multitude of problems with this approach. The data managers worked independently and, because of privacy restrictions, could not refer back to the original network database. Privacy restrictions also limited their ability to inspect the data further and to de-identify text strings manually. The problems were compounded by physicians' habit of spelling medical terms and abbreviations inconsistently.
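One common way to cope with inconsistent spellings and abbreviations is to normalize each free-text entry and look it up in a table of known variants. The sketch below assumes such a table exists; the `CANONICAL_TERMS` entries are hypothetical examples, not the CPCSSN disease definitions. Strings that do not match are returned as `None` so they can be queued for a data manager to review rather than silently discarded.

```python
import re
from typing import Optional

# HYPOTHETICAL variant table mapping normalized spellings to the five study
# diseases; a real table would be derived from the CPCSSN case definitions.
CANONICAL_TERMS = {
    "copd": "COPD",
    "chronic obstructive pulmonary disease": "COPD",
    "htn": "hypertension",
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
    "dm": "diabetes mellitus",
    "t2dm": "diabetes mellitus",
    "diabetes": "diabetes mellitus",
    "diabetes mellitus": "diabetes mellitus",
    "oa": "osteoarthritis",
    "osteoarthritis": "osteoarthritis",
    "osteoarthrosis": "osteoarthritis",
    "depression": "depression",
    "mdd": "depression",
}


def normalize_key(raw: str) -> str:
    """Lower-case, drop periods (so 'H.T.N.' becomes 'htn'),
    turn other punctuation into spaces and collapse whitespace."""
    text = raw.lower().replace(".", "")
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return " ".join(text.split())


def map_diagnosis(raw: str) -> Optional[str]:
    """Return the canonical disease name, or None for manual review."""
    return CANONICAL_TERMS.get(normalize_key(raw))


for entry in ["H.T.N.", "High  Blood-Pressure", "T2DM", "osteoarthrosis", "asthma"]:
    print(f"{entry!r} -> {map_diagnosis(entry)}")
```

A lookup table is deliberately conservative: it never guesses, so every unrecognized spelling surfaces for a human decision, at the cost of ongoing maintenance of the table.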

Looking ahead, the data collected from primary care settings can be interpreted and put to effective use through data analytics. The following are some of the areas where analytics can be applied to this data to solve healthcare problems:

Feedback of primary care data to primary care physicians. Primary care physicians often lack the resources to interpret the patient data collected in their EHR systems. With analytics, this data can be mined routinely and automatically for treatment outcomes. As policy-makers move toward variable benefits design, clinical outcomes become more important than the volume of care. Meaningful interpretation of primary care patient data can help improve clinical outcomes while reducing the volume of care, including the rate of hospitalization.

Comparative effectiveness research. Data mining can be used to determine the best treatment option when several options are available. For example, a disease such as cancer can have a large number of treatment protocols, and it can be difficult to tell which protocol best improves the chances of survival. With analytics, and with a large volume of real-time data drawn from primary care settings, the comparative effectiveness of treatments can be established.
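As a minimal illustration of the kind of comparison involved, the sketch below computes an unadjusted risk difference in an outcome (for example, hospitalization) between two treatment groups, with an approximate 95% Wald confidence interval. The group names and counts are made up. This is a naive starting point only: patients who receive different treatments usually differ in other ways, so real comparative effectiveness research on observational EHR data must adjust for confounding (for example with stratification, regression or propensity scores), which this sketch does not do.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TreatmentGroup:
    name: str
    patients: int  # patients who received this treatment
    events: int    # of those, patients with the outcome (e.g. hospitalization)

    @property
    def rate(self) -> float:
        return self.events / self.patients


def risk_difference(a: TreatmentGroup, b: TreatmentGroup,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted risk difference (a minus b) with a Wald confidence interval.
    z = 1.96 gives an approximate 95% interval."""
    diff = a.rate - b.rate
    se = math.sqrt(a.rate * (1 - a.rate) / a.patients
                   + b.rate * (1 - b.rate) / b.patients)
    return diff, diff - z * se, diff + z * se


# Made-up counts for illustration only.
protocol_a = TreatmentGroup("protocol A", patients=400, events=40)
protocol_b = TreatmentGroup("protocol B", patients=400, events=60)
diff, low, high = risk_difference(protocol_a, protocol_b)
print(f"risk difference {diff:+.3f} (95% CI {low:+.3f} to {high:+.3f})")
```

A negative difference means fewer outcome events under the first group; an interval that excludes zero suggests, but does not by itself establish, a real difference once confounding is considered.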

Basudeb Mukherjee is the president and CEO, and Runki Basu is the vice president, of Synergy Tech, an IT company specializing in healthcare. Dr. Karim Keshavjee is a family physician with a part-time clinical practice and a full-time health informatics consultancy. Dr. Keshavjee was the physician subject matter expert to Canada Health Infoway for the pan-Canadian electronic prescribing project (CeRx) and the interoperable electronic health record (iEHR) project. He is an associate member of the Centre for Evaluation of Medicines, a think-tank and research institute that takes an interdisciplinary approach to the use of medications in the community setting. He is also an adjunct assistant professor at the University of Victoria and vice-chair of the Advisory Board for the McMaster University program on e-Health.


