Analytics Magazine

Managing, monitoring chronic diseases

January/February 2012

The role of analytics in the Canadian Primary Care Sentinel Surveillance Network.

By Basudeb Mukherjee, Karim Keshavjee and Runki Basu

Public health surveillance is an integral part of managing chronic diseases. As electronic health record (EHR) systems are widely adopted, large volumes of patient data are being gathered, stored and analyzed to serve a wide range of stakeholders, from policy-makers to researchers to public health administrators. Healthcare data requires careful handling because of strict legislation aimed at protecting patients’ privacy. At Queen’s University in Kingston, Ontario, Canada, researchers have developed a public health surveillance system that monitors chronic diseases in the Canadian population. The data collected through the system will be used to explore how such a system can contribute to improved care and help rein in the spiralling cost of managing chronic diseases in an aging population.

The Canadian Primary Care Sentinel Surveillance Network (CPCSSN) collects data from primary care sources to conduct surveillance and to research the impact of chronic diseases on public health. In the current phase, the research covers five diseases: diabetes mellitus, hypertension, depression, osteoarthritis and chronic obstructive pulmonary disease (COPD). The system is unique in its approach to extracting, storing and analyzing national epidemiological data from electronic health records to improve the quality of care delivered by primary care physicians. It also seeks to build a repository of information to inform policy-making in public health and chronic disease management. Stakeholders for the research study include The College of Family Physicians of Canada (CFPC), several primary care practice-based research networks (PC-PBRNs) associated with university departments of family medicine across Canada, and the Canadian Institute for Health Information (CIHI). Funding for this project is provided by the Public Health Agency of Canada.

System Overview

Healthcare data is particularly sensitive to handle for research purposes, as any violation of data privacy or security can lead to serious legal consequences and call the validity of the research into question. In the CPCSSN project, patient data is collected from seven provinces across Canada, each with its own data privacy protection legislation. To comply with the provincial privacy regulations, the researchers require all data to be extracted and de-identified locally before it is uploaded to the central repository. Data from each province is loaded onto a separate, dedicated server, even though the servers for all provinces are located in a secure facility maintained at Queen’s University. Each province’s server is managed in compliance with the regulations of that particular province. Database administrators access data securely through a remote VPN (virtual private network), and all exports of de-identified data from the regional servers to the central repository take place over a secure network.

EHR systems from many different vendors are in use across Canada. As a result, different techniques are used to extract patient data from physicians’ EHRs. In one instance, the EHR is hosted by a provincial government acting as the application service provider (ASP), and the data is extracted and de-identified by the government and delivered on digital media. However, the emphasis is now on encrypted electronic transfer directly from the physicians’ EHRs to the regional server.

Three primary methods are used to carry out de-identification:

  1. All data fields that include any patient-identifying information are removed from the exported data set. This is done by data managers working through the data set manually and deleting the information.
  2. Commercially available software has been procured to mitigate probabilistic re-identification of data.
  3. Text de-identification techniques are used to remove any potentially identifying text in extracted data fields.
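As a rough illustration of the first and third methods, the following Python sketch drops identifying fields from a record and then redacts identifying strings from free text. It is a minimal sketch under stated assumptions, not the CPCSSN implementation: the field names, the single phone-number pattern and the `[REDACTED]` marker are all hypothetical. The second method, mitigating probabilistic re-identification, relies on commercial software and is not sketched here.

```python
import re

# Hypothetical field names; the article does not describe the actual schema.
IDENTIFYING_FIELDS = {"name", "health_card_number", "address", "phone", "date_of_birth"}

# Illustrative pattern only: phone-like numbers such as 613-555-0199.
# Real text de-identification needs much broader coverage.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def drop_identifying_fields(record):
    """Method 1: remove whole fields that carry patient identifiers."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def scrub_text(text, known_names=()):
    """Method 3: redact identifying strings that appear inside free text."""
    text = PHONE_PATTERN.sub("[REDACTED]", text)
    for name in known_names:
        text = re.sub(r"\b" + re.escape(name) + r"\b", "[REDACTED]",
                      text, flags=re.IGNORECASE)
    return text


def deidentify(record, free_text_fields=("notes",)):
    """Apply both steps to one record (a plain dict) before export."""
    # Capture the name parts first so they can be redacted from notes.
    names = str(record.get("name", "")).split()
    cleaned = drop_identifying_fields(record)
    for field in free_text_fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = scrub_text(cleaned[field], names)
    return cleaned
```

In this sketch the patient's name is read before the identifying fields are dropped, so that mentions of it in the notes can still be found and redacted.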

Additionally, the Research Ethics Board in each location conducts an annual review to audit the compliance of the privacy requirements.

Impediments to Successful Extraction and De-identification of Data

In the initial phase, between April and November of 2008, six sites participated in the data extraction exercise. A set of data extraction tasks was defined for each site. Only one of the six sites completed all the tasks necessary to call the extraction successful. The successful site had already been operational for more than three years and had employed a data manager for three years prior to the initial phase of the study. However, all sites were able to extract data and interpret it so that the results matched disease surveillance done using data from other sources.

Based on the findings of the first phase, a second phase was launched in 2009, in which proficient data managers qualified to handle large volumes of data were hired. The data managers were equipped with a data management toolkit comprising the entity-relationship diagram of the CPCSSN database, the associated data dictionary, definitions of the five diseases under study and an electronic tool to track and document issues.

The second phase was significant because it revealed a multitude of problems with this approach. Because the data managers worked independently, they could not refer back to the original network database due to privacy issues. Privacy issues also limited the managers’ ability to inspect the data further and to manually de-identify text strings. The problems were compounded by the physicians’ habit of spelling medical terms and abbreviations in a non-uniform manner.
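One way to picture the non-uniform spelling problem is the Python sketch below, which maps free-text entries onto a canonical list of the five diseases under study. The abbreviation table, the similarity cutoff and the function name are assumptions made for illustration; the article does not say how CPCSSN handled this. It uses only the standard library's `difflib` module.

```python
import difflib

# Hypothetical abbreviation table; real practices use many more variants.
ABBREVIATIONS = {
    "dm": "diabetes mellitus",
    "htn": "hypertension",
    "oa": "osteoarthritis",
    "copd": "chronic obstructive pulmonary disease",
}

# The five diseases named in the article.
CANONICAL_TERMS = [
    "diabetes mellitus",
    "hypertension",
    "depression",
    "osteoarthritis",
    "chronic obstructive pulmonary disease",
]


def normalize_term(raw, cutoff=0.8):
    """Return the canonical disease name for a free-text entry, or None."""
    # Lowercase, strip periods (e.g. "C.O.P.D.") and collapse whitespace.
    term = " ".join(raw.lower().replace(".", "").split())
    if term in ABBREVIATIONS:
        return ABBREVIATIONS[term]
    # Fuzzy match to absorb misspellings; the cutoff is an assumed threshold.
    matches = difflib.get_close_matches(term, CANONICAL_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Entries that match nothing return `None`, so they can be queued for manual review instead of being guessed at.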

Looking ahead, the data collected from primary care settings can be interpreted and put to effective use through data analytics. Following are a number of areas where analytics can be applied to the data to solve healthcare problems:

Feedback of primary care data to primary care physicians. Primary care physicians often lack the resources to interpret the patient data collected through their EHR systems. With analytics, patient data can be routinely and automatically mined for treatment outcomes. As policy-makers move toward variable benefits design, clinical outcomes become more important than the volume of care. Improving clinical outcomes while reducing the volume of care, including the rate of hospitalization, can be achieved through meaningful interpretation of primary care patient data.

Comparative effectiveness research. Data mining can be used to determine the best treatment option when several are available. For example, a disease such as cancer can have a large number of treatment protocols, and it is difficult to know which protocol gives the best chance of survival. However, with the application of analytics, and with a large volume of real-time data drawn from sources within primary care settings, the comparative effectiveness of treatments can be established.
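A very simplified sketch of such a comparison, in Python, tallies a binary outcome per treatment protocol and computes a two-proportion z statistic for the difference between two protocols. The record layout and function names are hypothetical. A real comparative effectiveness study on observational EHR data would also need to adjust for confounding factors such as age and disease severity, which this sketch does not attempt.

```python
import math
from collections import defaultdict


def outcome_rates(records):
    """Tally (successes, total, rate) per protocol.

    `records` is an iterable of (protocol, success) pairs, where success
    is a boolean outcome such as survival at a fixed follow-up time.
    """
    counts = defaultdict(lambda: [0, 0])  # protocol -> [successes, total]
    for protocol, success in records:
        counts[protocol][0] += int(success)
        counts[protocol][1] += 1
    return {p: (s, n, s / n) for p, (s, n) in counts.items()}


def two_proportion_z(s1, n1, s2, n2):
    """Pooled two-proportion z statistic for rate 1 minus rate 2."""
    p_pool = (s1 + s2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0  # every outcome identical; no measurable difference
    return (s1 / n1 - s2 / n2) / se
```

A large absolute z value suggests that the difference in outcome rates is unlikely to be due to chance alone, although it says nothing about whether the patient groups were comparable to begin with.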

Basudeb Mukherjee is the president and CEO and Runki Basu is the vice president of Synergy Tech, an IT company specializing in healthcare. Dr. Karim Keshavjee is a family physician with a part-time practice and a full-time health informatics consultancy practice. Dr. Keshavjee was the physician subject matter expert to Canada Health Infoway for the pan-Canadian electronic prescribing project (CeRx) and the inter-operable electronic health record (iEHR) project. He is an associate member of the Centre for Evaluation of Medicines, a think-tank and research institute that takes an interdisciplinary approach to use of medications in the community setting. He is also an adjunct assistant professor at the University of Victoria and is vice-chair of the Advisory Board for the McMaster University program on e-Health.


  1. Keshavjee K., Chevendra V., Martin K., Jackson D., Aliarzadeh B., Kinsella L., Turcotte R., Sabri S., Chen T., 2011, “Design and testing of an architecture for a national primary care chronic disease surveillance network in Canada,” Studies in Health Technology Informatics.
  2. Canadian Primary Care Sentinel Surveillance Network, (last accessed Dec. 1, 2011).
