
Analytics Magazine

Dark data: The two sides of the same coin

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Photo courtesy of © aleksanderdn

By Ganesh Moorthy

Today, we live in a digital society, leaving distinct footprints in every interaction we make. Data generation is the default – whether from enterprise operational systems, web server logs, other applications, social interactions and transactions, research initiatives or connected things (the Internet of Things). In fact, according to the Digital Universe study, 2.2 zettabytes of data were generated worldwide in 2012. That figure grew by 100 percent in 2013, and is projected to reach 44 zettabytes by 2020.

The study further states that only 0.5 percent of the data generated is actually analyzed. It goes on to estimate that about 25 percent of the data, if properly managed, tagged and categorized, could be consumed for other purposes.

Enterprises have been collecting and storing data since the dawn of computing; dark data has always existed. But its close association with big data has made it a buzzword (or buzzkill, depending on your point of view) in current times. The challenge is that we are simply not equipped to deal with this constant deluge of data. Compounding the problem, most of this unanalyzed data is unstructured, so it takes extra pre-processing and transformation effort to make it ready for analytical consumption. How, then, do we manage dark data?

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Based on an organization’s intent and investment appetite, dark data can either be tapped to generate more opportunities or remain in the dark forever – two sides of the same coin. We cannot, however, manage it like a coin toss, with a 50 percent probability of heads or tails. Here are four best practices to keep in mind:

1. Make it a conscious investment. Tapping the potential of dark data requires organizations to make strategic decisions and investments in information protection, retention and mining. These initiatives need to be owned by a centralized team that can formulate information management policies and guidelines. Where possible, federate the execution of those guidelines to business functions or departments.

2. Fetch your information from data lakes. Set up centralized data lakes or reservoirs with the required encryption and access controls. Employ automated data classification and categorization processes for information management.
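As a minimal sketch of what automated classification might look like, the snippet below scans incoming records for sensitivity patterns and routes them to zones of a data lake with different controls. The rules, labels and zone paths are illustrative assumptions, not a standard; a real deployment would use a much richer rule set or a trained model.

```python
import re

# Hypothetical sensitivity rules: regex patterns mapped to classification labels.
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "restricted"),      # SSN-like pattern
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "confidential"),  # email address
]

def classify(record: str) -> str:
    """Return the most sensitive label matched, else 'internal'."""
    labels = {label for pattern, label in RULES if pattern.search(record)}
    if "restricted" in labels:
        return "restricted"
    if "confidential" in labels:
        return "confidential"
    return "internal"

def route(record: str) -> str:
    """Map a record to an (illustrative) lake zone based on its label."""
    zones = {
        "restricted": "lake/encrypted/",    # strongest encryption + access control
        "confidential": "lake/controlled/", # restricted access
        "internal": "lake/general/",        # default zone
    }
    return zones[classify(record)]
```

The point of the design is that classification happens automatically at ingestion time, so data never lands in the lake untagged – which is how data goes dark in the first place.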

3. Metadata-fication. Some enterprise units have started employing advanced machine learning to encrypt, tag and classify data at the transport level – data in motion rather than data at the source. Here, it is important to differentiate between raw source data and processed data, and to store them separately with appropriate controls in place.
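One hedged way to picture tagging data in motion: wrap each record in a metadata envelope as it flows through the pipeline, recording its source, its processing stage (raw vs. processed) and a content fingerprint. The field names and envelope shape below are assumptions for illustration, not a standard schema.

```python
import hashlib
import json
import time

def tag_in_motion(stream, source):
    """Attach a metadata envelope to each record as it passes through the
    pipeline, so raw and processed data can later be stored separately
    under different controls. Field names are illustrative only."""
    for record in stream:
        yield {
            "payload": record,
            "meta": {
                "source": source,
                "stage": "raw",  # flipped to "processed" downstream
                "ingested_at": time.time(),
                "fingerprint": hashlib.sha256(
                    json.dumps(record, sort_keys=True).encode()
                ).hexdigest(),
            },
        }
```

Because the tag travels with the record rather than living in a separate catalog, every downstream consumer can enforce controls without a lookup back to the source system.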

4. Deep diving and data mining. While data retention and management cater to information controls for compliance, data mining generates new opportunities. There is no denying that data can be useful in one form or another; however, data mining must have a business case associated with it. For example, if I am to provide appropriate recommendations to a customer, I will need to consider that customer’s past buying trends. Toward this end, I need the past three years of customer data to generate accurate models.
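The recommendation example above can be sketched very simply: count how often items co-occurred in past orders, then recommend the items that most often appeared alongside what the customer is buying now. This is a deliberately naive co-occurrence model, standing in for whatever technique a real business case would justify.

```python
from collections import Counter
from itertools import combinations

def recommend(history, basket, top_n=3):
    """Recommend items that most often co-occurred with the current basket.
    `history` is a list of past orders, each a list of item identifiers."""
    # Count pairwise co-occurrences across all past orders.
    co = Counter()
    for order in history:
        for a, b in combinations(set(order), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1

    # Score candidate items by how often they accompanied basket items.
    scores = Counter()
    for item in basket:
        for (a, b), n in co.items():
            if a == item and b not in basket:
                scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]
```

Even a toy model like this makes the business-case point concrete: without retained, well-tagged purchase history, there is nothing to mine.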

Rather than sieving through a vast repository, combining prioritized business problems with automated, advanced data classification and workflow systems generates quick results. Building this cognizance requires educating and augmenting business units to employ data mining toward improving customer satisfaction, increasing operational efficiency or creating new growth channels.

Well-rounded Consumption

Dark data can contain important information about an entity, be it an individual or an organization. From an intra-organizational point of view, this information can be used for management – information containment, fraud detection and threat prevention. From an external perspective, much of the information contained in dark data can be used to build a 360-degree view of the customer.

One point to keep in mind: dark data does not need to be the elephant in the room. All it takes is a data-first mindset, leading to an analytics-first and finally an AI-first one, propelled by an implementable approach to solving the dark data problem. There is light at the end of the tunnel. Hopefully, you are in the right tunnel to start with!

Ganesh Moorthy has more than 18 years of experience in solution and product development, solution architecture and innovation. He has built award-winning IoT platforms for Industrial Internet initiatives, and has designed and developed near real-time machine learning systems for unstructured data anonymization and text/voice analytics, along with big data solutions. He is currently head of engineering at Tredence, leading a variety of engineering functions along with product development and innovation.







