Share with your friends


Analytics Magazine

Dark data: The two sides of the same coin

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Photo Courtesy of | © aleksanderdn

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk.
Photo Courtesy of | © aleksanderdn

Ganesh MoorthyBy Ganesh Moorthy

Today, we live in a digital society. Our distinct footprints are in every interaction we make. Data generation is a default – be it from enterprise operational systems, logs from web servers, other applications, social interactions and transactions, research initiatives and connected things (Internet of Things). In fact, according to a Digital Universe study, 2.2 zettabytes of data was generated in 2012. This grew by 100 percent in 2013, and is slated to grow to 44 zettabytes by 2020 worldwide.

The study further states that only 0.5 percent of the data generated is actually being analyzed. The study goes on to estimate that about 25 percent of the data, if properly managed, tagged and categorized, can be consumed for other purposes.

Enterprises have been collecting and storing data since the age of computers; dark data has always existed. But its close correlation to big data has made it a buzzword (or buzzkill, depending on your point of view) in current times. The challenge, though, is that we are simply not equipped to deal with this constant deluge of data. Compounding this effect is the fact that most of this unanalyzed data is unstructured. It takes more pre-processing and transformation efforts to make data ready for analytical consumption. So then, how do we manage dark data?

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Based on an organization’s intent and investment appetite, dark data can either be tapped to generate more opportunities or remain in the dark, forever – the two sides of the same coin. We cannot, however, manage it like a coin toss, with a 50 percent probability of achieving heads or tails. Four best practices to keep in mind:

1. Make it a conscious investment. Tapping into the potential of dark data requires organizations to make strategic decisions and investment toward information protection, retention and mining. They need to be owned by a centralized team that can formulate information management policies and guidelines. If possible, federate the process of executing those guidelines to business functions or departments.

2. Fetch your information from data lakes. Set up centralized data lakes or reservoirs along with required encryption and access controls. Employ automated data classification and categorization process towards information management.

3. Metadata-fication. Some enterprise units have started employing advanced machine learning to encrypt, tag and classify on transport level – data in motion rather than data at source. Here, it is important to differentiate between raw source data versus processed data and store them separately, using varying controls in place.

4. Deep diving and data mining. While data retention and management caters to information controls for compliance, data mining generates newer opportunities. There is no swaying in that data can be useful in one form or another. However, data mining must have a business case associated with it. For example, if I am to provide appropriate recommendations to a customer, I will need to consider past buying trends of the customer. Toward this end, I need customer data of the past three years for generating accurate models.

Rather than sieving through a vast repository, if I can combine prioritized business problems, automated advanced data classifications and workflow systems, I would be able to generate quick results. This cognizance requires education and business augmentation units to employ data mining, towards improving customer satisfaction, increasing operational efficiency or creating new growth channels.

Well-rounded Consumption

Dark data can contain important information about the entity, be it an individual or an organization. From an intra-organizational point of view, this information can be used for management – information containment, fraud detection and threat prevention. From an external organization perspective, most of the information contained in dark data can be used for customer 360.

One point to keep in mind is that dark data does not need to be an elephant in the room. All it needs is a data-first leading to an analytics-first and finally an AI-first mind-set. This cause is further propelled by an implementable approach toward solving the dark data problem. There is light at the end of tunnel. Hopefully, you are in the right tunnel to start with!

Ganesh Moorthy has over 18 years of experience in the areas of solution and product development, solution architecting and innovations. He has built award winning IoT platforms for Industrial Internet initiatives, designed and developed near real-time machine leaning systems for unstructured data anonymization and text / voice analytics along with providing solutions on Big Data. He currently holds the position of Head of Engineering at Tredence, leading a variety of engineering functions along with product development and innovation.

Analytics data science news articles

Related Posts

  • 100
    Frontline Systems releases Analytic Solver V2018 for Excel Frontline Systems, developer of the Solver in Microsoft Excel, recently released Analytic Solver V2018, its full product line of predictive and prescriptive analytics tools that work in Microsoft Excel. The new release includes a visual editor for multi-stage “data science workflows” (also…
    Tags: data
  • 100
    Businesses are greatly expanding the autonomous capabilities of their products, services and manufacturing processes to better optimize their reliability and efficiency. The processing of big data is playing an integral role in developing these prescriptive analytics. As a result, data scientists and engineers should pay attention to the following aspects…
    Tags: data
  • 100
    With the rise of big data – and the processes and tools related to utilizing and managing large data sets – organizations are recognizing the value of data as a critical business asset to identify trends, patterns and preferences to drive improved customer experiences and competitive advantage. The problem is,…
    Tags: data
  • 100
    The Internet of Things (IoT) is considered to be the next revolution that touches every part of our daily life, from restocking ice cream to warning of pollutants. Analytics professionals understand the importance of data, especially in a complicated field such as healthcare. This article offers a framework on integrating…
    Tags: data
  • 85
    Thousands of companies all over the world are competing for a finite number of data scientists, paying them big bucks to join their organizations – and setting them up for failure.
    Tags: data


Fighting terrorists online: Identifying extremists before they post content

New research has found a way to identify extremists, such as those associated with the terrorist group ISIS, by monitoring their social media accounts, and can identify them even before they post threatening content. The research, “Finding Extremists in Online Social Networks,” which was recently published in the INFORMS journal Operations Research, was conducted by Tauhid Zaman of the MIT, Lt. Col. Christopher E. Marks of the U.S. Army and Jytte Klausen of Brandeis University. Read more →

Syrian conflict yields model for attrition dynamics in multilateral war

Based on their study of the Syrian Civil War that’s been raging since 2011, three researchers created a predictive model for multilateral war called the Lanchester multiduel. Unless there is a player so strong it can guarantee a win regardless of what others do, the likely outcome of multilateral war is a gradual stalemate that culminates in the mutual annihilation of all players, according to the model. Read more →

SAS, Samford University team up to generate sports analytics talent

Sports teams try to squeeze out every last bit of talent to gain a competitive advantage on the field. That’s also true in college athletic departments and professional team offices, where entire departments devoted to analyzing data hunt for sports analytics experts that can give them an edge in a game, in the stands and beyond. To create this talent, analytics company SAS will collaborate with the Samford University Center for Sports Analytics to support teaching, learning and research in all areas where analytics affects sports, including fan engagement, sponsorship, player tracking, sports medicine, sports media and operations. Read more →



INFORMS Annual Meeting
Nov. 4-7, 2018, Phoenix

Winter Simulation Conference
Dec. 9-12, 2018, Gothenburg, Sweden


Making Data Science Pay
Oct. 29 -30, 12 p.m.-5 p.m.

Applied AI & Machine Learning | Comprehensive
Starts Oct. 29, 2018 (live online)

The Analytics Clinic
Citizen Data Scientists | Why Not DIY AI?
Nov. 8, 2018, 11 a.m. – 12:30 p.m.

Advancing the Analytics-Driven Organization
Jan. 28–31, 2019, 1 p.m.– 5 p.m. (live online)


CAP® Exam computer-based testing sites are available in 700 locations worldwide. Take the exam close to home and on your schedule:

For more information, go to