Analytics Magazine

Dark data: The two sides of the same coin

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Photo Courtesy of | © aleksanderdn

By Ganesh Moorthy

Today, we live in a digital society. Our distinct footprints are in every interaction we make. Data generation is the default, whether it comes from enterprise operational systems, web server logs, other applications, social interactions and transactions, research initiatives or connected things (the Internet of Things). In fact, according to a Digital Universe study, 2.2 zettabytes of data were generated in 2012. This grew by 100 percent in 2013 and is slated to grow to 44 zettabytes worldwide by 2020.

The study further states that only 0.5 percent of the data generated is actually being analyzed. The study goes on to estimate that about 25 percent of the data, if properly managed, tagged and categorized, can be consumed for other purposes.
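
As a quick back-of-the-envelope check, the study figures quoted above can be run through a few lines of Python. This is only arithmetic on the numbers as cited in this article. Applying the 0.5 percent and 25 percent shares to the 2013 volume is an assumption made for illustration, since the article does not say which year those shares refer to.

```python
# Figures as quoted in the article (volumes in zettabytes).
ZB_2012 = 2.2
GROWTH_2013 = 1.00       # "grew by 100 percent in 2013"
ZB_2020 = 44.0           # projected worldwide volume
ANALYZED_SHARE = 0.005   # "only 0.5 percent ... is actually being analyzed"
USABLE_SHARE = 0.25      # "about 25 percent ... can be consumed for other purposes"

# Volume after the 100 percent growth in 2013.
zb_2013 = ZB_2012 * (1 + GROWTH_2013)

# Compound annual growth rate implied by going from the 2013 volume to the 2020 projection.
years = 2020 - 2013
implied_cagr = (ZB_2020 / zb_2013) ** (1 / years) - 1

# Assumption: the shares are applied to the 2013 volume purely for illustration.
analyzed_zb = zb_2013 * ANALYZED_SHARE
usable_zb = zb_2013 * USABLE_SHARE

print(f"2013 volume: {zb_2013:.1f} ZB")
print(f"Implied annual growth, 2013 to 2020: {implied_cagr:.1%}")
print(f"Analyzed: {analyzed_zb:.3f} ZB; potentially usable: {usable_zb:.1f} ZB")
```

Under these figures, the 2013 volume works out to 4.4 zettabytes, and reaching 44 zettabytes by 2020 implies compound growth of roughly 39 percent a year. The gap between the analyzed share and the potentially usable share is the opportunity the rest of this article is concerned with.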

Enterprises have been collecting and storing data since the dawn of the computer age; dark data has always existed. But its close association with big data has made it a buzzword (or buzzkill, depending on your point of view) in current times. The challenge, though, is that we are simply not equipped to deal with this constant deluge of data. Compounding the problem is the fact that most of this unanalyzed data is unstructured, and unstructured data takes more pre-processing and transformation effort to make it ready for analytical consumption. So then, how do we manage dark data?

Organizations need to understand that any data left unexplored is an opportunity lost and a potential security risk. Based on an organization’s intent and investment appetite, dark data can either be tapped to generate more opportunities or remain in the dark, forever – the two sides of the same coin. We cannot, however, manage it like a coin toss, with a 50 percent probability of achieving heads or tails. Four best practices to keep in mind:

1. Make it a conscious investment. Tapping into the potential of dark data requires organizations to make strategic decisions and investments in information protection, retention and mining. These decisions need to be owned by a centralized team that can formulate information management policies and guidelines. If possible, federate the execution of those guidelines to business functions or departments.

2. Fetch your information from data lakes. Set up centralized data lakes or reservoirs along with the required encryption and access controls. Employ automated data classification and categorization processes for information management.

3. Metadata-fication. Some enterprise units have started employing advanced machine learning to encrypt, tag and classify at the transport level, that is, on data in motion rather than data at the source. Here, it is important to differentiate between raw source data and processed data, and to store them separately with different controls in place.

4. Deep diving and data mining. While data retention and management cater to information controls for compliance, data mining generates newer opportunities. There is no denying that data can be useful in one form or another. However, data mining must have a business case associated with it. For example, if I am to provide appropriate recommendations to a customer, I will need to consider the customer's past buying trends. To this end, I need customer data from the past three years to generate accurate models.

Rather than sifting through a vast repository, if I can combine prioritized business problems, automated advanced data classification and workflow systems, I will be able to generate quick results. Building this awareness requires education, and business augmentation units need to employ data mining to improve customer satisfaction, increase operational efficiency or create new growth channels.
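
To make the classification and business-case practices above a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product or of the author's own tooling: the tag names, the regular-expression rules, the `Record` class and both functions are hypothetical, and simple pattern rules stand in for the advanced machine learning the article mentions.

```python
import re
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical tagging rules: tag name -> pattern that triggers the tag.
RULES = {
    "contains_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "purchase": re.compile(r"\b(order|purchase|invoice)\b", re.IGNORECASE),
    "web_log": re.compile(r"\b(GET|POST)\s+/\S*"),
}


@dataclass
class Record:
    text: str
    created: date
    tags: set = field(default_factory=set)


def classify(record):
    """Attach every tag whose pattern matches the record text."""
    record.tags = {tag for tag, pattern in RULES.items() if pattern.search(record.text)}
    return record


def select_for_business_case(records, required_tag, today, years=3):
    """Keep only records that carry the required tag and fall inside the time window."""
    cutoff = today - timedelta(days=365 * years)  # approximate; ignores leap days
    return [r for r in records if required_tag in r.tags and r.created >= cutoff]


# Made-up sample records, for illustration only.
records = [
    Record("Order 1001: customer a.b@example.com bought 2 items", date(2016, 5, 1)),
    Record("GET /index.html 200", date(2016, 6, 1)),
    Record("Invoice 17 for legacy account", date(2010, 1, 15)),
]

tagged = [classify(r) for r in records]
recent_purchases = select_for_business_case(tagged, "purchase", today=date(2017, 1, 1))

for r in recent_purchases:
    print(sorted(r.tags), r.text)
```

The point of the design is the order of operations: records are tagged once as they arrive, and each business problem then pulls only the tagged, recent slice it needs (here, purchase records from roughly the past three years) instead of scanning the whole repository.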

Well-rounded Consumption

Dark data can contain important information about an entity, be it an individual or an organization. From an intra-organizational point of view, this information can be used for management: information containment, fraud detection and threat prevention. From an external perspective, most of the information contained in dark data can be used to build a 360-degree view of the customer (customer 360).

One point to keep in mind is that dark data does not need to be an elephant in the room. All it needs is a data-first mind-set that leads to an analytics-first and finally an AI-first mind-set. This cause is further propelled by an implementable approach toward solving the dark data problem. There is light at the end of the tunnel. Hopefully, you are in the right tunnel to start with!

Ganesh Moorthy is an associate director at Mu Sigma, where he serves as program manager/senior solution architect for R&D engagements. He has more than 16 years of experience in leading enterprise solution development for Fortune 500 clients. He is currently involved in building industrial Internet, augmented reality, and analytics and visualization platforms for both descriptive and predictive analytics.
