Analytics Magazine

Exploiting Analytics: How to get the most out of data lakes

May/June 2016

Three requisite business skills that facilitate self-service analytics at unparalleled speed.

By Sean Martin

As one of the premier indicators of the flourishing self-service movement within the datasphere, semantically enriched smart data lakes provide a means for business users to access and analyze big data on an unprecedented, enterprise-wide scale. The enduring relevance of these platforms is almost entirely based on the value that end-users derive from them. Maximizing that utility necessitates a set of skills both familiar and foreign to the business.

The skills necessary for optimizing data lakes are best understood by contrasting the previous competencies that are now obsolete with the contemporary ones that have supplanted them. Such a transformation heralds a long-awaited shift from technological reliance to a renewed emphasis on domain knowledge and data savvy. The result is heightened productivity and increased performance thresholds fueled by self-service analytics proficiency.

Abandoned Technological Skills

Deployments of non-semantic data lakes and relational means of ingesting, transforming and analyzing data have always required an inordinate focus on technological competence. It’s a recurring paradox: Business users were unversed in integration requirements, ETL (extract, transform, load) processes and the intricacies of data modeling, yet their ability to fulfill business objectives with data was inherently circumscribed by them. Utilizing these traditional means of accessing business intelligence and analytics required end users to be proficient – or to depend on someone else who was – in the finer points of SQL, coding, data modeling, Hadoop and MapReduce jobs.

Invariably, the lack of such skills on the part of the business resulted in an over-reliance on IT departments for analytics and business intelligence (BI). More recently, data scientists have been deployed to make up for this skills shortage in source integration, loading and analytics. With these professionals tasked with retooling models and processes every time business requirements or data sources changed, analytics deployments became less frequent, the esteem for data-driven processes decreased, and tension between the business and IT mounted.

Refined Business Skills

Deployments of non-semantic data require an inordinate focus on technological competence.

Enriching data lakes with semantic technologies removes the reliance on technological competencies and provides the business with self-service data access and analytics. Consequently, there is a greater emphasis on business skills for utilizing these options, which neither expose nor require any end-user cognizance of underlying semantic components such as RDF, OWL, SPARQL, taxonomies or vocabularies. Initial configuration of smart data lakes necessitates IT or data scientist involvement, as do continued efforts to load new data sources. However, the most critical facets of linking, contextualizing and preparing data are done upfront and incorporated into an evolving semantic model that facilitates self-service analytics at unparalleled speeds, emphasizing the following business-oriented competencies:

Relating domain expertise to data. Although it combines aspects of data-driven processes and IT involvement, this particular skill is unique to the business and typifies the way that smart data lakes enhance its ability to perform. Most business users are proficient in their domains. Maximizing smart data lake utility, however, requires pairing that expertise with an ability to use data to improve both it and business outcomes. Specifically, the business needs to understand what types of information relate to objectives, how they are linked to additional information, and what effect synthesizing that data has on achieving outcomes. These users need to identify which particular attributes and properties of data benefit their business processes most, and formulate questions around them that result in decisive action.

Tool manipulation: dashboards and BI. This competency represents the extent of the technological involvement on behalf of the business in optimizing smart data lakes. Its prominence is somewhat tempered by the self-service movement’s simplification of visualizations for end users prior to the popularity of these hubs. Nonetheless, business users need proficiency in manipulating the various forms of “publishing” and viewing the results of analytics endeavors facilitated by data lakes, since they no longer have to request them from IT. Competencies in creating and tailoring dashboards, visualization tools or even previously used BI and analytics platforms are needed to determine the impact of analytics on business processes. These skills can be as basic as looking at a web browser interface for results; any variety of platforms works with these repositories.

Embracing exploratory analytics. This competency is partly based on analytics adroitness and partly predicated on the business user’s choice. Perhaps even more than skill in relating domain expertise to data, it represents the full extent to which end-user ability can influence the unparalleled analytics potential smart data lakes provide. Simply understanding that newfound scope of analytics – and exploiting it – requires business user skill, particularly for those who are accustomed to the traditional limitations of non-semantic data lakes and relational methods.

Smart data lakes enable the business to traverse all of its organization’s data, not just the data in its particular domain. The expanded scope of possibilities that such data yields for analytics is restrained only by the ingenuity and drive of the user. The encompassing nature of such analytics, and the speed at which questions are answered only to beget more questions, entails a different conception of the possibilities and relevance of analytics itself. The business must adopt this exploratory analytics mindset to approach the true yield that the data lake concept offers.

All of the skills business users need to optimize smart data lake deployments pertain to analytics. This is largely because most of the other facets of data lakes (integrating sources, contextualizing and linking data, adhering to governance practices) are automated via the incorporation of the semantic model at the heart of these platforms. Those that are not, such as loading new sources and types of data, are done by IT and are easily added to the semantic model without the typical delay associated with this process. Consequently, business users have the luxury of concentrating on analytics to achieve departmental objectives.

Maximizing Data Lakes: Exploiting Analytics

The majority of the skills requisite for business users of smart data lakes revolve around business itself. This notion especially applies to the integration of domain knowledge with data and the basic dexterity required for tool manipulation. The lack of technological skills needed on the part of the business, however, should not be mistaken for some sort of sleight of hand. The reason these users can now traverse their organization’s entire information assets with an exploratory analytics mindset that may prove revolutionary is that the underlying semantic technologies perceive the context and relationships between data elements – the end user does not. Instead, the business user merely reaps the benefits.

Similarly, it’s the technology – graph models based on the semantic technology standards, not the business user – that is responsible for linking data sources based on those relationships for integration efforts, thus allowing the business to reap the benefits again. The same concept applies to the ability to parse through those data elements in their native forms at speeds that empower those users to leverage more of their data more quickly than previously possible. The business user is not responsible for any of those processes; the technology is. Nevertheless, when equipped with the aforementioned skills, he or she can readily monetize them.
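
For readers curious about what "linking data sources based on relationships" might look like underneath, the following is a minimal sketch in plain Python, not a depiction of any particular product. It assumes two hypothetical sources (a CRM and a billing system) whose facts are expressed as subject-predicate-object triples, loosely in the spirit of RDF. Every identifier, predicate name and value here is invented for illustration; a real smart data lake would rely on an RDF store and SPARQL rather than this toy structure.

```python
# Hypothetical data: facts from two separate sources, each written as
# (subject, predicate, object) triples. All names are illustrative.
crm_triples = [
    ("customer:42", "hasName", "Acme Corp"),
    ("customer:42", "locatedIn", "region:EMEA"),
]
billing_triples = [
    ("invoice:7", "billedTo", "customer:42"),
    ("invoice:7", "hasAmount", 1200),
    ("invoice:8", "billedTo", "customer:42"),
    ("invoice:8", "hasAmount", 300),
]


def build_graph(*sources):
    """Merge triples from any number of sources into one set.

    Because both sources use the same identifier ("customer:42"),
    merging the triples is enough to link them; no join logic or
    schema mapping is written by the person asking the question.
    """
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def objects(graph, subject, predicate):
    """Return every object attached to a subject via a predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects(graph, predicate, obj):
    """Return every subject that points at an object via a predicate."""
    return [s for s, p, o in graph if p == predicate and o == obj]


def total_billed_by_region(graph, region):
    """Answer a cross-source question: how much was billed in a region?"""
    total = 0
    for customer in subjects(graph, "locatedIn", region):
        for invoice in subjects(graph, "billedTo", customer):
            total += sum(objects(graph, invoice, "hasAmount"))
    return total


graph = build_graph(crm_triples, billing_triples)
print(total_billed_by_region(graph, "region:EMEA"))
```

The business question (billing by region) spans both sources, yet the traversal only follows relationships already present in the merged graph, which is the division of labor the article describes.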

Sean Martin, founder and chief technology officer of Cambridge Semantics, has been on the leading edge of Internet technology innovation since the early 1990s. Prior to founding Cambridge Semantics, a provider of smart data solutions driven by semantic web technology, he spent 15 years with IBM Corporation where he was a founder and the technology visionary for the IBM Advanced Internet Technology group.
