Exploiting Analytics: How to get the most out of data lakes
Three requisite business skills that facilitate self-service analytics at unparalleled speed.
By Sean Martin
As one of the premier indicators of the flourishing self-service movement within the datasphere, semantically enriched smart data lakes provide a means for business users to access and analyze big data on an unprecedented, enterprise-wide scale. The enduring relevance of these platforms rests almost entirely on the value that end users derive from them. Maximizing that value requires a set of skills both familiar and foreign to the business.
The skills needed to optimize data lakes are best understood by contrasting the competencies that are now obsolete with the contemporary ones that have replaced them. This transformation marks a long-awaited shift away from reliance on technology toward a renewed emphasis on domain knowledge and data savvy. The result is higher productivity and performance, fueled by proficiency in self-service analytics.
Abandoned Technological Skills
Deployments of non-semantic data lakes and relational means of ingesting, transforming and analyzing data have always required an inordinate focus on technological competence. It's a recurring paradox: business users were unversed in integration requirements, ETL (extract, transform, load) processes and the intricacies of data modeling, yet their ability to fulfill business objectives with data was inherently constrained by them. Using these traditional means of accessing business intelligence and analytics required end users either to master, or to depend on someone else who had mastered, the finer points of SQL, coding, data modeling, Hadoop and MapReduce jobs.
Invariably, the business's lack of expertise in these intricacies resulted in an over-reliance on IT departments for analytics and business intelligence (BI). More recently, data scientists have been deployed to address this skills shortage in source integration, data loading and analytics. Because these professionals were tasked with retooling models and processes every time business requirements or data sources changed, deployments of analytics became less frequent, the esteem for data-driven processes decreased, and tension between the business and IT grew.
Refined Business Skills
Enriching data lakes with semantic technologies makes reliance on technological competencies obsolete and provides the business with self-service data access and analytics. Consequently, there is a greater emphasis on the business skills needed to use these options, which neither expose nor require any end-user awareness of underlying semantic components such as RDF, OWL, SPARQL, taxonomies or vocabularies. Initial configuration of smart data lakes requires involvement from IT or data scientists, as does the ongoing work of loading new data sources. However, the most critical aspects of linking, contextualizing and preparing data are done up front and incorporated into an evolving semantic model that facilitates self-service analytics at unparalleled speed, placing the emphasis on the following business-oriented competencies:
Relating domain expertise to data. Although it combines aspects of data-driven processes and IT involvement, this skill is unique to the business and typifies the way smart data lakes enhance its ability to perform. Most business users are proficient in their domains. Maximizing the utility of a smart data lake, however, requires pairing that expertise with the ability to use data to advance business objectives. Specifically, the business needs to understand which types of information relate to its objectives, how that information is linked to other information, and what effect synthesizing the data has on achieving outcomes. These users need to identify which attributes and properties of the data benefit their business processes most, and formulate questions around them that lead to decisive action.
Tool manipulation: dashboards and BI. This competency represents the full extent of the business's technological involvement in optimizing smart data lakes. Its importance is somewhat tempered by the fact that the self-service movement had already simplified visualizations for end users before these hubs became popular. Nonetheless, because they no longer have to request results from IT, business users need proficiency in the various means of "publishing" and viewing the results of analytics performed with data lakes. Skill in creating and tailoring dashboards, working with visualization tools, or even using previously adopted BI and analytics platforms is needed to determine the impact of analytics on business processes. These skills can be as basic as viewing results in a web browser interface, since a wide variety of platforms can work with these repositories.
Embracing exploratory analytics. This competency is partly a matter of analytics skill and partly a matter of the business user's choice. Perhaps even more than the skill of relating domain expertise to data, it determines the extent to which end users can realize the unparalleled analytics potential that smart data lakes provide. Simply understanding this newfound scope of analytics, and exploiting it, requires skill on the part of business users, particularly those accustomed to the traditional limitations of non-semantic data lakes and relational methods.
Smart data lakes enable the business to traverse all of its organization's data, not just the data in a particular domain. The expanded scope of analytics possibilities that such data yields is limited only by the user's ingenuity and drive. The comprehensive nature of such analytics, and the speed with which each answer begets further questions, call for a different conception of the possibilities and relevance of analytics itself. The business must adopt this exploratory analytics mindset to realize the true value the data lake concept offers.
All of the skills business users need to optimize smart data lake deployments pertain to analytics. This is largely because most of the other facets of data lakes (integrating sources, contextualizing and linking data, adhering to governance practices) are automated through the semantic model at the heart of these platforms. Those that are not, such as loading new sources and types of data, are handled by IT, and the new data is easily added to the semantic model without the delay typically associated with this process. Consequently, business users have the luxury of concentrating on analytics to achieve departmental objectives.
Maximizing Data Lakes: Exploiting Analytics
Most of the skills that business users of smart data lakes need revolve around the business itself. This is especially true of integrating domain knowledge with data and of the basic dexterity required to manipulate tools. The lack of technological skills required of the business, however, should not be mistaken for some sort of sleight of hand. These users can now traverse all of their organization's information assets with a potentially revolutionary exploratory analytics mindset because the underlying semantic technologies perceive the context and relationships between data elements; the end user does not have to. Instead, the business user simply reaps the benefits.
Similarly, it is the technology, namely graph models based on semantic technology standards, rather than the business user that is responsible for linking data sources according to those relationships for integration, again allowing the business to reap the benefits. The same holds for the ability to parse data elements in their native forms at speeds that let users leverage more of their data, faster than was previously possible. The business user is responsible for none of these processes; the technology is. Nevertheless, when equipped with the skills described above, he or she can readily monetize them.
Sean Martin, founder and chief technology officer of Cambridge Semantics, has been on the leading edge of Internet technology innovation since the early 1990s. Prior to founding Cambridge Semantics, a provider of smart data solutions driven by semantic web technology, he spent 15 years with IBM Corporation where he was a founder and the technology visionary for the IBM Advanced Internet Technology group.