Realizing Value: Building a high-performance big data analytics organization
“From Twitter feeds to photo streams to RFID pings, the big data universe is rapidly expanding, providing unprecedented opportunities to understand the present and peer into the future. Tapping its potential while avoiding its pitfalls doesn’t take magic; it takes a roadmap.” — Chris Berdik, author of “Mind over Mind”
By Pramod Singh, Ritin Mathur and Srujana H.M.
The digital universe is expected to expand exponentially between 2013 and 2020, from 4.4 trillion gigabytes to 44 trillion gigabytes a year, and this massive amount of data will significantly impact global industries. In this nontraditional context, analyzing big data isn’t just about managing more, or more diverse, data. Rather, it is about asking new questions, developing new capabilities, building new technological environments and devising holistic communication strategies that account for the complexities of such data volumes.
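To put the growth figures quoted above in perspective, the short Python sketch below works out the yearly growth rate they imply. It assumes a constant rate of growth across the seven years, which is a simplification made for illustration and not something the forecast itself states; the variable names are illustrative.

```python
import math

# Figures quoted in the text, in trillions of gigabytes.
start_volume = 4.4    # 2013
end_volume = 44.0     # 2020
years = 2020 - 2013

# Assumption: a constant yearly growth factor g, so start * g**years == end.
growth_factor = (end_volume / start_volume) ** (1 / years)
annual_rate = growth_factor - 1

# Under the same assumption, the time needed for the volume to double.
doubling_time = math.log(2) / math.log(growth_factor)

print(f"Implied annual growth: {annual_rate:.1%}")
print(f"Implied doubling time: {doubling_time:.1f} years")
```

Under that assumption, a tenfold increase over seven years corresponds to growth of roughly 39 percent a year, or a doubling about every two years.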
Numerous opportunities, as well as challenges, are associated with big data. Time savings can be achieved through real-time monitoring and forecasting of events that impact either business performance or operations. Adopting big data technologies can also yield significant cost savings over traditional analytical techniques, largely through the use of Hadoop clusters.
While companies with business models predicated on the Internet have been the pioneers in developing big data analytics, other firms with more established non-Internet-based models are also rapidly adopting big data analytics practices, typically in response to consumer and technology trends. As data volumes explode, it becomes imperative for organizations to assess big data analytics practices and adopt them into their decision-making processes.
Big Data Tools and Technologies
When considering an organization’s needs for big data tools and technologies, it is useful to think of them in four dimensions.
1. Structured data management: Tools for managing high-volume structured data (for instance, clickstream data or machine or sensor data) are an important part of any big data technology stack.
2. Unstructured data management: The explosion in data volumes has been, to a large extent, a result of the rise in human information, which typically comprises social media data, videos, pictures and even text data from customer support logs. Tools and technologies to manage, analyze and make sense of this data stream are critical for building understanding and for correlating it with other forms of structured data.
3. Analytics environment: Combining structured and unstructured data at scale requires specialized tools and technologies that can merge these data sets and run analytical algorithms on them. Concepts such as in-database and in-memory analytics have greatly enhanced the ability to use large data sets for analysis at near real-time speeds and to embed the analytics environment within, for example, structured data management tools.
4. Visualization: Intuitive representation of data and results of analysis is a critical final component of the big data technology stack. It increases the speed at which results are understood and insights derived. Tools and technologies that allow for quick drill-down and investigative analysis are now pervasive and easily integrated into the analytics stack.
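As a toy illustration of the third dimension, the Python sketch below merges a structured data set with a signal derived from unstructured text. Everything in it is hypothetical: the customer IDs, the sample records and the list of complaint terms are invented for the example, and simple keyword matching stands in for the far richer text analytics a real platform would apply.

```python
# Hypothetical structured data: page views per customer from a clickstream.
clickstream = {"c001": 42, "c002": 7, "c003": 15}

# Hypothetical unstructured data: free-text customer support log entries.
support_logs = [
    ("c001", "Checkout page keeps crashing, very frustrated"),
    ("c003", "Thanks, the issue was resolved quickly"),
    ("c001", "Still crashing after the update"),
]

# Assumed vocabulary; keyword matching is a stand-in for real text mining.
COMPLAINT_TERMS = {"crashing", "frustrated", "broken"}


def complaint_mentions(text):
    """Count the distinct complaint terms that appear in one log entry."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return len(words & COMPLAINT_TERMS)


# Turn the unstructured text into a structured, per-customer signal.
complaints = {}
for customer_id, text in support_logs:
    complaints[customer_id] = complaints.get(customer_id, 0) + complaint_mentions(text)

# Merge both sources on the customer ID.
combined = {
    customer_id: {
        "page_views": views,
        "complaint_terms": complaints.get(customer_id, 0),
    }
    for customer_id, views in clickstream.items()
}

for customer_id, row in combined.items():
    print(customer_id, row)
```

The point of the sketch is the shape of the work, not the technique: unstructured content is first reduced to a structured feature, and only then correlated with existing structured data.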
Most tools designed for data mining or conventional statistical analysis are not optimal for large data sets. A common hurdle for analytics organizations trying to leverage big data analytics is the availability of big data technologies and platforms. Organizations usually start off by using open source technologies to gain experience and expertise. The big data analytics space, thankfully, provides many open source options for organizations.
For example, Hadoop is a good starting point for managing large data sets at scale. Combining it with a NoSQL database such as HBase, or with a relational database such as MySQL, is a good first step toward getting a feel for handling large data sets. Hadoop ecosystem tools such as Hive, Pig and Sqoop also allow data scientists to get a feel for querying and analyzing large data sets. R is an open source programming language and software environment designed for statistical computing and visualization. For visualization, tools like d3.js allow for creative and varied visualizations that help data scientists present results in an intuitive way.
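For readers new to Hadoop, the following Python sketch mimics the map, shuffle and reduce steps of the MapReduce programming model on which Hadoop's batch processing is built. It is a single-process illustration only: it does not use any Hadoop API, and the function names and sample log lines are hypothetical. On a real cluster, the same three steps would be distributed across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three steps in sequence, in one process."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, standing in for a very large log file.
sample_logs = [
    "printer offline after update",
    "printer driver update failed",
]

counts = word_count(sample_logs)
print(counts)
```

Because each map call looks at one record and each reduce call looks at one key, the work can be split up freely, which is what lets this style of processing scale to large data sets.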
The challenge with using open source technologies, though, is twofold. First, integrating them with a legacy enterprise stack is not easy, and most IT organizations don’t yet allow for easy integration. This integration quickly becomes critical when one moves beyond experimentation into solving real-world business problems that require multi-dimensional data, some of which might be in legacy enterprise data warehouse (EDW) environments. Second, while strong user communities exist around open source technologies, the learning curve can be longer given the often less than user-friendly nature of these technologies. Learning on open source requires a certain level of existing expertise, and beginners may find a learning approach based on open source harder.
Integrated Big Data Analytics Platform
Most analytics for business use cases rely on bringing together diverse data sets to analyze. With big data, these data sets are no longer limited to just structured data; they increasingly leverage unstructured data as well. This calls for a big data environment that allows data scientists to work seamlessly across data streams.
An integrated environment provides a quicker and more scalable approach to analytics. It also gives data scientists a user-friendly environment in which to learn new skills and adapt to working with, and running analytics on, large data sets. HP HAVEn, for example, brings together Hadoop, Autonomy, Vertica, HP Enterprise Security and any number of applications.
Building Organizational Skills
Providing for big data technologies and platforms sets the baseline for an organization. What needs to be done next is to have a focused effort across the organization to build the skills in these technologies.
In contrast to traditional analytical organizations, big data organizations need to augment existing analytical staff with data scientists who possess a higher level of technical capabilities, as well as the ability to manipulate big data technologies. These capabilities might include natural language processing and text mining skills; video, image and visual analytics experience; and the ability to code in scripting languages such as Python, Pig and Hive. A data scientist in a big data analytics organization typically needs skills in three core areas: 1. business intelligence skills to get to the data quickly; 2. statistics and analytical techniques to analyze the data; and 3. business skills to interpret analysis results in business terms.
The time that an analytics organization has to respond to a business need is shrinking. This gives rise to a situation where you need all three skill sets in one person, which is hard to find.
To guide skill development among the existing analyst community, HP developed competency centers aligned to each of the key technologies – Vertica for structured data analytics, Autonomy for unstructured data analytics and Hadoop as a data lake. The competency centers cater to focused competency development through collaboration, training and live projects. These competency centers, composed of data scientists across the organization, created a skills framework and a big data curriculum to guide the skill development effort.
Re-thinking Business Analytics
With the right tools, technologies and skill sets, an organization’s next step is deploying big data analytics to solve analytics questions in different application areas. A challenge some analytics organizations might have is getting their teams to think about how big data analytics applies to their business areas. Given the relative maturity of analytics solutions across most domains, teams sometimes have difficulty in assessing how big data could help.
At HP, our belief is that big data analytics impacts analytics in two ways: 1. helping answer existing/legacy questions in newer ways; and 2. addressing a range of newer questions and decisions organizations face today.
An example of the existing/legacy question is the area of segmentation, a well-understood area in marketing and customer analytics. The challenge for most business analytics teams, though, is to look at segmentation with the fresh lens of big data analytics: How can big data help make segmentation better?
How big data analytics can help address new questions is perhaps best illustrated by social media analytics and machine/sensor data analytics. These diverse areas are important aspects of business decision-making and impact such functions as marketing, manufacturing, customer service and R&D. Both require new analytical approaches to manage the large streams of data that get generated.
To realize value from big data analytics, organizations need to integrate technology, tools and practices with existing analytics ecosystems. The choices to make in terms of which tools to select and which skills to develop need careful consideration and have a long-term impact on an organization’s ability to integrate big data analytics.
Analytics organizations should start by considering four key questions:
- What technology and tools are needed?
- What platform is best for integrating these technology choices with each other, as well as with legacy environments?
- Which skills do we need to develop and how do we develop them?
- How do we integrate all of the above into the business decision-making process?
These questions require senior management time and attention. Addressing these issues comprehensively can reduce the barriers to success for analytics organizations looking to incorporate big data analytics.
Pramod Singh (email@example.com) is director of Digital and Big Data Analytics at Hewlett-Packard (HP) and a member of INFORMS. He has a Ph.D. in mathematics from the University of Arkansas and an MBA in marketing. Ritin Mathur (firstname.lastname@example.org) is a senior manager of Big Data Analytics at HP. Srujana H.M. (email@example.com) is a data scientist working on big data technology platforms at HP. All three are based in Bangalore, India.
Notes & References
2. “Big Data Meets Big Data Analytics,” white paper, SAS Institute Inc., 2012.
3. Source: http://www.networkworld.com/article/2289422/applications/9-open-source-big-data-technologies-to-watch.html, last accessed on July 3, 2014.
4. Philip Russom, “Big Data Analytics,” TDWI Best Practices Report, Fourth Quarter, 2011.
5. Thomas H. Davenport and Jill Dyche, “Big Data in Big Companies,” paper, International Institute of Analytics, May 2013.