Analytical Journey: Navigating the big data analytics SaaS terrain
Focus on data, not infrastructure: Three things to look for, three things to avoid.
By Brad Kolarov
With the continued hype from so many big data companies, it is hard to understand the best way to start down the big data analytics path. After all, the reason we use big data software tools is to improve our data analytics, not to see if we can get the latest and greatest big data tool to work. We have seen too many “square-peg” solutions pounded into “round-hole” problems.
This article is for data consumers focused on solving analytic challenges: those who have yet to start down the path of big data analytics, those who are stuck in the middle of that journey, and those who have made it through but are ready to move to more advanced analytic frameworks.
Navigating the Traditional Terrain
Traditional big data service offerings typically cover a single capability in a range of business needs. Some companies simply make it easier to spin up a Hadoop cluster. Others offer proprietary algorithms to track or uncover patterns in data. Still others provide an aggregation platform for these services and more, all under one roof.
Whatever your need, one of these types of Software as a Service (SaaS) offerings can help your business get started in standing up an enterprise-level, cloud-based big data analytics capability. They consistently fall short, however, in solving your analytic needs.
These kinds of distributed processing systems are notoriously hard to system-engineer. They require continual interaction between the IT department, software developers and internal end-user data analysts. That coordination can easily add weeks or months to the time it takes for developers to gain access to a Hadoop cluster. (And the larger the cluster, the longer it may take to get from IT to the developers.)
The next generation of big data analytics tools automates these hard-to-system-engineer steps. Through automation, developers can gain access to a Hadoop cluster almost immediately, as opposed to the unwieldy lengths of time it might take through conventional channels.
Next Generation of Big Data Analytics
These new SaaS offerings have made automation almost push-button easy, allowing essential infrastructure or analytics models to be built in a self-service environment.
Developers can now go to a website that provides a click-through portal for access to resources they need, based on customized patterns they define. With a few clicks of a mouse, they can have a dedicated space in their cloud, and one or many big data stacks provisioned for them. A few clicks more and they can automatically ingest data and information into the data stacks they’ve created, all with the confidence of cloud-based security to protect sensitive enterprise data.
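As a rough illustration of what a developer-defined "pattern" might look like behind such a portal, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `StackPattern` fields, the `AVAILABLE_TOOLS` catalog, the `validate_pattern` helper and the example storage path are hypothetical and do not describe any particular provider's API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical catalog of tools a provider might offer.
AVAILABLE_TOOLS = {"hadoop", "spark", "kafka", "elasticsearch", "zeppelin"}


@dataclass(frozen=True)
class StackPattern:
    """A hypothetical, reusable description of one big data stack."""
    name: str
    tools: Tuple[str, ...]
    node_count: int
    ingest_sources: Tuple[str, ...] = ()


def validate_pattern(pattern: StackPattern) -> List[str]:
    """Return a list of problems; an empty list means the pattern looks usable."""
    problems = []
    if pattern.node_count < 1:
        problems.append("node_count must be at least 1")
    if not pattern.tools:
        problems.append("at least one tool must be selected")
    for tool in pattern.tools:
        if tool.lower() not in AVAILABLE_TOOLS:
            problems.append(f"unknown tool: {tool}")
    return problems


# Example pattern; the bucket path is a placeholder, not a real location.
analytics_stack = StackPattern(
    name="clickstream-analytics",
    tools=("hadoop", "spark", "kafka"),
    node_count=4,
    ingest_sources=("s3://example-bucket/clickstream/",),
)
print(validate_pattern(analytics_stack))
```

The point of describing a stack as data rather than as a sequence of manual steps is that the same pattern can be checked, reused and provisioned again without another round trip through IT.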
This way of working clearly facilitates better, faster interaction between developers and IT. That improved interaction in turn makes it easier and faster for data analysts to ingest data and begin gaining critical business insights.
Better still, automation offers a level of resilience and creates more robust big data systems, which are traditionally the most fragile part of an IT environment.
Of course, not all of these new SaaS products are created equal, and it is very difficult to cut through the marketing façade. Users need to be sure that they’ve chosen the right one for their purposes. Below are a few things to keep in mind and a few to avoid when deciding on which platform is right for you.
Three things to look for in big data SaaS:
1. A platform with comprehensive offerings. Most companies need more than just Hadoop and Spark, even if the system can spin up these services in minutes. You should find a SaaS provider that gives you a choice of a broad range of tools with different functions (Kafka, Elasticsearch, Zeppelin, etc.) – but doesn’t make you use them all. This will allow you to fully customize the way your company interacts with data, without having to take on a full load of unnecessary tools. The more options, the more you can do with your data, which is, after all, the point.
2. Systems that automatically ingest data. Provisioning clusters is a relatively easy process and not particularly new to the industry – especially for applications like Hadoop. Once you’ve spun up that cluster, though, you need an equally effective system to bring in your enterprise data and start doing the real analytics work.
3. Transparency in security. The data clusters you build should be in your own environment. That means your infrastructure should be securely hosted in your own cloud accounts, where you have full access to, and full accountability for, your infrastructure and data.
Three things to avoid:
1. Proprietary, “black box” software. One key advantage of open source is the vast choice and transparency involved in analyzing your data. Some companies may require you to download a full suite of open source or proprietary software to work with your enterprise data. If that works for your enterprise, great. Most companies, however, find that this approach undermines the entire reason for working with open source software in the first place. In general, it’s better to find a provider that allows you to sidestep proprietary software or distributions and launch right away in your cloud.
2. Big data solutions that consume your data. Data is the most sensitive part of the equation when it comes to using a SaaS system with confidence. Make sure that your provider does not consume or escrow your data. And avoid any solutions that may host your data in their own cloud.
3. Bleeding edge. Know the difference between cutting edge and bleeding edge. Some online applications are simply not ready yet for the enterprise, so building a Hadoop cluster on one of these systems may cause more problems than it solves, despite the cool factor of saying you use the technology. Make sure the provider you choose gives you access to open source tools that are widely adopted and well understood by enterprise customers from a security, performance and cost perspective.
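For teams comparing several providers, the six points above can be turned into a simple checklist. The Python sketch below is one possible way to do that, not a standard evaluation method; the `ProviderProfile` fields, the `checklist_gaps` function and the example provider are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProviderProfile:
    """Hypothetical yes/no answers gathered while evaluating one provider."""
    name: str
    # Three things to look for
    offers_broad_tool_choice: bool
    ingests_data_automatically: bool
    runs_in_customer_cloud: bool
    # Three things to avoid
    requires_proprietary_distribution: bool
    hosts_or_escrows_customer_data: bool
    relies_on_unproven_tools: bool


def checklist_gaps(provider: ProviderProfile) -> List[str]:
    """List every checklist item the provider fails; empty means no gaps found."""
    gaps = []
    if not provider.offers_broad_tool_choice:
        gaps.append("limited tool choice")
    if not provider.ingests_data_automatically:
        gaps.append("no automated data ingestion")
    if not provider.runs_in_customer_cloud:
        gaps.append("infrastructure not in your own cloud accounts")
    if provider.requires_proprietary_distribution:
        gaps.append("requires proprietary software or distribution")
    if provider.hosts_or_escrows_customer_data:
        gaps.append("hosts or escrows your data")
    if provider.relies_on_unproven_tools:
        gaps.append("relies on bleeding-edge tools")
    return gaps


# Made-up provider used only to show the function in action.
example_provider = ProviderProfile(
    name="Example Provider",
    offers_broad_tool_choice=True,
    ingests_data_automatically=True,
    runs_in_customer_cloud=True,
    requires_proprietary_distribution=False,
    hosts_or_escrows_customer_data=True,
    relies_on_unproven_tools=False,
)
print(checklist_gaps(example_provider))
```

A flat list of gaps is deliberately simple. Any weighting between the items depends on your own priorities, so none is assumed here.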
Automation holds the key to fast development of enterprise big data capability, and today’s SaaS offerings have many levels of automation. Make sure you pick the system that’s right for your current needs – and can grow with your enterprise as those needs change.
Brad Kolarov is managing partner of Stackspace, a big data technology company that simplifies data analysis for faster business decisions. He can be reached at firstname.lastname@example.org.