

Analytics Magazine

Analytical Journey: Navigating the big data analytics SaaS terrain

July/August 2016

business analytics news and articles

Focus on data, not infrastructure: Three things to look for, three things to avoid.

By Brad Kolarov

With the continued hype from so many big data companies, it is hard to understand the best way to start down the big data analytics path. After all, the reason we use big data software tools is to improve our data analytics, not to see if we can get the latest and greatest big data tool to work. We have seen too many “square-peg” solutions pounded into “round-hole” problems.

This article is written for data consumers who are focused on solving analytic challenges: those who have yet to start down the path of big data analytics, those who are stuck in the middle of that journey, and those who have made it through but are ready to move to more advanced analytic frameworks.

Navigating the Traditional Terrain

Traditional big data service offerings typically address a single capability within a broader range of business needs. Some companies simply make it easier to spin up a Hadoop cluster. Others offer proprietary algorithms to track or uncover patterns in data. Still others provide an aggregation platform for these services and more, all under one roof.

Whatever your need, one of these types of Software as a Service (SaaS) offerings can help your business get started standing up an enterprise-level, cloud-based big data analytics capability. These offerings, however, consistently fall short of solving your analytic needs.

The path toward big data analytics can take many twists and turns. (Photo courtesy of Dirk Ercken)

Distributed processing systems such as Hadoop are notoriously hard to system-engineer. They require continual interaction among the IT department, software developers and internal end-user data analysts. This engineering effort can easily add weeks or months to the time it takes for developers to gain access to a Hadoop cluster. (And the larger the cluster, the longer the handoff from IT to the developers may take.)

The next generation of big data analytics tools automates these hard-to-system-engineer steps. Through automation, developers can gain access to a Hadoop cluster almost immediately, rather than waiting the weeks or months it might take through conventional channels.

Next Generation of Big Data Analytics

These new SaaS offerings have made automation almost push-button easy, allowing standard infrastructure and analytics models to be built in a self-service environment.

Developers can now use a web-based, click-through portal to access the resources they need, based on customized patterns they define. With a few clicks of a mouse, they can have a dedicated space in their cloud and one or many big data stacks provisioned for them. A few clicks more and they can automatically ingest data into the stacks they've created, with cloud-based security protecting sensitive enterprise data.

This way of working clearly facilitates better, faster interaction between developers and IT. That improved interaction in turn makes it easier and faster for data analysts to ingest data and begin gaining critical business insights.

Better still, automation adds resilience and produces more robust big data systems, which have traditionally been the most fragile part of an IT environment.

Of course, not all of these new SaaS products are created equal, and it is very difficult to cut through the marketing façade. Users need to be sure they've chosen the right one for their purposes. Below are a few things to look for and a few to avoid when deciding which platform is right for you.

Three things to look for in big data SaaS:

1. A platform with comprehensive offerings. Most companies need more than just Hadoop and Spark, even if the system can spin up these services in minutes. You should find a SaaS provider that gives you a choice of a broad range of tools with different functions (Kafka, Elasticsearch, Zeppelin, etc.) – but doesn’t make you use them all. This will allow you to fully customize the way your company interacts with data, without having to take on a full load of unnecessary tools. The more options, the more you can do with your data, which is, after all, the point.

2. Systems that automatically ingest data. Provisioning clusters is a relatively easy process and not particularly new to the industry – especially for applications like Hadoop. Once you’ve spun up that cluster, though, you need an equally effective system to bring in your enterprise data and start doing the real analytics work.

3. Transparency in security. The data clusters you build should live in your own environment. In other words, your infrastructure should be securely hosted in your own cloud accounts, where you have full access to, and full accountability for, your infrastructure and data.

Three things to avoid:

1. Proprietary, "black box" software. One key advantage of open source is the breadth of choice and the transparency it brings to analyzing your data. Some companies may require you to download a full suite of open source or proprietary software to work with your enterprise data. If that works for your enterprise, great. Most companies, however, find that this approach undermines the very reason for working with open source software in the first place. In general, it's better to find a provider that allows you to sidestep proprietary software or distributions and launch right away in your cloud.

2. Big data solutions that consume your data. Data is the most sensitive part of the equation when it comes to using a SaaS system with confidence. Make sure that your provider does not consume or escrow your data. And avoid any solutions that may host your data in their own cloud.

3. Bleeding edge. Know the difference between cutting edge and bleeding edge. Some online applications are simply not yet ready for the enterprise, so building a Hadoop cluster on one of these systems may cause more problems than it solves, despite the cool factor of saying you use the technology. Make sure the provider you choose gives you access to open source tools that are widely adopted and well understood by enterprise customers from a security, performance and cost perspective.

Automation holds the key to fast development of enterprise big data capability, and today’s SaaS offerings have many levels of automation. Make sure you pick the system that’s right for your current needs – and can grow with your enterprise as those needs change.

Brad Kolarov is managing partner of Stackspace, a big data technology company that simplifies data analysis for faster business decisions. He can be reached at
