

Analytics Magazine

Analytical Journey: Navigating the big data analytics SaaS terrain

July/August 2016


Focus on data, not infrastructure: Three things to look for, three things to avoid.

By Brad Kolarov

With the continued hype from so many big data companies, it is hard to understand the best way to start down the big data analytics path. After all, the reason we use big data software tools is to improve our data analytics, not to see if we can get the latest and greatest big data tool to work. We have seen too many “square-peg” solutions pounded into “round-hole” problems.

This article is for data consumers focused on solving analytic challenges: those who have yet to start down the path of big data analytics, those who are stuck in the middle of that journey, and those who have made it through but are ready to move to more advanced analytic frameworks.

Navigating the Traditional Terrain

Traditional big data service offerings typically cover a single capability in a range of business needs. Some companies simply make it easier to spin up a Hadoop cluster. Others offer proprietary algorithms to track or uncover patterns in data. Still others provide an aggregation platform for these services and more, all under one roof.

Regardless of your need, one of these types of Software as a Service (SaaS) offerings can help your business get started in standing up an enterprise-level, cloud-based big data analytics capability, but on their own they consistently fall short of solving your analytic needs.

The path toward big data analytics can take many twists and turns.


These kinds of distributed processing systems are notoriously hard to system-engineer. They require continual interaction between the IT department, software developers and internal end-user data analysts. These systems could easily add weeks or months to the time it takes for developers to gain access to a Hadoop cluster. (And the larger the cluster, the longer it may take to get from IT to the developers.)

The next generation of big data analytics tools automates these hard-to-system-engineer steps. Through automation, developers can gain access to a Hadoop cluster almost immediately, as opposed to the unwieldy lengths of time it might take through conventional channels.

Next Generation of Big Data Analytics

These new SaaS services have made automating these processes almost push-button easy, allowing essential infrastructure or analytics models to be built in a self-service environment.

Developers can now go to a website that provides a click-through portal for access to resources they need, based on customized patterns they define. With a few clicks of a mouse, they can have a dedicated space in their cloud, and one or many big data stacks provisioned for them. A few clicks more and they can automatically ingest data and information into the data stacks they’ve created, all with the confidence of cloud-based security to protect sensitive enterprise data.
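Portals like this typically wrap a cloud provider's provisioning API behind those few clicks. As a minimal sketch of the idea (the function, field and default names here are illustrative assumptions, not any particular vendor's API), the automation assembles a provisioning request from the pattern the developer defines:

```python
# Illustrative sketch: the kind of request a self-service portal might
# assemble behind its "few clicks." All names and defaults are assumptions
# for illustration, not a specific vendor's API.

def build_cluster_request(name, node_count=3, stacks=("hadoop",)):
    """Assemble a provisioning request for one or more big data stacks."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return {
        "cluster_name": name,
        "nodes": node_count,
        "stacks": list(stacks),        # e.g. hadoop, spark, kafka
        "network": "customer-vpc",     # provisioned in your own cloud account
        "encryption_at_rest": True,    # cloud-based security by default
    }

# A developer's customized pattern becomes a concrete request:
request = build_cluster_request("analytics-dev", node_count=5,
                                stacks=("hadoop", "spark"))
```

The point of the sketch is that once the pattern is captured in code, IT can review it once and developers can reuse it on demand, instead of negotiating each cluster by hand.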

This way of working clearly facilitates better, faster interaction between developers and IT. That improved interaction in turn makes it easier and faster for data analysts to ingest data and begin gaining critical business insights.

Better still, automation adds a level of resilience, making big data systems, traditionally the most fragile part of an IT environment, more robust.

Of course, not all of these new SaaS products are created equal, and it is very difficult to cut through the marketing façade. Users need to be sure that they’ve chosen the right one for their purposes. Below are a few things to keep in mind and a few to avoid when deciding on which platform is right for you.

Three things to look for in big data SaaS:

1. A platform with comprehensive offerings. Most companies need more than just Hadoop and Spark, even if the system can spin up these services in minutes. You should find a SaaS provider that gives you a choice of a broad range of tools with different functions (Kafka, Elasticsearch, Zeppelin, etc.) – but doesn’t make you use them all. This will allow you to fully customize the way your company interacts with data, without having to take on a full load of unnecessary tools. The more options, the more you can do with your data, which is, after all, the point.

2. Systems that automatically ingest data. Provisioning clusters is a relatively easy process and not particularly new to the industry – especially for applications like Hadoop. Once you’ve spun up that cluster, though, you need an equally effective system to bring in your enterprise data and start doing the real analytics work.
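What "automatically ingest" means in practice is that the platform generates the transfer job for you once the cluster exists. A minimal sketch, assuming a DistCp-style bulk copy from cloud storage into the cluster's HDFS (tool choice, bucket and path names are assumptions for illustration):

```python
# Illustrative sketch of an automated-ingest step: given a source location
# and a freshly provisioned cluster, produce the copy job that moves
# enterprise data in. Paths and names are hypothetical.

def build_ingest_job(source_uri, cluster_name, dataset):
    """Build a DistCp-style copy command targeting the cluster's HDFS."""
    target = f"hdfs://{cluster_name}/data/{dataset}"
    return ["hadoop", "distcp", source_uri, target]

job = build_ingest_job("s3://acme-enterprise-data/sales/2016/",
                       "analytics-dev", "sales")
```

A platform that only provisions clusters leaves this step to you; one that automates it lets analysts start on the real analytics work immediately.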

3. Transparency in security. The data clusters you build should be in your own environment. Consequently, your infrastructure should be securely hosted in your cloud accounts where you have full access and full accountability for your infrastructure and data.

Three things to avoid:

1. Proprietary, “black box” software. One key advantage to open source is the vast choice and transparency involved in analyzing your data. Some companies may require you to download a full suite of open source or proprietary software to work with your enterprise data. If that works for your enterprise, great. Most companies, however, find that this approach undermines the entire reason for working with open source software in the first place. In general, it’s better to find a provider that allows you to sidestep proprietary software or distributions and launch right away in your cloud.

2. Big data solutions that consume your data. Data is the most sensitive part of the equation when it comes to using a SaaS system with confidence. Make sure that your provider does not consume or escrow your data. And avoid any solutions that may host your data in their own cloud.

3. Bleeding edge. Know the difference between cutting edge and bleeding edge. Some online applications are simply not ready yet for the enterprise, so building a Hadoop cluster on one of these systems may cause more problems than it solves, despite the cool factor of saying you use the technology. Make sure the provider you choose gives you access to open source tools that are widely adopted and well understood by enterprise customers from a security, performance and cost perspective.

Automation holds the key to fast development of enterprise big data capability, and today’s SaaS offerings have many levels of automation. Make sure you pick the system that’s right for your current needs – and can grow with your enterprise as those needs change.

Brad Kolarov is managing partner of Stackspace, a big data technology company that simplifies data analysis for faster business decisions. He can be reached at
