

Analytics Magazine

Autonomous Systems: Big data in your products, services and operations

Businesses should equip their team with data analysis tools to easily access and aggregate big data sets.
Photo courtesy of © zhudifengi

What data scientists and engineers need to know when working with big data as they move from “conceptualization” to “operationalization” of their designs.

By Dave Oswill

Businesses are greatly expanding the autonomous capabilities of their products, services and manufacturing processes to optimize reliability and efficiency. Processing big data plays an integral role in developing the prescriptive analytics behind these capabilities.

As a result, data scientists and engineers should pay attention to the following aspects of working with big data as they move from “conceptualization” to “operationalization” of their designs:

  • accessing data stored in various formats and systems
  • finding and deriving relevant information in data
  • using tools that scale to big data for both development and operationalization

By remaining mindful of when, where and how these challenges arise during the big data design process, data scientists and engineers will be better able to complete their projects on time and on budget.

Aggregating Disparate Data Sets

One of the first steps in the development of an automated system is to select a scalable tool that can easily provide access to a wide variety of systems and formats used to store and manage big data sets. Data is often scattered, making it time-consuming to collect and categorize. For example, sensor or image data stored in files on a shared drive may need to be combined with metadata stored in SQL or NoSQL databases. Data may also reside in large-scale distributed storage and processing frameworks such as Hadoop and Spark. In other cases, data in disparate forms (delimited text, spreadsheets, images, videos and proprietary formats) must be used together in order to understand the behavior of the system and develop a predictive model. Businesses should look to equip their team with data analysis tools that provide a platform and workspace where engineers and scientists can easily access and aggregate big data sets.
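The specific workflow depends on the tool at hand, but the pattern is straightforward to sketch. The Python fragment below is offered purely as an illustration, not as any particular vendor's approach: it combines sensor files from a shared drive with device metadata from a relational database, and the paths, table name and column names are hypothetical.

```python
# Minimal sketch: merge sensor readings stored as CSV files on a shared
# drive with device metadata held in a SQL database.
# Paths, table name and column names are hypothetical.
import glob
import sqlite3

import pandas as pd

# 1. Gather sensor files scattered across a shared drive.
sensor_files = glob.glob("/mnt/shared/sensors/*.csv")
sensor_df = pd.concat(
    (pd.read_csv(path, parse_dates=["timestamp"]) for path in sensor_files),
    ignore_index=True,
)

# 2. Pull device metadata from a relational database.
with sqlite3.connect("/mnt/shared/metadata.db") as conn:
    meta_df = pd.read_sql_query(
        "SELECT device_id, location, sensor_type FROM devices", conn
    )

# 3. Join the two sources into one analysis-ready table.
combined = sensor_df.merge(meta_df, on="device_id", how="left")
print(combined.head())
```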

Understanding What’s in Your Data

After the data is collected and aggregated, data scientists and engineers must interpret and transform that data into some form of actionable insight. Although any number of interpretive methods can be used, several broad techniques make it easier for engineers to summarize variables in a data set and uncover meaningful trends:

Summary visualizations, such as binned scatter plots, provide a way to easily view patterns and trends within large data sets. These plots highlight areas where data points are most heavily concentrated, and a slider control for color intensity lets the designer interactively explore a large data set and quickly gain insight.
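As a rough illustration of the idea, the sketch below draws a hexagonally binned scatter plot with matplotlib over synthetic data standing in for real measurements; the color bar plays the role of the intensity control described above.

```python
# Minimal sketch: a binned (hexagonal) scatter plot that shows where points
# in a large data set are concentrated. The data here is synthetic.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(scale=0.8, size=x.size)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=80, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per bin")  # color encodes density
ax.set_xlabel("input variable")
ax.set_ylabel("response")
plt.show()
```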

Filtering and other signal processing techniques enable developers to detect slow-moving trends or infrequent events spread across the data that a theory or model must take into account; they also let developers derive additional information from a data set for use in predictive models or algorithms.
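For example, a simple low-pass filter can separate a slow drift from noisy, high-rate sensor data. The sketch below uses SciPy; the sampling rate, cutoff frequency and signal are illustrative.

```python
# Minimal sketch: low-pass filtering to expose a slow-moving trend buried
# in noisy sensor data. Sampling rate, cutoff and signal are illustrative.
import numpy as np
from scipy import signal

fs = 100.0                            # samples per second (assumed)
t = np.arange(0, 600, 1 / fs)         # ten minutes of data
raw = (0.05 * t + np.sin(2 * np.pi * 5 * t)
       + np.random.default_rng(0).normal(scale=0.5, size=t.size))

# 4th-order Butterworth low-pass filter with a 0.1 Hz cutoff, applied
# forward and backward so the recovered trend is not phase-shifted.
b, a = signal.butter(4, 0.1, btype="low", fs=fs)
trend = signal.filtfilt(b, a, raw)
```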

Programmatically enabled data cleansing allows bad or missing data to be fixed before a valid model or theory is established, and it allows the same data-cleansing algorithm to be deployed in a production application, service or product.
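As a sketch of what "programmatically enabled" can mean in practice, the function below (its column name and physical limits are made up) encapsulates a cleansing rule so the identical code can run during model development and again inside the production pipeline.

```python
# Minimal sketch: a reusable cleansing step that flags impossible readings
# and fills short gaps. The column name and limits are illustrative.
import numpy as np
import pandas as pd

def clean_sensor_data(df: pd.DataFrame,
                      low: float = -40.0,
                      high: float = 125.0) -> pd.DataFrame:
    """Remove out-of-range temperature readings and interpolate short gaps."""
    df = df.copy()
    # Treat physically impossible values as missing.
    bad = (df["temperature"] < low) | (df["temperature"] > high)
    df.loc[bad, "temperature"] = np.nan
    # Fill short gaps by linear interpolation; long gaps stay missing.
    df["temperature"] = df["temperature"].interpolate(limit=5)
    return df.dropna(subset=["temperature"])
```

Because the rule lives in a single function, the same call can be deployed unchanged in the production application, service or product.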

Feature selection techniques help developers find the data that is most relevant for the theory or model, enabling a more accurate and compact implementation of predictive models or algorithms.
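One common approach, sketched below with scikit-learn and synthetic data, is to rank candidate features by how much a fitted model relies on them and drop those that contribute little.

```python
# Minimal sketch: ranking candidate features by a random forest's
# importances. Feature names, target and data are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "vibration_rms": rng.normal(size=500),
    "bearing_temp":  rng.normal(size=500),
    "ambient_temp":  rng.normal(size=500),
})
y = 3 * X["vibration_rms"] + 0.5 * X["bearing_temp"] + rng.normal(size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = pd.Series(model.feature_importances_, index=X.columns)
print(ranked.sort_values(ascending=False))  # near-zero features are candidates to drop
```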

Working with Large-Scale Data

Data processing at scale is another crucial consideration in the design of automated systems. Although many data scientists and engineers are most efficient when working on a familiar workstation, data sets often are too large to be stored locally and require a level of software analysis, modeling and algorithm development that only a cluster-based computing platform can handle. Modeling tools that allow developers to easily move between systems without changing code greatly increase design efficiency.
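The article does not prescribe a particular platform, but the idea can be sketched with Dask, one framework that lets the same data-frame-style code run on a workstation or a cluster; the paths and column names here are hypothetical.

```python
# Minimal sketch: the same data-frame-style analysis running against data
# too large for one machine. Dask splits the work into partitions that can
# execute on local cores or on a cluster scheduler. Paths are hypothetical.
import dask.dataframe as dd

df = dd.read_csv("/mnt/shared/sensors/*.csv", parse_dates=["timestamp"])
df["date"] = df["timestamp"].dt.date
daily_max = df.groupby("date")["temperature"].max()
print(daily_max.compute())   # triggers the distributed computation
```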

Data scientists and engineers should look for a scalable data analysis and modeling tool with enough domain-specific features built in to let them access data conveniently and work with it using familiar syntax and functions. When the tools a domain expert already uses are paired with easy-to-use machine learning functionality, engineers can combine their domain knowledge with the methods of the data scientist, make more effective design decisions, quickly deploy their models, and test and validate the accuracy of any given model.
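A quick fit-and-validate loop of the kind a domain expert might run is sketched below with scikit-learn; the synthetic data stands in for the aggregated, cleaned sensor table.

```python
# Minimal sketch: train a model, hold out data, and check accuracy before
# considering deployment. The data is synthetic and illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))                 # e.g. vibration, temperatures
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)
print("MAE on held-out data:", mean_absolute_error(y_test, model.predict(X_test)))
```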

Once a data scientist or engineer has walked through the process and associated challenges of designing a big data system, a final consideration must be assessed: the ability to rapidly operationalize predictive models and algorithms for enterprise-scale applications.

There are scalable data analysis and modeling tools available on the market that can provide product development teams with the domain-specific tools they need. With these tools, engineers and scientists can rapidly develop and integrate algorithms into their automated and embedded systems without the need to manually recode in another language.
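What avoiding manual recoding looks like varies by tool. One generic pattern, sketched here with joblib rather than any specific vendor's workflow, is to persist the validated model so the production service loads the exact artifact that was tested.

```python
# Minimal sketch: persist a validated model so a production service loads
# the exact artifact that was tested, rather than a hand re-implementation.
# Model, data and file name are illustrative.
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.default_rng(3).normal(size=(200, 2))
y = X @ np.array([1.5, -0.7])
model = LinearRegression().fit(X, y)

joblib.dump(model, "temperature_predictor_v1.joblib")   # at design time

# ... later, inside the deployed application or service:
deployed = joblib.load("temperature_predictor_v1.joblib")
print(deployed.predict(X[:1]))
```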

By anticipating these aspects of working with big data, data scientists and engineers will be better able to integrate automated systems into their project chains in order to more quickly adapt to changing environmental and business conditions and address market needs more effectively.

Dave Oswill is the product marketing manager at MathWorks, where he works with customers in developing and deploying analytics along with the wide variety of data management and business application technologies in use today.
