Analytics Magazine

Autonomous Systems: Big data in your products, services and operations

Businesses should equip their team with data analysis tools to easily access and aggregate big data sets. Photo Courtesy of 123rf.com | © zhudifengi

What data scientists and engineers need to know when working with big data as they move from “conceptualization” to “operationalization” of their designs.

By Dave Oswill

Businesses are greatly expanding the autonomous capabilities of their products, services and manufacturing processes to improve reliability and efficiency. Big data processing plays an integral role in developing the prescriptive analytics behind these capabilities.

As a result, data scientists and engineers should pay attention to the following aspects of working with big data as they move from “conceptualization” to “operationalization” of their designs:

  • accessing data stored in various formats and systems
  • finding and deriving relevant information in data
  • using tools that scale to big data for both development and operationalization

By remaining mindful of when, where and how these challenges arise during the big data design process, data scientists and engineers will be better able to complete their projects on time and on budget.

Aggregating Disparate Data Sets

One of the first steps in the development of an automated system is to select a scalable tool that can easily provide access to a wide variety of systems and formats used to store and manage big data sets. Data is often scattered, making it time-consuming to collect and categorize. For example, sensor or image data stored in files on a shared drive may need to be combined with metadata stored in SQL or NoSQL databases. Data may also reside in large-scale distributed storage and processing frameworks such as Hadoop and Spark. In other cases, data in disparate forms (delimited text, spreadsheets, images, videos and proprietary formats) must be used together in order to understand the behavior of the system and develop a predictive model. Businesses should look to equip their team with data analysis tools that provide a platform and workspace where engineers and scientists can easily access and aggregate big data sets.
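
As an illustration, the following Python sketch combines delimited sensor files from a shared drive with metadata held in a relational database. The file paths, table and column names are hypothetical, and any comparable data analysis environment could be used instead.

# A minimal sketch of aggregating disparate data sources: sensor readings
# stored as delimited text files on a shared drive are joined with asset
# metadata kept in a SQL database. Paths and column names are hypothetical.
import glob
import sqlite3

import pandas as pd

# Collect all sensor log files from a shared directory.
sensor_frames = [pd.read_csv(path) for path in glob.glob("shared_drive/sensors/*.csv")]
sensors = pd.concat(sensor_frames, ignore_index=True)

# Pull the matching asset metadata from a relational database.
with sqlite3.connect("assets.db") as conn:
    metadata = pd.read_sql_query("SELECT asset_id, location, model FROM assets", conn)

# Join the two sources into one analysis-ready data set.
combined = sensors.merge(metadata, on="asset_id", how="left")
print(combined.head())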

Understanding What’s in Your Data

After the data is collected and aggregated, data scientists and engineers must interpret and transform that data into some form of actionable insight. Although any number of interpretive methods can be used, several broad techniques make it easier for engineers to summarize variables in a data set and uncover meaningful trends:

Summary visualizations, such as binned scatter plots, provide a way to easily view patterns and trends within large data sets. These plots highlight areas where data points are most concentrated, and an interactive control for color intensity lets the designer explore large data sets and quickly gain insight.
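
As a rough illustration, the following Python sketch draws a binned (hexagonal) scatter plot over synthetic data; a colorbar stands in for the interactive color-intensity control described above.

# A minimal sketch of a binned scatter plot for a large data set using
# matplotlib's hexbin; the data here is synthetic and for illustration only.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(scale=0.8, size=x.size)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=80, cmap="viridis", mincnt=1)
fig.colorbar(hb, label="points per bin")   # color intensity shows concentration
ax.set_xlabel("sensor reading")
ax.set_ylabel("derived quantity")
plt.show()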

Filtering and other signal processing techniques enable developers to detect slow-moving trends or infrequent events spread across the data that must be accounted for in a theory or model. They also allow additional information to be derived from a data set for use in predictive models or algorithms.
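
For example, a low-pass filter can pull a slow-moving trend out of noisy sensor data. The Python sketch below uses SciPy with an assumed sample rate and cutoff frequency chosen purely for illustration.

# A minimal sketch of extracting a slow-moving trend from noisy sensor data
# with a low-pass Butterworth filter; sample rate and cutoff are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                                   # sample rate in Hz (assumed)
t = np.arange(0, 60, 1 / fs)
signal = 0.02 * t + 0.3 * np.sin(2 * np.pi * 5 * t) + np.random.normal(0, 0.2, t.size)

b, a = butter(N=4, Wn=0.1, btype="low", fs=fs)   # 0.1 Hz low-pass design
trend = filtfilt(b, a, signal)                   # zero-phase filtering keeps alignment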

Programmatically enabled data cleansing allows bad or missing data to be fixed before a valid model or theory is established, and it allows the same data-cleansing algorithm to be deployed in a production application, service or product.
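
A minimal Python sketch of this idea follows: one cleansing function marks impossible readings as missing, fills short gaps, and can be reused unchanged in a production pipeline. The column name and validity rule are hypothetical.

# A minimal sketch of programmatic data cleansing with pandas: the same
# function is applied during development and again in production so the
# deployed system sees identically prepared data.
import numpy as np
import pandas as pd

def clean_sensor_data(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Treat physically impossible readings as missing.
    df.loc[df["temperature_c"] < -273.15, "temperature_c"] = np.nan
    # Fill short gaps by interpolation, then drop whatever remains missing.
    df["temperature_c"] = df["temperature_c"].interpolate(limit=5)
    return df.dropna(subset=["temperature_c"])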

Feature selection techniques help developers find the data that is most relevant for the theory or model, enabling a more accurate and compact implementation of predictive models or algorithms.
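
As an illustration, the following Python sketch applies a simple univariate feature-selection step to synthetic data, keeping only the predictors that score highest against the response.

# A minimal sketch of feature selection with scikit-learn: retain only the
# most informative predictors to get a smaller, often more accurate model.
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

X, y = make_regression(n_samples=5000, n_features=50, n_informative=5, noise=0.5)

selector = SelectKBest(score_func=f_regression, k=5).fit(X, y)
X_reduced = selector.transform(X)          # keeps the 5 highest-scoring features
print(selector.get_support(indices=True))  # indices of the selected columns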

Working with Large-Scale Data

Data processing at scale is another crucial consideration in the design of automated systems. Although many data scientists and engineers are most efficient when working on a familiar workstation, data sets often are too large to be stored locally and require a level of software analysis, modeling and algorithm development that only a cluster-based computing platform can handle. Modeling tools that allow developers to easily move between systems without changing code greatly increase design efficiency.
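
The sketch below illustrates this idea with Dask, used here only as a stand-in for such a tool: the same DataFrame code runs on a workstation or, by pointing the client at a cluster scheduler, on a distributed cluster without changes. Paths and addresses are hypothetical.

# A minimal "write once, scale out" sketch: identical analysis code runs
# locally during development and on a cluster in production.
import dask.dataframe as dd
from dask.distributed import Client

client = Client()                            # local development: uses this workstation
# Scaling out would instead be: Client("scheduler-address:8786")

df = dd.read_csv("shared_drive/sensors/*.csv")        # larger-than-memory data set
per_asset_max = df.groupby("asset_id")["temperature_c"].max().compute()
print(per_asset_max.head())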

Data scientists and engineers should look for a scalable data analysis and modeling tool that builds in enough domain-specific features to let them conveniently access data and work with it using familiar syntax and functions. When the tools a domain expert already uses are paired with easy-to-use machine learning functionality, engineers can combine their domain knowledge with the techniques of the data scientist, make more effective design decisions, quickly deploy their models, and test and validate the accuracy of any given model.
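
As a rough illustration of pairing domain knowledge with off-the-shelf machine learning, the Python sketch below computes a familiar engineering quantity (RMS vibration level), uses it as the input feature to a classifier, and validates accuracy with cross-validation. The data and labels are synthetic.

# A minimal sketch of combining a domain-derived feature with generic
# machine learning tooling; all data here is synthetic and illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
windows = rng.normal(size=(500, 256))              # 500 vibration windows
labels = (windows.std(axis=1) > 1.0).astype(int)   # stand-in failure labels

rms = np.sqrt((windows ** 2).mean(axis=1))         # domain feature: RMS level
X = rms.reshape(-1, 1)

model = RandomForestClassifier(n_estimators=100)
print(cross_val_score(model, X, labels, cv=5).mean())  # validate model accuracy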

Once a data scientist or engineer has walked through the process and associated challenges of designing a big data system, a final consideration must be assessed: the ability to rapidly operationalize predictive models and algorithms for enterprise-scale applications.

There are scalable data analysis and modeling tools available on the market that can provide product development teams with the domain-specific tools they need. With these tools, engineers and scientists can rapidly develop and integrate algorithms into their automated and embedded systems without the need to manually recode in another language.
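
One common way to operationalize a model without recoding it, sketched below in Python, is to serialize the trained model and load it inside a lightweight service. The file name and the use of Flask are assumptions for illustration, not a description of any particular vendor's tooling.

# A minimal deployment sketch: a model developed during the design phase is
# serialized once and served by a small web application.
import joblib
from flask import Flask, jsonify, request

model = joblib.load("predictive_model.joblib")   # artifact saved during development
app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]    # e.g. {"features": [[0.42]]}
    return jsonify(prediction=model.predict(features).tolist())

if __name__ == "__main__":
    app.run(port=8080)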

By anticipating these aspects of working with big data, data scientists and engineers will be better able to integrate automated systems into their projects, adapt more quickly to changing environmental and business conditions, and address market needs more effectively.

Dave Oswill is the product marketing manager at MathWorks, where he works with customers in developing and deploying analytics along with the wide variety of data management and business application technologies in use today.
