Analytics Magazine

Autonomous Systems: Big data in your products, services and operations

Businesses should equip their team with data analysis tools to easily access and aggregate big data sets.
Photo Courtesy of 123rf.com | © zhudifengi

What data scientists and engineers need to know when working with big data as they move from “conceptualization” to “operationalization” of their designs.

By Dave Oswill

Businesses are greatly expanding the autonomous capabilities of their products, services and manufacturing processes to improve reliability and efficiency. Processing big data plays an integral role in developing the prescriptive analytics behind these capabilities.

As a result, data scientists and engineers should pay attention to the following aspects of working with big data as they move from “conceptualization” to “operationalization” of their designs:

  • accessing data stored in various formats and systems
  • finding and deriving relevant information in data
  • using tools that scale to big data for both development and operationalization

By remaining mindful of when, where and how these challenges arise during the big data design process, data scientists and engineers will be better able to complete their projects on time and on budget.

Aggregating Disparate Data Sets

One of the first steps in the development of an automated system is to select a scalable tool that can easily provide access to a wide variety of systems and formats used to store and manage big data sets. Data is often scattered, making it time-consuming to collect and categorize. For example, sensor or image data stored in files on a shared drive may need to be combined with metadata stored in SQL or NoSQL databases. Data may also reside in large-scale distributed storage and processing frameworks such as Hadoop and Spark. In other cases, data in disparate forms (delimited text, spreadsheets, images, videos and proprietary formats) must be used together in order to understand the behavior of the system and develop a predictive model. Businesses should look to equip their team with data analysis tools that provide a platform and workspace where engineers and scientists can easily access and aggregate big data sets.
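As a rough illustration of combining file-based sensor data with metadata held in a database, the sketch below joins a small CSV extract with a lookup table in SQLite. The table name `sensors`, the column names and the sample values are hypothetical, and an in-memory database stands in for a production SQL store.

```python
import csv
import io
import sqlite3

# Hypothetical sensor readings, standing in for a CSV file on a shared drive.
SENSOR_CSV = """sensor_id,timestamp,temperature
s1,2019-01-01T00:00,20.5
s2,2019-01-01T00:00,31.0
s1,2019-01-01T01:00,21.5
"""


def load_metadata(conn):
    """Return {sensor_id: location} from the (hypothetical) sensors table."""
    rows = conn.execute("SELECT sensor_id, location FROM sensors")
    return dict(rows)


def join_readings(csv_text, metadata):
    """Attach a location to each reading so both sources can be analyzed together."""
    combined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        combined.append({
            "sensor_id": row["sensor_id"],
            "timestamp": row["timestamp"],
            "temperature": float(row["temperature"]),
            # Readings without metadata are kept and flagged, not dropped.
            "location": metadata.get(row["sensor_id"], "unknown"),
        })
    return combined


# In-memory database standing in for a production SQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT PRIMARY KEY, location TEXT)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?)",
    [("s1", "boiler room"), ("s2", "loading dock")],
)

records = join_readings(SENSOR_CSV, load_metadata(conn))
```

The same pattern extends to other sources: each one is read into a common record shape before any analysis begins.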

Understanding What’s in Your Data

After the data is collected and aggregated, data scientists and engineers must interpret and transform that data into some form of actionable insight. Although any number of interpretive methods can be used, several broad techniques make it easier for engineers to summarize variables in a data set and uncover meaningful trends:

Summary visualizations, such as binned scatter plots, provide a way to easily view patterns and trends within large data sets. These plots highlight areas where data points are most concentrated, and a slider control for adjusting color intensity lets the designer explore large data sets interactively and quickly gain insights.
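The idea behind a binned scatter plot can be sketched without any plotting library: rather than drawing every point, count how many points fall into each cell of a grid and let color intensity follow the count. The function name `bin_points`, the square bins and the sample points are assumptions made for illustration, and the rendering step is omitted.

```python
from collections import Counter


def bin_points(points, bin_width):
    """Count points per square bin; keys are (column, row) bin indices."""
    counts = Counter()
    for x, y in points:
        # Floor division assigns each coordinate to its bin, including negatives.
        counts[(int(x // bin_width), int(y // bin_width))] += 1
    return counts


# Hypothetical sample data; a real data set would hold millions of points.
points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.5), (1.7, 0.2), (1.9, 0.8), (-0.5, 0.5)]
counts = bin_points(points, bin_width=1.0)

# The densest bin is the one a plot would draw with the strongest color.
densest_bin, densest_count = counts.most_common(1)[0]
```

Because only the per-bin counts are kept, the memory needed depends on the number of occupied bins, not on the number of points.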

Filtering and other signal processing techniques help developers detect slow-moving trends or infrequent events spread across the data that the theory or model must take into account. They also let developers derive additional information from a data set for use in predictive models or algorithms.
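One of the simplest filters for exposing a slow-moving trend is a moving average. The sketch below is a minimal example under that assumption; the function name, window length and sample signal are illustrative, and real projects often call for more sophisticated filters.

```python
def moving_average(values, window):
    """Return the trailing moving average of values over a fixed window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical noisy signal with a slow upward drift.
signal = [10, 12, 9, 11, 13, 10, 12, 14, 11, 13]
trend = moving_average(signal, window=3)
```

The smoothed series can itself serve as a derived input to a predictive model, alongside the raw signal.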

Programmatically enabled data cleansing allows bad or missing data to be fixed before a valid model or theory is established, and it allows the same data-cleansing algorithm to be deployed in a production application, service or product.
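A cleansing step written as an ordinary function can be exercised during model development and later called unchanged from a production service. The sketch below assumes that missing readings arrive as `None`, that values outside a plausible range count as bad, and that linear interpolation is an acceptable repair. All three are illustrative choices, not a general rule.

```python
def clean_series(values, lower, upper):
    """Replace missing (None) or out-of-range readings by linear interpolation."""
    valid = [i for i, v in enumerate(values)
             if v is not None and lower <= v <= upper]
    if not valid:
        raise ValueError("no valid readings to interpolate from")
    cleaned = list(values)
    # Pad each edge with the nearest valid reading.
    for i in range(valid[0]):
        cleaned[i] = values[valid[0]]
    for i in range(valid[-1] + 1, len(values)):
        cleaned[i] = values[valid[-1]]
    # Interpolate linearly across every interior gap.
    for left, right in zip(valid, valid[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            cleaned[i] = values[left] + step * (i - left)
    return cleaned


# Hypothetical readings: two dropouts and one implausible spike.
raw = [None, 10.0, None, None, 16.0, 999.0]
cleaned = clean_series(raw, lower=0.0, upper=100.0)
```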

Feature selection techniques help developers find the data that is most relevant for the theory or model, enabling a more accurate and compact implementation of predictive models or algorithms.
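As one simple instance of feature selection, the sketch below ranks candidate features by the absolute value of their Pearson correlation with the target and keeps the strongest few. This filter-style approach is only one of many techniques, and the feature names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # A constant column carries no linear information.
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target, top_k):
    """Return the top_k feature names, ordered by |correlation| with target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical candidate features and target.
target = [1, 2, 3, 4, 5]
features = {
    "load": [2, 4, 6, 8, 10],
    "noise": [3, 1, 4, 1, 5],
    "inverse": [5, 4, 3, 2, 2],
}
selected = rank_features(features, target, top_k=2)
```

Taking the absolute value keeps strongly negative relationships, such as `inverse` here, which are just as informative to a model as positive ones.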

Working with Large-Scale Data

Data processing at scale is another crucial consideration in the design of automated systems. Although many data scientists and engineers are most efficient when working on a familiar workstation, data sets often are too large to be stored locally and require a level of software analysis, modeling and algorithm development that only a cluster-based computing platform can handle. Modeling tools that allow developers to easily move between systems without changing code greatly increase design efficiency.
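One way to keep analysis code unchanged as data outgrows a workstation is to write it against chunks of data instead of a fully loaded array. In the sketch below, the same `chunked_mean` function accepts an in-memory list wrapped as a single chunk or a generator that yields pieces of a larger store. The helper `read_in_chunks` is a hypothetical stand-in for a reader over distributed storage.

```python
def chunked_mean(chunks):
    """Compute a mean from an iterable of chunks without holding all data at once."""
    total = 0.0
    count = 0
    for chunk in chunks:
        for value in chunk:
            total += value
            count += 1
    if count == 0:
        raise ValueError("no data")
    return total / count


def read_in_chunks(values, chunk_size):
    """Hypothetical reader: yields fixed-size pieces, as a cluster reader might."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


data = [1, 2, 3, 4]
local_result = chunked_mean([data])                    # everything in memory
streamed_result = chunked_mean(read_in_chunks(data, 3))  # processed piece by piece
```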

Data scientists and engineers should look for a scalable data analysis and modeling tool that builds in enough domain-specific features to allow them to conveniently access data and easily work with it using familiar syntaxes and functions. When such a tool pairs the functions domain experts commonly use with easy-to-use machine learning functionality, engineers can combine their domain knowledge with the tools of the data scientist, allowing them to make more effective design decisions, quickly deploy their models, and test and validate the accuracy of any given model.

Once a data scientist or engineer has walked through the process and associated challenges of designing a big data system, a final consideration must be assessed: the ability to rapidly operationalize predictive models and algorithms for enterprise-scale applications.

There are scalable data analysis and modeling tools available on the market that can provide product development teams with the domain-specific tools they need. With these tools, engineers and scientists can rapidly develop and integrate algorithms into their automated and embedded systems without the need to manually recode in another language.

By anticipating these aspects of working with big data, data scientists and engineers will be better able to integrate automated systems into their project chains in order to more quickly adapt to changing environmental and business conditions and address market needs more effectively.

Dave Oswill is the product marketing manager at MathWorks, where he works with customers in developing and deploying analytics along with the wide variety of data management and business application technologies in use today.
