

Analytics Magazine

Data Analysis & Modeling Tools: Leveraging big data for better business decisions

By Dave Oswill, MathWorks

Working with big data is fast becoming a key step in the process of scientific discovery and engineering. This is happening as technologies such as smart sensors and the Internet of Things (IoT) are enabling vast amounts of detailed data to be collected from scientific instruments, manufacturing systems, connected cars, aircraft and other systems.

Significant value lies in this data as it may show important physical phenomena or provide information on the operating environment, efficiency and health of a system. With the proper tools and techniques, this data can be used for rapid scientific discoveries and to incorporate more intelligence into products, services and manufacturing processes.

For many engineers and scientists who must consider implementing these data-driven solutions in their enterprise, this can be a daunting process because of the systems commonly used to store, manage and process this valuable data. Fortunately, data analysis and modeling tools have been enhanced with new capabilities that allow engineers and scientists to use familiar syntax and functions to unlock the complexity of the data they are collecting and make more effective design and business decisions.

Leveraging big data


Accessing Large Sets of Data


Figure 1: Access a wide range of big data.
Source: The MathWorks, Inc.

The first challenge in working with big data is gaining access to these large data sets that may be stored in various types of systems ranging from shared file systems, databases (SQL/NoSQL), IoT data aggregators and data historians, to distributed platforms such as Hadoop. The data may consist of delimited text, spreadsheets, images, videos and other proprietary formats.

To effectively work with this data, engineers and scientists need scalable tools, such as MATLAB, that can provide access to a variety of systems and data formats. This is especially crucial in cases where more than one type of big data platform or data format may be in use.
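
As a language-agnostic illustration of this access pattern (the article's tooling is MATLAB; the `load_records` helper and sample file names below are invented for this sketch), a single entry point can dispatch on format so downstream analysis code never cares where the data came from:

```python
import csv
import io
import json

def load_records(name, text):
    """Dispatch on file extension and return a list of records.

    Supports delimited text (.csv) and JSON (.json) here; other formats
    (spreadsheets, images, proprietary binaries) would register the same way.
    """
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".json"):
        return json.loads(text)
    raise ValueError(f"no reader registered for {name}")

# Two different sources, one access pattern:
csv_rows = load_records("sensors.csv", "id,temp\n1,20.5\n2,21.0")
json_rows = load_records("sensors.json", '[{"id": 3, "temp": 19.8}]')
print(len(csv_rows) + len(json_rows))  # 3 records from two formats
```

The same dispatch idea extends to databases, IoT aggregators and Hadoop-backed storage: each source contributes a reader, and the analysis pipeline consumes uniform records.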

Exploring and Processing Large Sets of Data

After accessing the data and before creating a model or theory, it’s important to understand what is in the data, as it may have a major impact on the final result. There are certain capabilities that simplify this exploration process, making it easier for engineers and scientists to observe, clean and effectively work with big data, including:

Summary visualizations, such as binScatterPlot (Figure 2), provide a way to easily view patterns and quickly gain insights.


Figure 2: binScatterPlot in MATLAB.
Source: The MathWorks, Inc.

Data cleansing removes outliers and replaces bad or missing data to ensure a better model or analysis. A programmatic way to cleanse data enables new data to be automatically cleaned as it is collected (Figure 3).
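
A minimal Python sketch of such a programmatic cleansing step (the `cleanse` helper and its thresholds are illustrative choices, not a prescribed method) might flag outliers with a median-based rule, which stays robust even when the outliers are extreme:

```python
import math
import statistics

def cleanse(values, z_thresh=3.5):
    """Replace missing readings (NaN) and outliers with the median of the
    good samples. Outliers are flagged by a modified z-score built on the
    median absolute deviation (MAD), which the outliers themselves cannot
    distort the way they distort a mean/standard-deviation rule."""
    good = [v for v in values if not math.isnan(v)]
    med = statistics.median(good)
    mad = statistics.median(abs(v - med) for v in good)

    def is_outlier(v):
        return mad > 0 and 0.6745 * abs(v - med) / mad > z_thresh

    return [med if math.isnan(v) or is_outlier(v) else v for v in values]

raw = [20.1, 19.8, float("nan"), 20.3, 500.0, 20.0]
print(cleanse(raw))  # the NaN and the 500.0 spike both become 20.1
```

Because the rule is a pure function of the incoming values, it can run automatically on every new batch as it is collected.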

Figure 3: The two types of machine learning methods provide different algorithms tailored for different problems.
Source: The MathWorks, Inc.


Data reduction techniques such as principal component analysis (PCA) help identify the most influential data inputs. By reducing the number of inputs, a more compact model can be created, one that requires less processing when embedded into products or services.
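
To make the PCA idea concrete, here is a self-contained Python sketch for the two-input case (the sensor data and helper name are invented; real workflows would use a library routine on many inputs). With two correlated inputs, the first principal component is the single direction that captures most of the variance, computed in closed form from the 2x2 covariance matrix:

```python
import math

def pca_first_component(xs, ys):
    """Closed-form PCA for two inputs: return the unit eigenvector for the
    largest eigenvalue of the 2x2 sample covariance matrix, i.e. the one
    direction along which the data varies most."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]]:
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    vx, vy = lam - syy, sxy          # corresponding (unnormalized) eigenvector
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Two sensor channels that mostly move together:
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
v = pca_first_component(xs, ys)
print(v)  # close to (0.707, 0.707): one component stands in for both inputs
```

Projecting onto that one component halves the model's input count while keeping almost all of the signal, which is exactly the compactness benefit described above.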

Data processing at scale enables engineers and scientists not only to work with large sets of data on a desktop workstation, but also to run their analysis pipelines and algorithms on an enterprise-class system such as Hadoop. The ability to move between systems without changing code greatly increases efficiency.
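
The reason the same code can move from workstation to cluster is that scalable analyses are written against chunks rather than the whole data set. A minimal Python sketch of that shape (the `chunked_mean` and `read_chunks` helpers are hypothetical) reduces each chunk to a small partial result and combines the partials, which is the same map-then-reduce structure a Hadoop or Spark backend executes in parallel:

```python
def chunked_mean(chunks):
    """Streaming mean over arbitrarily many chunks: each chunk collapses to
    (count, sum), so memory use stays constant no matter how large the full
    data set is."""
    total, count = 0.0, 0
    for chunk in chunks:                 # map step: one partial per chunk
        total += sum(chunk)              # reduce step: combine partials
        count += len(chunk)
    return total / count

# Simulate a file too big for memory by yielding it chunk by chunk:
def read_chunks():
    it = iter(range(1, 1_000_001))       # 1 .. 1,000,000
    while batch := [x for _, x in zip(range(10_000), it)]:
        yield batch

print(chunked_mean(read_chunks()))  # 500000.5
```

Because the algorithm only ever sees one chunk at a time, swapping the local generator for a distributed data source changes nothing in the analysis code itself.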

Creating a Model

Imagine collecting years’ worth of data. What is valuable in this data? Often, in order to analyze the data and create an intelligent and predictive model, machine learning is required.

Machine learning uses computational methods to “learn” information directly from data without relying on a predetermined equation as a model. It turns out this ability to train models using the data itself opens up many use cases for predictive modeling such as predictive health for complex machinery and systems, physical and natural behaviors, energy load forecasting and financial credit scoring.

Machine learning is broadly divided into two types of methods, supervised and unsupervised learning, each of which contains several algorithms tailored for different problems.

  • Supervised learning uses a training data set that maps input data to previously known response values.
  • Unsupervised learning draws inferences from data sets whose input data does not map to a known output response.
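
To make the distinction concrete, here is a deliberately tiny Python sketch (the readings and helper names are invented for illustration, not a MATLAB workflow): a nearest-centroid classifier trained on labeled data, beside a two-means clustering that discovers the same two groups with no labels at all:

```python
from collections import defaultdict
from statistics import mean

# --- Supervised: labels are known at training time (nearest centroid) ---
train = [(1.0, "low"), (1.2, "low"), (8.9, "high"), (9.3, "high")]
by_label = defaultdict(list)
for x, label in train:
    by_label[label].append(x)
centroids = {label: mean(xs) for label, xs in by_label.items()}

def classify(x):
    """Predict the label whose training centroid is closest."""
    return min(centroids, key=lambda lbl: abs(x - centroids[lbl]))

print(classify(1.5), classify(8.0))  # low high

# --- Unsupervised: no labels; 2-means clustering finds the groups ---
data = [1.0, 1.2, 8.9, 9.3, 1.5, 8.0]
c1, c2 = min(data), max(data)            # crude initial centers
for _ in range(10):
    g1 = [x for x in data if abs(x - c1) <= abs(x - c2)]
    g2 = [x for x in data if abs(x - c1) > abs(x - c2)]
    c1, c2 = mean(g1), mean(g2)
print(sorted(g1), sorted(g2))  # the two groups emerge without labels
```

The supervised half needs the answers up front; the unsupervised half infers structure from the inputs alone, which is why each family suits a different class of problems.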

Incorporating Big Data for Real-World Solutions

There are a number of platforms available to IT organizations for storing and processing big data. These platforms fall into two categories: 1) batch processing of large, historical data sets, and 2) real-time or near-real-time processing of data continuously collected from devices.

Figure 4: Integrating models with MATLAB.
Source: The MathWorks, Inc.


Batch frameworks such as Spark and MapReduce are commonly used to analyze and process historical data that has been collected over long periods of time or across many different devices or systems. These frameworks are typically used to look for trends in data and to develop predictive models.

Streaming platforms such as Kafka, which process data in real or near-real time, may be coupled with a predictive model to add intelligence and adaptive capabilities to a product or service, enabling uses such as predictive maintenance, equipment fleet optimization and manufacturing line monitoring.
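
A minimal Python sketch of coupling a predictive rule to a stream (the `monitor` helper, its window size and threshold are all invented for illustration) keeps a sliding window of recent readings and flags values that deviate from the window's baseline, the skeleton of a streaming predictive-maintenance check:

```python
from collections import deque

def monitor(stream, window=5, threshold=2.0):
    """Sliding-window anomaly check over a live stream: flag any reading
    that deviates from the mean of the last `window` readings by more than
    `threshold`. Returns (index, value) pairs for each alert."""
    recent = deque(maxlen=window)
    alerts = []
    for i, value in enumerate(stream):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > threshold:
                alerts.append((i, value))   # would trigger a maintenance event
        recent.append(value)
    return alerts

readings = [20.0, 20.1, 19.9, 20.2, 20.0, 20.1, 27.5, 20.0]
print(monitor(readings))  # [(6, 27.5)]
```

In production, the `stream` argument would be a consumer reading from the streaming platform, and the simple baseline rule would be replaced by the trained model; the coupling pattern is the same.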

Incorporating models into products or services is typically done in conjunction with enterprise application developers and system architects, but this can create a challenge: developing models in traditional programming languages is difficult for engineers and scientists, while recoding models can be time-consuming and error-prone, especially if the models require periodic updates.

To alleviate this issue, enterprise application developers should look for data analysis and modeling tools that are familiar to their engineers and scientists, while also providing production-ready tooling, such as application servers and code generation, for deploying models into their applications, products and services.

To truly take advantage of the value of big data, the full process – from sourcing data to developing analytical models to deploying those models into production – must be supported. IT managers and solution architects can use modeling tools to enable the scientists and engineers in their organizations to develop algorithms and models for smarter, differentiated products and services. At the same time, the production-ready application servers and code generation capabilities found in these tools enable the organization to rapidly incorporate those models into its products and services.

A knowledgeable domain expert equipped to work as an effective data scientist, paired with an IT team that can rapidly incorporate that work into the organization's services, products and operations, makes for a significant competitive advantage in delivering the products and services customers demand.

Dave Oswill is product marketing manager at MathWorks, developers of MATLAB.










