

Analytics Magazine

Data Analysis & Modeling Tools: Leveraging big data for better business decisions

By Dave Oswill, MathWorks

Working with big data is fast becoming a key step in the process of scientific discovery and engineering. This is happening as technologies such as smart sensors and the Internet of Things (IoT) are enabling vast amounts of detailed data to be collected from scientific instruments, manufacturing systems, connected cars, aircraft and other systems.

Significant value lies in this data as it may show important physical phenomena or provide information on the operating environment, efficiency and health of a system. With the proper tools and techniques, this data can be used for rapid scientific discoveries and to incorporate more intelligence into products, services and manufacturing processes.

For many engineers and scientists who must bring these data-driven solutions into their enterprise, the process can be daunting because of the systems commonly used to store, manage and process this valuable data. Fortunately, software analysis and data modeling tools have been enhanced with new capabilities that allow engineers and scientists to use familiar syntax and functions to unlock the complexity of the data they are collecting and to make more effective design and business decisions.

Leveraging big data


Accessing Large Sets of Data


Figure 1: Access a wide range of big data.
Source: The MathWorks, Inc.

The first challenge in working with big data is gaining access to these large data sets that may be stored in various types of systems ranging from shared file systems, databases (SQL/NoSQL), IoT data aggregators and data historians, to distributed platforms such as Hadoop. The data may consist of delimited text, spreadsheets, images, videos and other proprietary formats.

To effectively work with this data, engineers and scientists need scalable tools, such as MATLAB, that can provide access to a variety of systems and data formats. This is especially crucial in cases where more than one type of big data platform or data format may be in use.
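The value of such a tool is a single access layer over mixed formats. As a tool-neutral illustration (this article's examples use MATLAB), here is a minimal Python sketch in which a hypothetical `load_records` helper dispatches on file type and returns a common list-of-records representation:

```python
import csv
import io
import json

def load_records(name, text):
    """Hypothetical loader: dispatch on the file extension and return a
    common list-of-dicts representation, so downstream analysis code
    does not care which format the data arrived in."""
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".json"):
        return json.loads(text)
    raise ValueError(f"unsupported format: {name}")

# the same analysis-ready structure from two different source formats
csv_rows = load_records("sensors.csv", "id,temp\na,21.5\nb,22.0\n")
json_rows = load_records("sensors.json", '[{"id": "a", "temp": 21.5}]')
```

Downstream analysis code then works with one representation regardless of whether the data arrived as delimited text, a spreadsheet export or JSON from an IoT aggregator.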

Exploring and Processing Large Sets of Data

After accessing the data and before creating a model or theory, it’s important to understand what is in the data, as it may have a major impact on the final result. There are certain capabilities that simplify this exploration process, making it easier for engineers and scientists to observe, clean and effectively work with big data, including:

Summary visualizations, such as binScatterPlot (Figure 2), provide a way to easily view patterns and quickly gain insights.


Figure 2: binScatterPlot in MATLAB.
Source: The MathWorks, Inc.
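binScatterPlot summarizes millions of points by binning them and displaying per-bin density rather than drawing every point. The binning idea behind it can be sketched in plain Python (synthetic data, counts only, no plotting):

```python
import random

# synthetic stand-in for a large two-variable data set
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10000)]
ys = [random.gauss(0, 1) for _ in range(10000)]

# count points per 2-D bin: a 10x10 summary replaces 10,000 markers
BINS = 10
lo, hi = -4.0, 4.0
width = (hi - lo) / BINS
counts = [[0] * BINS for _ in range(BINS)]
for x, y in zip(xs, ys):
    i = min(max(int((x - lo) / width), 0), BINS - 1)
    j = min(max(int((y - lo) / width), 0), BINS - 1)
    counts[i][j] += 1
```

Plotting the bin counts as shaded tiles gives a density view whose cost depends on the number of bins, not the number of points.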

Data cleansing removes outliers and replaces bad or missing data to ensure a better model or analysis. A programmatic way to cleanse data enables new data to be cleaned automatically as it is collected (Figure 3).


Figure 3: The two types of machine learning methods provide different algorithms tailored for different problems.
Source: The MathWorks, Inc.
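As a sketch of such a programmatic cleansing step (Python, with hypothetical sensor readings), the routine below forward-fills missing values and replaces outliers using a median-based rule, which resists being skewed by the very outliers it removes:

```python
def median(vals):
    s = sorted(vals)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

def clean(series):
    """Forward-fill missing readings, then replace outliers (far from the
    median, measured in median-absolute-deviation units) with the median."""
    filled, last = [], None
    for v in series:
        if v is None:
            v = last          # carry the previous valid reading forward
        filled.append(v)
        last = v
    med = median(filled)
    mad = median([abs(v - med) for v in filled])
    cutoff = 10 * mad
    return [med if abs(v - med) > cutoff else v for v in filled]

# a missing reading and a spurious 500-degree spike, both repaired
cleaned = clean([21.0, None, 21.4, 500.0, 21.2])
```

Because the rule is a function rather than a manual step, it can run on every new batch of data as it is collected.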

Data reduction techniques such as principal component analysis (PCA) help to find the most influential of your data inputs. By reducing the number of inputs, a more compact model can be created, which requires less processing when the model is embedded into the products or services.
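The payoff of PCA-style reduction is that a few components can capture most of the variance. A toy two-variable Python example (closed-form top eigenvalue of the 2x2 covariance matrix) shows one component retaining nearly all the information when the inputs are strongly correlated:

```python
import math

# toy data: the second input is nearly a multiple of the first,
# so a single principal component captures almost all the variance
pts = [(x, 2 * x + 0.1 * ((-1) ** i)) for i, x in enumerate(range(10))]

n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
a = sum((p[0] - mx) ** 2 for p in pts) / n           # var(x)
c = sum((p[1] - my) ** 2 for p in pts) / n           # var(y)
b = sum((p[0] - mx) * (p[1] - my) for p in pts) / n  # cov(x, y)

# closed-form top eigenvalue of the 2x2 covariance matrix [[a, b], [b, c]]
lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
explained = lam / (a + c)   # fraction of variance kept with one component
```

Here `explained` is above 99 percent, so a model built on the single leading component loses almost nothing while halving the input dimension.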

Data processing at scale enables engineers and scientists to not only work with large sets of data on a desktop workstation, but use their analysis pipeline or algorithms on an enterprise class system such as Hadoop. The ability to move between systems without changing code greatly increases efficiency.
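The key property of code that scales from a workstation to a cluster is that it never needs the whole data set in memory at once. A minimal Python sketch of the pattern: a statistic computed chunk by chunk gives the same answer whether the chunks come from one file or thousands:

```python
def chunked_mean(chunks):
    """Running mean over an iterable of chunks: only one chunk is held in
    memory at a time, so the same code works on data of any size."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count

# stand-in for chunks streamed from files or a distributed store
chunks = ([float(v) for v in range(i, i + 100)] for i in range(0, 1000, 100))
m = chunked_mean(chunks)
```

Systems such as Hadoop apply the same idea across machines: each node summarizes its chunks and the partial sums are combined, so the analysis code need not change as the data grows.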

Creating a Model

Imagine collecting years’ worth of data. What is valuable in this data? Often, in order to analyze the data and create an intelligent and predictive model, machine learning is required.

Machine learning uses computational methods to “learn” information directly from data without relying on a predetermined equation as a model. This ability to train models on the data itself opens up many predictive modeling use cases, such as predictive health for complex machinery and systems, modeling of physical and natural behaviors, energy load forecasting and financial credit scoring.

Machine learning is broadly divided into two types of methods, supervised and unsupervised learning, each of which contains several algorithms tailored for different problems.

  • Supervised learning uses a training data set which maps input data to previously known response values.
  • Unsupervised learning draws inferences from data sets with input data that does not map to a known output response.
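Two toy Python examples make the distinction concrete (the data and labels are invented for illustration): a nearest-neighbor prediction from labeled training data, and a two-cluster k-means that finds structure with no labels at all:

```python
def nearest_neighbor(train, query):
    """Supervised: training points carry known labels; predict the label
    of the training point closest to the query."""
    return min(train, key=lambda p: abs(p[0] - query))[1]

def two_means(points, iters=10):
    """Unsupervised: no labels; split 1-D points into two clusters by
    alternating assignment and centroid update (k-means with k=2)."""
    c1, c2 = min(points), max(points)
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted((c1, c2))

# supervised: map a new reading to a previously known response value
label = nearest_neighbor([(1.0, "low"), (10.0, "high")], 2.5)

# unsupervised: discover two groups in unlabeled readings
centers = two_means([1.0, 1.2, 0.8, 9.9, 10.1, 10.0])
```

The supervised call needs the "low"/"high" answers up front; the unsupervised call infers the two groupings from the input data alone.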

Incorporating Big Data for Real-World Solutions

A number of platforms are available to IT organizations for storing and processing big data. They fall into two categories: 1) batch processing of large, historical sets of data, and 2) real-time or near real-time processing of data that is continuously collected from devices.


Figure 4: Integrating models with MATLAB.
Source: The MathWorks, Inc.

Batch applications, such as Spark or MapReduce, are commonly used to analyze and process historical data that has been collected over long periods of time or across many different devices or systems. These applications are typically used to look for trends in data and develop predictive models.
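The map/reduce pattern behind such batch applications can be sketched in a few lines of Python: each partition of historical data is summarized independently (map), and the summaries are merged into one result (reduce):

```python
from collections import Counter
from functools import reduce

# map: each "node" counts events in its own partition of a machine log
partitions = [
    ["ok", "ok", "fault"],
    ["fault", "ok"],
    ["ok"],
]
mapped = [Counter(part) for part in partitions]

# reduce: merge the per-partition counts into one historical summary
totals = reduce(lambda a, b: a + b, mapped)
```

Because the map step touches each partition independently, it parallelizes across as many machines as there are partitions; the reduce step only sees the small per-partition summaries.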

Streaming applications that process data in real or near-real time, such as those built on Kafka, may be coupled with a predictive model to add intelligence and adaptive capabilities to a product or service, enabling uses such as predictive maintenance, equipment fleet optimization and manufacturing line monitoring.
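A minimal Python sketch of such a streaming computation (with hypothetical sensor readings): only a small window of recent values is retained, and each new reading is checked against the window's median as it arrives:

```python
from collections import deque

def streaming_monitor(readings, window=5, threshold=3.0):
    """Flag readings that deviate from the median of the recent window.
    The median resists contamination by the outliers being detected, and
    only `window` values are kept in memory, so this can run indefinitely."""
    recent = deque(maxlen=window)
    alerts = []
    for t, value in enumerate(readings):
        if len(recent) == window:
            med = sorted(recent)[window // 2]
            if abs(value - med) > threshold:
                alerts.append(t)
        recent.append(value)
    return alerts

# a temperature spike at index 6 is flagged as it arrives
alerts = streaming_monitor([20, 20, 21, 20, 21, 20, 45, 21, 20, 20, 21])
```

In production the same per-reading check would sit behind a stream consumer, raising a maintenance alert the moment a reading departs from recent behavior.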

Incorporating models into products or services is typically done in conjunction with enterprise application developers and system architects, but this can create a challenge. Developing models in traditional programming languages is difficult for engineers and scientists, while recoding models can be time-consuming and error prone, especially if the models require periodic updates.

To alleviate this issue, enterprise application developers should look for data analysis and modeling tools that are familiar to their engineers and scientists, while also providing production-ready tooling, such as application servers and code generation, for deploying models into applications, products and services.

To truly take advantage of the value of big data, the full process – from sourcing data to developing analytical models to deploying these models into production – must be supported. IT managers and solution architects can use modeling tools to enable the scientists and engineers in their organizations to develop algorithms and models for smarter, differentiated products and services. At the same time, the organization can rapidly incorporate these models into its products and services by leveraging the production-ready application servers and code generation capabilities found in these tools.

The combination of a knowledgeable domain expert who has been enabled to be an effective data scientist, along with an IT team capable of rapidly incorporating their work into the services, products and operations of their organization, makes for a significant competitive advantage when offering the products and services that customers are demanding.

Dave Oswill is product marketing manager at MathWorks, developer of MATLAB.
