

Analytics Magazine

Data Analysis & Modeling Tools: Leveraging big data for better business decisions

By Dave Oswill, MathWorks

Working with big data is fast becoming a key step in the process of scientific discovery and engineering. This is happening as technologies such as smart sensors and the Internet of Things (IoT) are enabling vast amounts of detailed data to be collected from scientific instruments, manufacturing systems, connected cars, aircraft and other systems.

Significant value lies in this data as it may show important physical phenomena or provide information on the operating environment, efficiency and health of a system. With the proper tools and techniques, this data can be used for rapid scientific discoveries and to incorporate more intelligence into products, services and manufacturing processes.

For the many engineers and scientists who must implement these data-driven solutions in their enterprise, the systems commonly used to store, manage and process this valuable data can make the process daunting. Fortunately, software analysis and data-modeling tools have been enhanced with new capabilities that let engineers and scientists use familiar syntax and functions to manage the complexity of the data they collect and make more effective design and business decisions.


Accessing Large Sets of Data


Figure 1: Access a wide range of big data.
Source: The MathWorks, Inc.

The first challenge in working with big data is gaining access to these large data sets that may be stored in various types of systems ranging from shared file systems, databases (SQL/NoSQL), IoT data aggregators and data historians, to distributed platforms such as Hadoop. The data may consist of delimited text, spreadsheets, images, videos and other proprietary formats.

To effectively work with this data, engineers and scientists need scalable tools, such as MATLAB, that can provide access to a variety of systems and data formats. This is especially crucial in cases where more than one type of big data platform or data format may be in use.
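The idea of presenting varied sources and formats through one uniform interface can be sketched in a few lines of Python. The sample data below is hypothetical; the point is that records from a delimited text log and a JSON payload (say, from an IoT aggregator) end up in one normalized collection:

```python
import csv
import io
import json

# Hypothetical samples of two common formats an analysis tool must unify:
# a delimited text log and a JSON payload from an IoT aggregator.
csv_text = "sensor,reading\npump_a,3.2\npump_b,4.7\n"
json_text = '{"sensor": "pump_c", "reading": 5.1}'

records = []
records.extend(dict(row) for row in csv.DictReader(io.StringIO(csv_text)))
records.append(json.loads(json_text))

# Normalize: every record becomes {"sensor": str, "reading": float},
# regardless of which format it arrived in.
for r in records:
    r["reading"] = float(r["reading"])

print(len(records))  # 3 records drawn from two different formats
```

Scalable tools apply the same principle across file systems, databases and distributed platforms, so downstream analysis code never needs to know where a record came from.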

Exploring and Processing Large Sets of Data

After accessing the data and before creating a model or theory, it’s important to understand what is in the data, as it may have a major impact on the final result. There are certain capabilities that simplify this exploration process, making it easier for engineers and scientists to observe, clean and effectively work with big data, including:

Summary visualizations, such as binScatterPlot (Figure 2), provide a way to easily view patterns and quickly gain insights.


Figure 2: binScatterPlot in MATLAB.
Source: The MathWorks, Inc.
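What makes a binned scatter plot tractable for big data is that it summarizes point density per grid cell instead of drawing every point. A minimal Python sketch of that underlying binning, using synthetic data (the plotting itself is left to a visualization tool):

```python
import random

random.seed(0)  # synthetic, reproducible data

# Hypothetical data: many (x, y) points in the unit square. A binned
# scatter reports a count per grid cell rather than plotting each point.
pts = [(random.random(), random.random()) for _ in range(10_000)]

NBINS = 4
counts = [[0] * NBINS for _ in range(NBINS)]
for x, y in pts:
    i = min(int(x * NBINS), NBINS - 1)  # clamp x == 1.0 into the last bin
    j = min(int(y * NBINS), NBINS - 1)
    counts[i][j] += 1

# Every point lands in exactly one of the 16 cells.
print(sum(map(sum, counts)))  # 10000
```

The resulting count matrix is tiny no matter how many points went in, which is why this style of summary scales where a conventional scatter plot does not.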

Data cleansing removes outliers and replaces bad or missing values to ensure a better model or analysis. A programmatic way to cleanse data enables new data to be cleaned automatically as it is collected (Figure 3).


Figure 3: The two types of machine learning methods provide different algorithms tailored for different problems.
Source: The MathWorks, Inc.
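A programmatic cleansing step might look like the following Python sketch, run over a hypothetical sensor trace containing a dropout (missing value) and a spike (outlier):

```python
from statistics import mean, stdev

# Hypothetical sensor trace: one dropout (None) and one spike (95.0).
raw = [10.1, 9.8, None, 10.3, 95.0, 10.0, 9.9]

# 1) Fill missing values with the mean of the observed readings.
observed = [v for v in raw if v is not None]
filled = [v if v is not None else mean(observed) for v in raw]

# 2) Drop readings more than two standard deviations from the mean.
mu, sigma = mean(filled), stdev(filled)
cleaned = [v for v in filled if abs(v - mu) <= 2 * sigma]

print(len(raw), len(cleaned))  # 7 readings in, 6 survive cleansing
```

Because the logic is code rather than a manual step, the same function can run on every new batch of data as it arrives, which is exactly the property the article highlights.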

Data reduction techniques such as principal component analysis (PCA) help identify the most influential data inputs. Reducing the number of inputs yields a more compact model, which requires less processing when it is embedded into products or services.
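For two correlated inputs, PCA can be worked out by hand from the 2x2 covariance matrix. The Python sketch below uses hypothetical data where the two inputs track each other closely, so one principal component carries nearly all of the variance:

```python
import math

# Hypothetical two-input data set with strongly correlated inputs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.0, 2.9, 4.2, 5.0]  # roughly y = x

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) ** 2 for x in xs) / (n - 1)                    # var(x)
c = sum((y - my) ** 2 for y in ys) / (n - 1)                    # var(y)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)  # cov(x, y)

# Closed-form eigenvalues of the 2x2 covariance matrix [[a, b], [b, c]].
mid = (a + c) / 2
half = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
lam1, lam2 = mid + half, mid - half

explained = lam1 / (lam1 + lam2)
print(f"first component explains {explained:.1%} of the variance")
```

When one component explains nearly all the variance, the second input adds little information, and a model built on the first component alone can be much more compact.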

Data processing at scale enables engineers and scientists not only to work with large data sets on a desktop workstation, but also to run their analysis pipelines or algorithms on an enterprise-class system such as Hadoop. The ability to move between systems without changing code greatly increases efficiency.
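The pattern that makes this portability possible is writing aggregations over chunks rather than over the whole data set at once. A minimal Python sketch (hypothetical data; in production the chunks would be read from disk or a distributed store):

```python
def chunked_mean(chunks):
    """Mean over an iterable of chunks, never holding all data at once."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count

# The same function works whether the chunks are an in-memory list...
in_memory = [[1.0, 2.0], [3.0, 4.0]]
# ...or a lazy stream that simulates chunks arriving one at a time.
streamed = ([float(v)] for v in range(1, 5))

m1 = chunked_mean(in_memory)
m2 = chunked_mean(streamed)
print(m1, m2)  # 2.5 2.5
```

Because the algorithm only ever sees one chunk at a time, the identical code can run on a laptop sample or be dispatched across a cluster, which is the "move between systems without changing code" property described above.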

Creating a Model

Imagine collecting years’ worth of data. What is valuable in this data? Often, in order to analyze the data and create an intelligent and predictive model, machine learning is required.

Machine learning uses computational methods to “learn” information directly from data without relying on a predetermined equation as a model. It turns out this ability to train models using the data itself opens up many use cases for predictive modeling such as predictive health for complex machinery and systems, physical and natural behaviors, energy load forecasting and financial credit scoring.

Machine learning is broadly divided into two types of methods, supervised and unsupervised learning, each of which contains several algorithms tailored for different problems.

  • Supervised learning uses a training data set which maps input data to previously known response values.
  • Unsupervised learning draws inferences from data sets with input data that does not map to a known output response.
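The distinction can be made concrete with deliberately tiny Python sketches (hypothetical data; real work would use a proper machine learning library):

```python
import math

# Supervised: a 1-nearest-neighbor classifier learns from labeled examples.
train = [((1.0, 1.0), "low"), ((1.2, 0.9), "low"),
         ((8.0, 8.5), "high"), ((7.7, 8.1), "high")]

def predict(point):
    # Return the label of the closest training example.
    return min(train, key=lambda ex: math.dist(ex[0], point))[1]

print(predict((1.1, 1.0)))  # a point near the "low" cluster -> low

# Unsupervised: group the same inputs with no labels at all, by splitting
# the sorted first coordinates at their single widest gap (a crude
# stand-in for clustering algorithms such as k-means).
xs = sorted(p[0] for p, _ in train)             # [1.0, 1.2, 7.7, 8.0]
gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
split = xs[max(gaps)[1]]                        # value before the widest gap
clusters = [[x for x in xs if x <= split], [x for x in xs if x > split]]
print([len(c) for c in clusters])  # [2, 2]
```

The supervised model needed the "low"/"high" labels to learn from; the unsupervised split recovered the same two groups purely from the structure of the inputs.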

Incorporating Big Data for Real-World Solutions

There are a number of platforms available to IT organizations for storing and processing big data. They fall into two categories: 1) batch processing of large, historical data sets, and 2) real-time or near-real-time processing of data that is continuously collected from devices.


Figure 4: Integrating models with MATLAB.
Source: The MathWorks, Inc.

Batch frameworks, such as Spark or MapReduce, are commonly used to analyze and process historical data that has been collected over long periods of time or across many different devices or systems. These frameworks are typically used to find trends in data and develop predictive models.

Streaming platforms that process data in real time or near-real time, such as Kafka, may be coupled with a predictive model to add intelligence and adaptive capabilities to a product or service, enabling uses such as predictive maintenance, equipment-fleet optimization and manufacturing-line monitoring.
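Coupling a model to a stream can be sketched as follows in Python. The scoring function here is a hypothetical stand-in for a trained model, and the event list stands in for a stream consumed from a platform such as Kafka:

```python
THRESHOLD = 0.8  # alert when estimated failure risk exceeds this

def score(event):
    # Stand-in for a trained model: higher vibration -> higher failure risk.
    return min(event["vibration"] / 10.0, 1.0)

def process_stream(events):
    """Score each event as it arrives; collect machines needing attention."""
    alerts = []
    for event in events:  # in production: consumed continuously from a broker
        if score(event) > THRESHOLD:
            alerts.append(event["machine"])
    return alerts

stream = [{"machine": "press_1", "vibration": 2.3},
          {"machine": "press_2", "vibration": 9.1},
          {"machine": "press_3", "vibration": 4.0}]

alerts = process_stream(stream)
print(alerts)  # ['press_2']
```

The model itself would typically be developed offline against historical data (the batch path above) and then deployed into a loop like this one, where each incoming event is scored within moments of arrival.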

Incorporating models into products or services is typically done in conjunction with enterprise application developers and system architects, but this can create a challenge. Developing models in traditional programming languages is difficult for engineers and scientists, while recoding models can be time-consuming and error-prone, especially if the models require periodic updates.

To alleviate this issue, enterprise application developers should look for data analysis and modeling tools that are familiar to their engineers and scientists, while also providing production-ready tooling such as application servers and code generation for deploying models into their applications, products and services.

To truly take advantage of the value of big data, the full process – from sourcing data to developing analytical models to deploying those models into production – must be supported. IT managers and solution architects can use modeling tools to enable the scientists and engineers in their organizations to develop algorithms and models for smarter, differentiated products and services. At the same time, the production-ready application servers and code generation capabilities found in these tools enable the organization to rapidly incorporate those models into its products and services.

The combination of a knowledgeable domain expert who has been enabled to be an effective data scientist, along with an IT team capable of rapidly incorporating their work into the services, products and operations of their organization, makes for a significant competitive advantage when offering the products and services that customers are demanding.

Dave Oswill is product marketing manager at MathWorks, developers of MATLAB.



