Analytics Magazine

Predictive Modeling: ‘Ensemble of Ensemble’

November/December 2015

business analytics news and articles

Meet the new normal in predictive modeling

By Anindya Sengupta

Accurate prediction of the future is the central goal of predictive modeling. As predictive modeling and analytics have spread across industries, the focus on accuracy has grown sharply. Predictive modelers across the globe continue to search for newer and more innovative techniques to improve their prediction accuracy. Against this backdrop, one method worth considering is the ensemble approach to modeling.

Ensemble models combine two or more models to enable a more robust prediction, classification or variable selection. The most common form of ensemble combines weak decision trees into one strong model, and the most prevalent methods for doing so are bagging and boosting. These machine-learning techniques often achieve better accuracy than traditional statistical techniques such as least squares estimation and maximum likelihood. However, the traditional techniques are more robust. Ideally, we should take a blended approach that integrates the best parts of the two families of models. The new buzz phrase in the predictive analytics world is “ensemble of ensemble” models.

There are several ways of combining models. The simplest is to take a simple or weighted average of the predictions from the different models. For business problems with binary target variables, such as predicting the likelihood of fraud, this approach sometimes gives better results in terms of predicting the event rate. The major limitation of this approach is that the overall accuracy of the combined model tends to lie between the accuracies of the individual models. This method is therefore not advisable for modeling continuous variables, where overall accuracy matters most. It can, however, still be considered for modeling binary target variables.
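The simple and weighted averages described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `weighted_average`, the two sets of fraud scores and the 1:3 weighting are all hypothetical and do not come from the article.

```python
def weighted_average(predictions, weights=None):
    """Blend per-model scores into one score per record.

    predictions: list of lists, one inner list of scores per model.
    weights: one non-negative weight per model; equal weights if omitted.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # zip(*predictions) walks the records, giving each model's score for that record.
    return [
        sum(w * p for w, p in zip(weights, record)) / total
        for record in zip(*predictions)
    ]


# Hypothetical fraud scores from two models for three transactions.
logit_scores = [0.10, 0.80, 0.40]
boosted_scores = [0.20, 0.60, 0.90]

simple_blend = weighted_average([logit_scores, boosted_scores])
weighted_blend = weighted_average([logit_scores, boosted_scores], weights=[1, 3])
```

In practice the weights would be chosen on held-out data rather than fixed by hand as they are here.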

Best Practices of Modeling

Given the strengths of the traditional models, one possible way of combining the models is to use the traditional models for variable selection and then, with the chosen set of variables, run a machine-learning method that incorporates bagging and boosting. This helps ensure that the best practices of modeling are maintained. One argument against using machine-learning techniques is that the variable selection process is not well defined. Two highly correlated variables can both be selected into the model, and the business may gain little from them separately because both may suggest a similar insight. One of the best ways to solve this kind of problem is to use traditional models for variable selection and feed those variables to the machine-learning models. This can be one kind of ensemble.
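The correlated-variable problem can be illustrated with a simple screen applied before the machine-learning step. The sketch below is a stand-in, not the regression-based selection the article has in mind: it keeps variables in order and skips any whose absolute Pearson correlation with an already kept variable reaches a threshold. The function names, the 0.9 threshold and the sample data are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_correlated(columns, threshold=0.9):
    """Keep variables in order, skipping any highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate predictors; income_monthly is income_annual / 12.
candidates = {
    "income_annual": [24.0, 36.0, 60.0, 48.0, 30.0],
    "income_monthly": [2.0, 3.0, 5.0, 4.0, 2.5],
    "tenure_years": [5.0, 1.0, 3.0, 2.0, 4.0],
}

selected = screen_correlated(candidates)
```

Here `selected` retains `income_annual` and `tenure_years` and drops the redundant `income_monthly`. Only the retained variables would then be passed to the bagging or boosting model.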

One of the more prevalent ways of combining two different models is to use the output of one model as an input to the other. This happens in two-stage models. Much of the literature on modeling censored data is based on such two-stage models. One can use the inverse Mills ratio calculated from the output of the first-stage model as an input to the second-stage model. Using the output of the stage-one model as an input to the stage-two model can also be treated as a kind of ensemble modeling.
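As a rough illustration of the stage-one output that feeds stage two, the sketch below computes the inverse Mills ratio in its common form, the standard normal density divided by the standard normal cumulative distribution, evaluated at the first-stage linear index. It assumes the first stage is a probit model, which the article does not specify, and other censoring setups use different forms of the ratio. The index values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills_ratio(z):
    """Return phi(z) / Phi(z) for a first-stage linear index z."""
    return _STD_NORMAL.pdf(z) / _STD_NORMAL.cdf(z)


# Hypothetical first-stage probit index values for three records.
first_stage_index = [-1.0, 0.0, 1.5]
imr = [inverse_mills_ratio(z) for z in first_stage_index]
```

Each value in `imr` would be appended as an extra predictor in the second-stage model. For very large negative index values the cumulative distribution can underflow to zero, so a production version would need to guard against division by zero.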

One of the most effective methods of creating ensemble of ensemble models is to use the predictions of the different models as predictors in a separate model of the target variable. The final prediction is then a function of the individual model predictions, with the functional form determined by the combining model. In effect, we are optimizing how the individual predictions are combined. This method gives better results in terms of overall accuracy: the combined model generally has higher accuracy than the individual models.
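This approach of modeling the target on the predictions themselves can be sketched with a linear combining model fitted by least squares. The linear form is an assumption made to keep the example short; the article leaves the functional form to whatever combining model is chosen. The function names and data are hypothetical, and the base predictions are assumed to come from held-out records so that the combining model does not simply reward overfitted base models.

```python
def fit_linear_stacker(base_predictions, target):
    """Least-squares fit of target on base-model predictions plus an intercept.

    base_predictions: one row per record, one column per base model.
    Returns [intercept, coef_model_1, coef_model_2, ...].
    """
    rows = [[1.0] + list(r) for r in base_predictions]
    k = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, target))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda i: abs(a[i][col]))
        if abs(a[pivot_row][col]) < 1e-12:
            raise ValueError("base predictions are collinear; cannot fit stacker")
        a[col], a[pivot_row] = a[pivot_row], a[col]
        pivot_value = a[col][col]
        a[col] = [v / pivot_value for v in a[col]]
        for i in range(k):
            if i != col:
                factor = a[i][col]
                a[i] = [v - factor * p for v, p in zip(a[i], a[col])]
    return [a[i][k] for i in range(k)]


def stacked_predict(coefs, row):
    """Apply the fitted stacker to one record's base-model predictions."""
    return coefs[0] + sum(c * p for c, p in zip(coefs[1:], row))


def sse(predicted, actual):
    """Sum of squared errors."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual))


# Hypothetical held-out predictions from two base models, and the observed target.
held_out_preds = [[10.0, 12.0], [20.0, 18.0], [30.0, 33.0], [40.0, 37.0], [25.0, 30.0]]
actual = [11.0, 19.0, 32.0, 38.0, 28.0]

coefs = fit_linear_stacker(held_out_preds, actual)
stacked = [stacked_predict(coefs, row) for row in held_out_preds]
```

On the data used to fit it, a least-squares stacker with an intercept cannot have a larger squared error than any single base model, because each base model is a special case of the linear combination. Whether that gain carries over to new data has to be checked on a separate validation set.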

More Accurate Results

Whichever way we combine different models, it has generally been observed that “ensemble of ensemble” models give more accurate results than individual models. This has become the new normal in the predictive modeling world.

Going forward, ample research is needed to explore more innovative ways of creating ensembles of traditional and machine-learning models. Given the focus on prediction accuracy, researchers need to develop more scientific ways of combining models in order to maximize accuracy.


Anindya Sengupta is an associate director at Fractal Analytics (www.FractalAnalytics.com).
