Analytics Magazine

Predictive Modeling: ‘Ensemble of Ensemble’

November/December 2015

business analytics news and articles

Meet the new normal in predictive modeling

By Anindya Sengupta

Accurate prediction of the future is the central goal of predictive modeling. As predictive modeling and analytics have been adopted across industries, the focus on accuracy has intensified, and modelers around the globe keep searching for newer, more innovative techniques to improve their predictions. Against this backdrop, one method to consider for improving the accuracy of model predictions is the ensemble approach to modeling.

Ensemble models combine two or more models to enable more robust prediction, classification or variable selection. The most common form of ensemble combines weak decision trees into one strong model, and the most prevalent methods for doing so are bagging and boosting. These machine-learning techniques often achieve better accuracy than traditional statistical techniques such as least squares estimation and maximum likelihood. However, the traditional techniques are more robust. Ideally, we should take a blended approach that integrates the best parts of both. The new buzz phrase in the predictive analytics world is “ensemble of ensemble” models.

There are several ways of combining models. The simplest is to take a simple or weighted average of the predictions from the different models. For business problems with binary target variables, such as predicting the likelihood of fraud, this approach sometimes gives better results in terms of predicting the event rate. The major limitation of this approach is that the overall accuracy of the combined model typically lies between the accuracies of the individual models. Thus, this method is not advisable for modeling continuous variables, where overall accuracy matters most. It can, however, still be considered for modeling binary target variables.
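As a minimal sketch of simple and weighted averaging, the Python snippet below blends the predicted probabilities of two models. The function name, the probabilities and the weights are hypothetical and chosen only for illustration. In practice the weights would usually be picked on a validation sample.

```python
def weighted_average(predictions, weights=None):
    """Blend several models' predictions row by row.

    predictions: one list of predicted probabilities per model.
    weights: one non-negative weight per model; None means a simple average.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    n_rows = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_rows)
    ]


# Hypothetical fraud probabilities from two models for four accounts.
logit_probs = [0.10, 0.40, 0.80, 0.30]
boosted_probs = [0.20, 0.60, 0.90, 0.10]

# Simple average: both models count equally.
simple = weighted_average([logit_probs, boosted_probs])

# Weighted average: the second model counts three times as much (assumed weights).
weighted = weighted_average([logit_probs, boosted_probs], weights=[1, 3])
```

With non-negative weights, every blended prediction falls between the lowest and highest individual prediction for that row, so the blend cannot move outside the range spanned by the individual models.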

Best Practices of Modeling

Given the strengths of the traditional models, one possible way of combining models is to use the traditional models for variable selection and then, with the chosen set of variables, run a machine-learning method that incorporates bagging and boosting. This helps ensure that the best practices of modeling are maintained. One argument against machine-learning techniques is that their variable selection process is not well defined. Two highly correlated variables can both be selected into the model, and the business may gain little from seeing them separately because both point to a similar insight. Selecting variables with a traditional model and passing only those variables to the machine-learning model addresses this problem, and it can be regarded as one kind of ensemble.
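The sketch below illustrates the "select first, then model" idea in Python. It does not fit a traditional model. As a stand-in, it uses a simple correlation screen that keeps only one variable from any highly correlated pair. The variable names, the data and the 0.9 threshold are all hypothetical. The surviving variables would then be passed to whichever bagging or boosting routine is in use.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def screen_variables(columns, target, max_corr=0.9):
    """Greedy screen: visit variables from most to least correlated with the
    target and keep one only if it is not too correlated with a kept variable."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate predictors; "salary" is nearly a copy of "income".
columns = {
    "income": [30, 45, 60, 75, 90, 105],
    "salary": [31, 44, 61, 74, 91, 104],
    "age": [25, 52, 31, 60, 28, 45],
}
target = [1.0, 2.1, 2.9, 4.2, 4.8, 6.1]

# Only one of income/salary survives; age is kept because it is not redundant.
selected = screen_variables(columns, target)
```

A stepwise regression or a significance test on a fitted logistic model could replace the correlation screen here. The point of the sketch is only the hand-off of a reduced variable list to the machine-learning stage.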

One of the more prevalent ways of combining two different models is to use the output of one model as an input to the other. This is what happens in two-stage models. Much of the literature on censored-data modeling is based on such two-stage models: one can use the inverse Mills ratio calculated from the output of the first-stage model as an input to the second-stage model. Using the output of the stage-one model as an input to the stage-two model can also be treated as a kind of ensemble modeling.
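To make the two-stage hand-off concrete, the Python sketch below computes the inverse Mills ratio from first-stage output. It assumes the first stage is a probit model, as in Heckman-style selection models, so that the ratio for a selected record is the standard normal density divided by the standard normal cumulative probability, both evaluated at the first-stage linear index. The index values and the second-stage predictor are hypothetical.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def inverse_mills_ratio(z):
    """pdf(z) / cdf(z); numerically unstable for very negative z."""
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical linear index values from a first-stage probit model.
first_stage_index = [-1.0, 0.0, 1.5]
imr = [inverse_mills_ratio(z) for z in first_stage_index]

# Stage two: the ratio enters as one more column next to the usual predictors.
# Each row here is [intercept, hypothetical predictor, inverse Mills ratio].
stage_two_predictor = [2.0, 3.5, 5.0]
stage_two_rows = [[1.0, x, lam] for x, lam in zip(stage_two_predictor, imr)]
```

The ratio shrinks as the first-stage index grows, so records that the first stage considers very likely to be observed receive only a small correction in the second stage.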

One of the most efficient methods of creating “ensemble of ensemble” models is to use the predictions of the different models as predictors in a separate model fitted on the target variable. The final prediction is then a function of the individual model predictions, and the second-level model determines the functional form. In effect, we are optimizing how the individual predictions are combined. This method gives better results in terms of overall accuracy: the combined model generally has higher accuracy than the individual models.
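This approach is commonly known as stacking. The Python sketch below is one minimal instance, assuming a linear second-level model fitted by ordinary least squares. The article does not prescribe a functional form, so the linear choice, the function names and the data are all illustrative assumptions. In practice the base-model predictions fed to the second-level model are usually out-of-sample (for example, cross-validated) to avoid overfitting.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_stacker(base_preds, target):
    """Least-squares fit of target on an intercept plus each model's predictions."""
    rows = [[1.0] + list(p) for p in zip(*base_preds)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def predict_stacker(coefs, base_preds):
    """Apply the fitted second-level model to base-model predictions."""
    return [
        coefs[0] + sum(c * p for c, p in zip(coefs[1:], preds))
        for preds in zip(*base_preds)
    ]


def mse(pred, target):
    """Mean squared error."""
    return sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)


# Hypothetical target values and predictions from two base models.
target = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
model_a = [3.5, 5.5, 6.0, 9.5, 10.0, 13.5]
model_b = [2.0, 5.5, 7.5, 8.0, 11.5, 12.0]

coefs = fit_stacker([model_a, model_b], target)
stacked = predict_stacker(coefs, [model_a, model_b])
```

On the data used for fitting, the least-squares combination can never have a higher mean squared error than either base model, because using one base model alone is a special case of the fit. That guarantee does not carry over to new data, which is why the combined model should be checked on a holdout sample.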

More Accurate Results

Whichever way we combine different models, it has generally been observed that “ensemble of ensemble” models give more accurate results than individual models. This has become the new normal in the predictive modeling world.

Going forward, ample research is needed to explore more innovative ways of creating ensembles of traditional and machine-learning models. Given the focus on prediction accuracy, researchers need to develop more scientific ways of combining models in order to maximize accuracy.

Anindya Sengupta is an associate director at Fractal Analytics (
