Analytics Magazine

Predictive Modeling: ‘Ensemble of Ensemble’

November/December 2015

business analytics news and articles

Meet the new normal in predictive modeling

By Anindya Sengupta

Accurate prediction is the central goal of predictive modeling. As predictive modeling and analytics have spread across industries, the focus on accuracy has intensified, and modelers around the world continue to look for newer techniques that improve it. Against this backdrop, one method to consider for improving the accuracy of model predictions is the ensemble approach to modeling.

Ensemble models combine two or more models to enable a more robust prediction, classification or variable selection. The most common form of ensemble combines many weak decision trees into one strong model, and the most prevalent methods for doing so are bagging and boosting. These machine-learning techniques often achieve better accuracy than traditional statistical techniques such as least squares estimation and maximum likelihood. However, the traditional techniques are more robust. Ideally, we should take a blended approach that integrates the best parts of both. The new buzzword in the predictive analytics world is “ensemble of ensemble” models.

There are different ways of combining models. The simplest is to take a simple or weighted average of the predictions from the different models. For business problems with binary target variables, such as predicting the likelihood of fraud, this approach sometimes gives better results in terms of predicting the event rate. The major limitation of this approach is that the overall accuracy of the combined model typically lies between the accuracies of the individual models. It is therefore not advisable for modeling continuous variables, where overall accuracy matters most, although it can still be considered for binary target variables.
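As a minimal sketch of the averaging approach, the following Python snippet blends predicted probabilities from two models. The function name `blend`, the score lists and the weights are all hypothetical and chosen only for illustration; in practice the weights would be picked on validation data.

```python
def blend(predictions, weights=None):
    """Combine per-model predicted probabilities by a simple or weighted average.

    predictions: list of lists, one inner list of scores per model.
    weights: optional list of non-negative weights, one per model.
    """
    n_models = len(predictions)
    if weights is None:
        weights = [1.0] * n_models  # equal weights give the simple average
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # zip(*predictions) walks through the models' scores one case at a time.
    return [
        sum(w * p for w, p in zip(weights, row)) / total
        for row in zip(*predictions)
    ]


# Hypothetical fraud scores from two models for the same three transactions.
logistic_scores = [0.10, 0.40, 0.80]
boosted_scores = [0.20, 0.60, 0.90]

simple = blend([logistic_scores, boosted_scores])
weighted = blend([logistic_scores, boosted_scores], weights=[1, 3])
```

Dividing by the sum of the weights keeps every blended score inside the range spanned by the individual models' scores, so the result can still be read as a probability.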

Best Practices of Modeling

Given the strengths of the traditional models, one possible way of combining the two approaches is to use the traditional models for variable selection and then, with the chosen set of variables, run a machine-learning method that incorporates bagging or boosting. This helps ensure that modeling best practices are maintained. One argument against machine-learning techniques is that their variable selection process is not well defined. Two highly correlated variables can both be selected into the model, and the business may gain little from having both because they suggest similar insights. Using traditional models to select the variables and then feeding those variables to the machine-learning models is one of the best ways to address this problem, and it can be regarded as one kind of ensemble.
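The correlated-variable concern can be illustrated with a small screening step that runs before any machine-learning model is trained. Note that this is a simple pairwise correlation screen, swapped in here for brevity; it is not the traditional-model selection itself (such as a stepwise regression). The function names, the 0.9 threshold and the sample data are assumptions made only for this sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a variable only if its |r| with every already-kept variable is below threshold.

    columns: dict mapping variable name -> list of values.
    Variables listed earlier take priority over later ones.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical candidate predictors; "salary" is nearly a copy of "income".
candidates = {
    "income": [30, 45, 60, 80, 100],
    "salary": [28, 44, 61, 79, 102],
    "age": [50, 23, 41, 35, 29],
}
selected = drop_correlated(candidates)  # variables to pass on to the ML model
```

Because earlier variables take priority, the order of the dictionary encodes which member of a correlated pair the business would rather keep.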

One of the more prevalent ways of combining two different models is to use the output of one model as an input to the other, as happens in two-stage models. Much of the literature on modeling censored data is built on such two-stage models: the inverse Mills ratio calculated from the output of the first-stage model is used as an input to the second-stage model. Using the output of the stage-one model as an input to the stage-two model can also be treated as a kind of ensemble modeling.
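To make the two-stage idea concrete, here is a small sketch of the inverse Mills ratio calculation. It assumes the common selection-model setup in which the first stage is a probit model and the ratio for a selected observation is the standard normal density divided by the standard normal cumulative distribution, both evaluated at the first-stage index. The index values below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index):
    """Return pdf(index) / cdf(index) for a first-stage probit index."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)


# Hypothetical first-stage probit indices for three selected observations.
first_stage_index = [-1.0, 0.0, 1.5]
imr = [inverse_mills_ratio(z) for z in first_stage_index]
# Each value in imr would be added as an extra input to the second-stage model.
```

The ratio shrinks as the first-stage index grows, so observations that were very likely to be selected receive only a small correction in the second stage.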

One of the most efficient methods of creating “ensemble of ensemble” models is to take the predictions of the different models and use them as predictors in a separate model of the target variable, an approach often called stacking. The final prediction is then a function of the individual models’ predictions, and the second-level model determines the functional form. In effect, this method optimizes the way the individual predictions are combined. It gives better results in terms of overall accuracy: the combined model generally has higher accuracy than the individual models.
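The idea of modeling the target on the individual models’ predictions can be sketched with a least-squares second-level model. This is only one possible choice of second-level model, picked here because it needs nothing beyond the Python standard library; the function names and the data are hypothetical. In practice the base predictions should come from held-out data so that the second-level model does not simply reward overfitted base models.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: base predictions are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_stacker(base_predictions, target):
    """Fit target ~ intercept + sum of coef_k * prediction_k by least squares."""
    rows = [[1.0] + list(row) for row in zip(*base_predictions)]
    k = len(rows[0])
    # Normal equations: (X'X) coefs = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve(xtx, xty)


def stack_predict(coefs, base_predictions):
    """Apply the fitted second-level model to base-model predictions."""
    return [
        coefs[0] + sum(c * p for c, p in zip(coefs[1:], row))
        for row in zip(*base_predictions)
    ]


# Hypothetical held-out predictions from two base models, plus the observed target.
model_a = [1.0, 2.0, 3.0, 4.0, 5.0]
model_b = [1.5, 1.0, 3.5, 3.0, 6.0]
target = [1.2, 1.4, 3.3, 3.5, 5.6]

coefs = fit_stacker([model_a, model_b], target)
stacked = stack_predict(coefs, [model_a, model_b])
```

On the data used for fitting, a least-squares stacker of this form cannot have a larger squared error than either base model, because each base model is itself a special case of the fitted function. Whether that advantage carries over to new data has to be checked on a separate validation set.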

More Accurate Results

Whichever way we combine different models, it has generally been observed that “ensemble of ensemble” models give more accurate results than individual models. This has become the new normal in the predictive modeling world.

Going forward, ample research is needed to explore more innovative ways of creating ensembles of traditional and machine-learning models. Given the focus on prediction accuracy, researchers need to find more scientific ways of combining models in order to maximize accuracy.


Anindya Sengupta is an associate director at Fractal Analytics (www.FractalAnalytics.com).
