Software Survey: Forecasting 2016
New tools, new capabilities, new trends: Survey of 26 software packages from 19 vendors.
By Chris Fry and Vijay Mehrotra
Welcome to the biennial forecasting software survey, where we take stock of the latest technologies and trends in forecasting affecting both vendors and users. This year’s survey draws on responses from 19 vendors covering 26 software packages that span a range of capabilities and price points; the results are summarized below.
As part of the rapid proliferation of business analytics, forecasting models are an increasingly important part of the management landscape. More and more managers and executives rely on sophisticated forecasting methods, not only for planning purposes but also as a foundation for performance analysis, process improvement and optimization.
Open source software continues to be a major factor driving the growth of analytics, offering a unique combination of flexibility, power and low cost. In particular, many leading firms are using tools like R and Python to develop solutions for forecasting and predictive analytics that are customized for their business problems, tightly coupled with their data architectures, and integrated directly into other existing systems.
Given all of this, the market for forecasting software vendors is dynamic, featuring many different types of innovations. Here are some of the major trends that we are seeing in the marketplace.
Top Forecasting Trends for 2016
No. 1: integration. Corporations are seeking not only to generate forecasts, but also to integrate those forecasts into planning, optimization and reporting systems. As one survey respondent, ForecastPro, stated, “More and more business users are calling for comprehensive forecasting solutions that, in addition to simply generating statistical forecasts, can be used as the backbone of ongoing corporate processes such as sales and operations planning (S&OP), demand planning and supply chain optimization.” Many vendors appear to be addressing this demand.
No. 2: automation. Forecasters are seeking features such as automated model selection, automated alerts, and automated graphics and reporting. As data velocity and complexity grow, there appears to be an increasing willingness to entrust model design to the software, especially when many forecasts are being generated simultaneously. In response, nearly all of the vendors surveyed report having some type of automatic or semi-automatic forecasting capability.
No. 3: visualization. High-quality visualizations are fast becoming part of the “table stakes” for forecasting software packages. In addition to standard statistical output, many of today’s tools offer a range of visuals, including such features as box plots, normal probability plots, histograms, ANOVA Pareto charts, decomposition charts and automated statistically generated range forecast plots.
No. 4: virtualization. Several vendors have begun to move their offerings to the cloud, offering virtual hosted forecasting tools. Statgraphics, XLMiner.com, Roadmap GPS and PEERForecaster/PEERPlanner all mentioned their cloud offerings specifically in their survey responses. As cloud computing continues to grow, we foresee this trend continuing.
No. 5: forecast quality measurement. Forecasters are continually pressured to improve the quality of their forecasts, or defend why their forecasts are as good as they can get. Vendors are addressing this need with such solutions as automated ANOVA analysis, automated naïve forecast generation and automated forecast value added analysis.
No. 6: capabilities enhancement. Forecasting software vendors are giving increased attention to “hard” forecasting problems such as new product forecasting and forecasting of intermittent demand, while also providing ways to integrate additional machine-learning techniques into their forecasting suites. Some vendors are offering the ability to create automated ensemble forecast models, built by combining multiple forecasts generated using different techniques in order to improve overall forecast accuracy.
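To make trends No. 5 and No. 6 concrete, here is a minimal Python sketch of an equal-weight ensemble scored against a seasonal naive benchmark. It assumes mean absolute error as the accuracy metric and defines forecast value added as the reduction in that error relative to the naive forecast. The function names and all numbers are made up for illustration and do not come from any product in the survey.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute gap between actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


def seasonal_naive(history, horizon, season_length=12):
    """Naive benchmark: repeat the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    start = len(history) - season_length
    return [history[start + (h % season_length)] for h in range(horizon)]


def ensemble(forecasts_by_model):
    """Equal-weight average of several models' forecasts, period by period."""
    columns = zip(*forecasts_by_model.values())
    return [sum(col) / len(col) for col in columns]


def forecast_value_added(actuals, forecasts, naive_forecasts):
    """Positive values mean the forecast beats the naive benchmark."""
    return (mean_absolute_error(actuals, naive_forecasts)
            - mean_absolute_error(actuals, forecasts))


# Hypothetical data: 24 months of history and a three-month holdout.
history = [float(v) for v in range(1, 25)]
actuals = [26.0, 27.0, 28.0]
model_forecasts = {
    "model_a": [24.0, 26.0, 30.0],
    "model_b": [26.0, 28.0, 28.0],
}

combined = ensemble(model_forecasts)
naive = seasonal_naive(history, horizon=3)
fva = forecast_value_added(actuals, combined, naive)
print(f"ensemble: {combined}, naive: {naive}, FVA: {fva:.2f}")
```

Commercial packages may weight ensemble members unequally or use a different benchmark and error metric; the equal weights here are only the simplest choice.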
Case Study: Forecasting Rainfall in California
To illustrate some of these trends, we attempted to forecast monthly rainfall levels in our home state of California, which has been suffering from a drought for the past few years.
We utilized publicly available data from the National Oceanic and Atmospheric Administration (NOAA), collecting monthly precipitation data (in inches) from January 2006 through March 2016 for 194 observation stations in California. From this, we used the first 10 years of data as a training set and then attempted to forecast the monthly rainfall at each station for the final three months (i.e., January-March 2016). We also included one exogenous variable, NOAA’s El Niño Index (the El Niño climate oscillation is a well-known factor in predicting California rainfall, as changing temperatures in the Pacific Ocean affect rain patterns on the West Coast).
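The split described above amounts to 120 training months followed by a three-month holdout. A minimal Python sketch of that bookkeeping is shown below; the helper name and the year-month label format are illustrative assumptions, not part of the original analysis.

```python
def month_labels(start_year, start_month, count):
    """Return `count` consecutive 'YYYY-MM' labels starting at the given month."""
    labels = []
    for i in range(count):
        offset = start_month - 1 + i
        year = start_year + offset // 12
        month = offset % 12 + 1
        labels.append(f"{year}-{month:02d}")
    return labels


# January 2006 through March 2016 is 10 full years plus 3 months.
all_months = month_labels(2006, 1, 10 * 12 + 3)

# First 10 years for training, final three months held out for evaluation.
train_months = all_months[:120]
holdout_months = all_months[120:]

print(train_months[0], "to", train_months[-1], "| holdout:", holdout_months)
```

The same index positions would be applied to each station's rainfall series and to the El Niño index so that the two stay aligned.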
Figure 1 shows the rainfall history for a sample collection station, along with the El Niño Index history. The rainfall follows a somewhat seasonal pattern at this station, and appears to show some (slightly lagged) relationship with the El Niño index. Note the high El Niño activity toward the end of 2015 and corresponding heavy rainfall level in January 2016.
Figure 1: Monthly rainfall for a sample collection station in California, along with the corresponding El Niño indices.
We selected IBM’s SPSS Statistics forecasting package to conduct our analysis. We had not used SPSS previously, so we felt that this choice allowed us to work without any preconceived expectations on how the software would perform, and would also enable us to convey a realistic experience in terms of the learning curve required to use the product. IBM offers a generous, fully functional two-week free trial for the SPSS product, which made it easy for us to download and start working with the product right away. The package includes a fully automatic forecasting module, called the Expert Modeler, which optimizes model and parameter selection across a suite of exponential smoothing and ARIMA models.
After collecting and prepping the data, we utilized the Expert Modeler to generate an initial set of forecasts. SPSS created a separate independent model for each weather station, with total run time less than one minute. SPSS modeled the majority of stations (177 out of 194) using simple seasonal exponential smoothing. For the remainder of the stations, it used a combination of ARIMA, seasonal ARIMA and ARIMA with transfer function models (to incorporate the El Niño index). The model structures for the remaining models varied widely, from simple “flat line” models (when the software could not find a suitable pattern) to complex multi-term ARIMA models such as ARIMA (0,0,3) (1,1,0) with 13-month delayed seasonally differenced external regressor effect or ARIMA (0,0,11) (0,0,0). In addition to the forecast values, the software also produced goodness of fit statistics for individual and aggregate models, along with parameter estimates and plots.
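For readers unfamiliar with simple seasonal exponential smoothing, the following Python sketch shows one common additive formulation: a smoothed level plus a smoothed seasonal term, with no trend. It is not SPSS's implementation. The initialization from the first season, the function name, and the smoothing constants `alpha` and `gamma` are illustrative assumptions, and a tool like the Expert Modeler would estimate the constants rather than take them as inputs.

```python
def simple_seasonal_smoothing(series, season_length, alpha, gamma, horizon):
    """Additive level-plus-seasonal smoothing (no trend).

    Returns (fitted, forecasts): one-step-ahead fitted values for every
    observation after the first season, and `horizon` future forecasts.
    """
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Assumed initialization: level is the first-season mean, and each
    # seasonal term is that period's deviation from the mean.
    level = sum(series[:m]) / m
    seasonals = [y - level for y in series[:m]]

    fitted = []
    for t in range(m, len(series)):
        previous_seasonal = seasonals[t % m]
        fitted.append(level + previous_seasonal)
        new_level = alpha * (series[t] - previous_seasonal) + (1 - alpha) * level
        seasonals[t % m] = (gamma * (series[t] - new_level)
                            + (1 - gamma) * previous_seasonal)
        level = new_level

    n = len(series)
    forecasts = [level + seasonals[(n + h) % m] for h in range(horizon)]
    return fitted, forecasts


# Hypothetical quarterly-style series with a repeating pattern of length 3.
demo = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
fitted, forecasts = simple_seasonal_smoothing(
    demo, season_length=3, alpha=0.3, gamma=0.2, horizon=3)
print(forecasts)
```

Nothing in this formulation constrains the output to be non-negative, which is one reason a rainfall forecast produced this way can dip below zero in dry months.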
Visually, the fitted time series looked reasonable when compared to the actuals. Figure 2 shows an example fit, along with a three-month projection, for a sample collection station.
Figure 2: Plot showing fitted time series and forecasted rainfall compared to actual rainfall for a sample collection station.
We then spent additional time reviewing and refining the models. We noted that the software’s initial fitted model output included negative rainfall estimates in some months, so we adjusted these to zero in our analysis. We also rejected some of the forecasts in favor of models that we felt were either more intuitive or would better capture the seasonal behavior and El Niño effect. Figure 3 shows two “before and after” examples in which we replaced an automatically generated model from the SPSS Expert Modeler with a simple seasonal ARIMA model including seasonal and non-seasonal transfer effects for the external regressor. The new models seemed to fit the seasonal peaks better than the smoothing models did, and also projected higher 2016 rainfall in response to the high El Niño Index.
Figure 3: Sample model fit and forecast plots comparing SPSS’ automatic forecast output with potential alternate model structures. For the first example (San Jose), the auto-generated model predicted 6.97 inches of rainfall for Q1 2016, while the alternate model predicted 7.86 inches. Actual rainfall was 8.36 inches. For the second example, projected Q1 2016 rainfall was 5.61 inches for the automatic model and 7.39 inches for the alternate model. Actual rainfall was 7.22 inches.
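The Q1 2016 totals quoted in the Figure 3 caption can be turned into error figures with a few lines of Python. This is a minimal sketch: the rainfall values come from the caption, while the choice of absolute and absolute percent error as metrics, and the label "Second station", are illustrative assumptions.

```python
# Q1 2016 rainfall totals in inches, taken from the Figure 3 caption.
examples = {
    "San Jose": {"automatic": 6.97, "alternate": 7.86, "actual": 8.36},
    "Second station": {"automatic": 5.61, "alternate": 7.39, "actual": 7.22},
}


def absolute_error(forecast, actual):
    """Size of the miss in inches, ignoring direction."""
    return abs(forecast - actual)


def absolute_percent_error(forecast, actual):
    """Size of the miss as a percentage of actual rainfall."""
    return 100.0 * abs(forecast - actual) / actual


for station, values in examples.items():
    for model in ("automatic", "alternate"):
        err = absolute_error(values[model], values["actual"])
        pct = absolute_percent_error(values[model], values["actual"])
        print(f"{station:15s} {model:10s} error {err:.2f} in. ({pct:.1f}%)")
```

In both examples the alternate model's absolute error is smaller (0.50 vs. 1.39 inches for San Jose, and 0.17 vs. 1.61 inches for the second station), although two stations are too few to support a general conclusion.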
In all, we felt that despite our limited data set, the automated procedure gave us a very quick first pass at the analysis that seemed quite reasonable and is likely to be satisfactory in many contexts. The experience reconfirmed to us the power of modern automatic forecasting tools, and also reminded us of the value that comes from coupling that power with our time and attention as analysts to continually seek out improvements in our models.
Happy forecasting to you in 2016!
About the Survey
For this year’s forecasting software survey, as in the past, we attempted to include as many forecasting products as possible. We contacted all prior survey participants, as well as any new vendors that we were able to identify. We asked each respondent to complete an online questionnaire covering a comprehensive list of questions spanning features and capabilities, recent enhancements, licensing and fees, technical support and other areas. We followed up with each vendor to help ensure that we obtained as many responses as possible. Vendors not included in this issue are invited to submit a completed online questionnaire (http://www.lionhrtpub.com/ancill/fssurvey.shtml), and their product will be added to the online version of the forecasting survey.
The purpose of the survey is simply to inform the reader of what is available. The information in the survey comes directly from the vendors, and no attempt was made to verify or validate the information they gave us.
Automation levels: Forecasting software varies when it comes to the degree to which the software can find the appropriate model and the optimal parameters of that model. For example, Winters’ method requires values for three smoothing constants, and Box-Jenkins models have to be specified with various parameters, such as ARIMA (1,0,1) (0,1,2). For the purposes of this and previous surveys, the ability of the software to find the optimal model and parameters for the data is characterized as follows:
- Automatic forecasting software: Software is labeled as automatic if it both recommends the appropriate model to use on a particular data set and finds the optimal parameters for that model. Automatic software typically searches through multiple potential models to minimize a specific fit metric, such as Akaike Information Criterion (AIC), Normalized Bayesian Information Criterion (BIC) or RMSE; it then recommends a forecast model for the data, gives the model’s optimal parameters, calculates forecasts for a user-specified number of future periods, and gives various summary statistics and graphs.
- Semi-automatic forecasting software: The second automation level is called semi-automatic. Such software asks the user to pick a forecasting model from a menu and some statistic to minimize, and the program then finds the optimal parameters for that model, the forecasts, and various graphs and statistics.
- Manual forecasting software: We refer to the third level of automation as manual. Here the user must specify both the model that should be used and the corresponding parameters. The software then finds the forecasts, summary statistics and charts.
Note that some products fall into more than one category. For example, if you choose a Box-Jenkins model, the software may find the optimal parameters for that model, but if you specify that Winters’ method be used, the product may require that you manually enter the three smoothing constants. Of the software tools included in the survey, 23 (88 percent) offer semi-automatic forecasting, and 18 (69 percent) offer automatic forecasting capabilities.
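The three automation levels can be illustrated with a small Python sketch. It assumes two deliberately simple candidate models (simple exponential smoothing and a moving average), a coarse parameter grid, and in-sample one-step-ahead RMSE as the fit metric. All names and grids are hypothetical, and commercial packages search far richer model families, often with criteria such as AIC or BIC.

```python
import math


def rmse(actuals, forecasts):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actuals, forecasts)) / len(actuals))


def ses(series, alpha):
    """One-step-ahead forecasts for series[1:] via simple exponential smoothing."""
    level = series[0]
    fitted = []
    for y in series[1:]:
        fitted.append(level)
        level = alpha * y + (1 - alpha) * level
    return fitted


def moving_average(series, window):
    """One-step-ahead forecasts for series[1:] from the last `window` values."""
    fitted = []
    for t in range(1, len(series)):
        recent = series[max(0, t - window):t]
        fitted.append(sum(recent) / len(recent))
    return fitted


# Hypothetical candidate models, each with a parameter grid to search.
CANDIDATES = {
    "ses": (ses, [0.1, 0.3, 0.5, 0.7, 0.9]),
    "moving_average": (moving_average, [1, 2, 3]),
}


def manual(series, model_name, parameter):
    """Manual: the user supplies both the model and its parameter."""
    model, _ = CANDIDATES[model_name]
    return rmse(series[1:], model(series, parameter))


def semi_automatic(series, model_name):
    """Semi-automatic: the user picks the model; the parameter is searched."""
    _, grid = CANDIDATES[model_name]
    return min(grid, key=lambda p: manual(series, model_name, p))


def automatic(series):
    """Automatic: both the model and the parameter are searched."""
    options = [(name, p) for name, (_, grid) in CANDIDATES.items() for p in grid]
    return min(options, key=lambda c: manual(series, c[0], c[1]))


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(semi_automatic(series, "ses"), automatic(series))
```

Minimizing in-sample error alone tends to favor overly flexible models, which is one reason packages commonly use penalized criteria or holdout data instead.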
Chris Fry (email@example.com) is the founder and managing director of Strategic Management Solutions, an analytics consulting and services firm. Vijay Mehrotra (firstname.lastname@example.org) is a professor of business analytics and information systems at the University of San Francisco. Both authors are members of INFORMS. The authors thank Gavin Leeper and Craig Volonoski for their contributions to the case study research.
Editor’s note: A version of this article appeared in the June 2016 issue of OR/MS Today.
- These data were compiled from requests to NOAA’s database at https://www.ncdc.noaa.gov/cdo-web/search.
- The El Niño Index tracks temperature changes in the Pacific Ocean. The data we used can be downloaded at https://catalog.data.gov/dataset/climate-prediction-center-cpcoceanic-nino-index.