Software Survey: The future of forecasting
Making predictions from hard and fast data.
By Jack Yurkiewicz
Here is an easy forecast to make: Forecasting will be part of our information flow for the foreseeable future. Forecasting is also a key topic in my “Decision Modeling for Management” course. In preparing the midterm exam for this past spring term, I wanted the students to analyze the enrollment figures for the Affordable Care Act and make some forecasts. The media has been talking about these enrollment figures since the rollout, and politicians have been making projections about them as well. In the course we covered various forecasting methodologies, including trend analysis. Thus, my plan for a midterm problem was to give the students the enrollment data and have them make a forecast for the May 1 enrollment deadline. Getting those enrollment numbers became obstacle number one.
Figures 1 and 2 show some typical results of an Internet search. I found graphs, some better, some worse (look at the markers on the x-axis of the graph in Figure 1), and lots of opinion articles with forecasts, but no data. I punted and decided to present the class with a similar but far less pressing problem. On March 31, the day of the midterm exam, I asked students to make forecasts for the cumulative domestic box-office gross for the recently released movie “Non-Stop.”
The action film starring Liam Neeson had opened on Feb. 28, and I gave the students the daily domestic box-office gross values from opening day through March 16, or 17 days of data. The students were asked to make a time plot of these box-office figures (see Figure 3) and, after examining various trend models, get a forecast for the cumulative domestic box-office gross for a target date, midterm day, March 31. I knew that two days later (after I had graded their exams and returned them), Universal Studios would report the actual cumulative domestic gross of the film as of March 31. It was $85.39 million. Of the various trend models we covered, the Weibull curve yielded the most accurate forecast, $86.11 million; another model was reasonably close, and the remaining models that we had discussed and the students tried were far off.
|Figure 1: http://www.cnn.com/interactive/2013/09/health/map-obamacare/.|
|Figure 2: http://www.whitehouse.gov/the-press-office/2014/04/17/fact-sheet-affordable-care-act-numbers.|
|Figure 3: Initial daily domestic box-office gross of the motion picture (“Non-Stop”).|
Categorizing the Forecast Software
Commercial forecasting software is available in two broad categories. Using the nomenclature from previous OR/MS Today forecasting surveys, the first category is called dedicated software. A dedicated product offers forecasting capabilities only, such as Box-Jenkins, exponential smoothing, trend analysis, regression and other procedures. The second category is called general statistical software. Here the product includes forecasting techniques as a subset of the many statistical procedures it can do. Thus, a product that can do ANOVA, factor analysis, etc., as well as Box-Jenkins techniques would fall into this group. In recent years, the number of products in the second category has been growing, as statistical software firms have been adding more, and more sophisticated, forecasting methodologies to their lists of features and capabilities. However, some dedicated software manufacturers offer specific capabilities and features (e.g., transfer functions and econometric models) that general statistical programs may not have.
In both software categories, products vary in the degree to which they can find the appropriate model and the optimal parameters of that model. For example, Winters’ method requires values for three smoothing constants, and Box-Jenkins models have to be specified with various parameters, such as ARIMA(1,0,1)x(0,1,2). For the purposes of this and previous surveys, we characterize the ability of the software to find the optimal model and parameters for the data. Software is labeled as automatic if it both recommends the appropriate model to use on a particular data set and finds the optimal parameters for that model. Automatic software typically asks the user to specify some statistic to minimize (e.g., the Akaike Information Criterion (AIC), the Schwarz Bayesian Information Criterion (SBIC) or RMSE); it then recommends a forecast model for the data, gives the model’s optimal parameters, calculates forecasts for a user-specified number of future periods, and gives various summary statistics and graphs. The user can manually overrule the recommended model and choose another, and the software finds the optimal parameters, forecasts, etc., for that one.
The second category is called semi-automatic. Such software asks the user to pick a forecasting model from a menu and some statistic to minimize, and the program then finds the optimal parameters for that model, the forecasts, and various graphs and statistics.
The third category is called manual software. Here the user must specify both the model that should be used and the corresponding parameters. The software then finds the forecasts, summary statistics and charts. If you frequently need to make forecasts of different types of time series, using manual software could be a tedious choice. Unfortunately, that broad advice may not be apropos for some software. Some products fall into two categories. For example, if you choose a Box-Jenkins model, the software may find the optimal parameters for that model, but if you specify that Winters’ method be used, the product may require that you manually enter the three smoothing constants.
When it comes to analyzing trends, most of the products I tried fall into the semi-automatic group. That is, I need to choose a trend curve, and the software finds the appropriate parameters for that model and gives forecasts, summary statistics and graphs.
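The distinction between semi-automatic and automatic software can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the grid of candidate smoothing constants, the use of one-step-ahead RMSE as the statistic to minimize, and the demo series are all made up for this example and do not come from any surveyed product or from the movie data. The `best_ses` and `best_holt` functions mimic the semi-automatic case (the user names the model, the code searches for its constants), and `auto_select` mimics the automatic case by also choosing between the two models.

```python
from itertools import product

def ses_rmse(series, alpha):
    """RMSE of one-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    sq_err = 0.0
    for y in series[1:]:
        sq_err += (y - level) ** 2            # the forecast is the prior level
        level = alpha * y + (1 - alpha) * level
    return (sq_err / (len(series) - 1)) ** 0.5

def holt_rmse(series, alpha, beta):
    """RMSE of one-step-ahead forecasts from Holt's linear-trend method."""
    level, trend = series[0], series[1] - series[0]
    sq_err = 0.0
    for y in series[1:]:
        forecast = level + trend
        sq_err += (y - forecast) ** 2
        new_level = alpha * y + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return (sq_err / (len(series) - 1)) ** 0.5

# Hypothetical search grid: 0.05, 0.10, ..., 0.95
GRID = [i / 20 for i in range(1, 20)]

def best_ses(series):
    """Semi-automatic step: the model is given, the constant is searched."""
    alpha = min(GRID, key=lambda a: ses_rmse(series, a))
    return {"model": "SES", "params": (alpha,), "rmse": ses_rmse(series, alpha)}

def best_holt(series):
    """Semi-automatic step for Holt's method (two constants)."""
    a, b = min(product(GRID, GRID), key=lambda p: holt_rmse(series, *p))
    return {"model": "Holt", "params": (a, b), "rmse": holt_rmse(series, a, b)}

def auto_select(series):
    """Automatic step: also pick the model with the smallest RMSE."""
    return min((best_ses(series), best_holt(series)), key=lambda r: r["rmse"])

# Illustrative, made-up series with a steady upward trend.
demo = [12.0, 14.1, 15.9, 18.2, 20.0, 22.1, 23.8, 26.1, 28.0, 30.2]
print(auto_select(demo))
```

A commercial package would likely use a proper optimizer and an information criterion such as AIC rather than a coarse grid and raw RMSE; the grid simply keeps the sketch short.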
|Figure 4. IBM SPSS input worksheet (showing the “Non-Stop” movie daily box-office returns).|
|Figure 5: IBM SPSS’ results of “automatic” forecasting of the “Non-Stop” data.|
|Figure 6: IBM SPSS’ fitted models for three specified growth curves.|
|Figure 7: IBM SPSS’ plot of the data and growth curves.|
Working with a Sample of Products
In my class, students use StatTools, part of the Palisade Software Suite that comes with their textbook. Its forecasting capabilities are regression, exponential smoothing (Brown, Holt and Winters’) and moving averages. If data followed some nonlinear function, the students could make mathematical transformations to make the data linear and then use ordinary linear regression on it, and do the inverse transformation to get the forecast. They also have several Excel templates I developed (Gompertz, Pearl-Reed, Weibull, etc.) for the course. For this article, I tried a small sample of professional products from different categories, specifically Minitab, IBM SPSS and NCSS, on the “Non-Stop” movie data. IBM SPSS falls into the automatic forecasting category; Minitab and NCSS are semi-automatic products. A caveat: This is not meant to be a critical review of any product mentioned.
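The transform, regress and back-transform idea can be illustrated with a short Python sketch. It assumes an exponential trend, y = a·exp(b·t), which becomes linear after taking logarithms: ln y = ln a + b·t. The function names and the synthetic series are assumptions made for this example; they are not StatTools output or the movie data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_exponential_trend(ts, ys):
    """Linearize y = a * exp(b * t) with a log transform, then regress."""
    intercept, slope = fit_line(ts, [math.log(y) for y in ys])
    return math.exp(intercept), slope          # inverse transform gives a

def forecast(a, b, t):
    """Forecast on the original scale."""
    return a * math.exp(b * t)

# Synthetic, noise-free series that decays by about 18 percent per period.
days = list(range(1, 9))
values = [10.0 * math.exp(-0.2 * t) for t in days]

a, b = fit_exponential_trend(days, values)
print(round(a, 4), round(b, 4), round(forecast(a, b, 10), 4))
```

One caution with this approach: least squares on the transformed data minimizes errors in log units, so the fitted curve is not necessarily the one that minimizes errors in the original units.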
I let IBM SPSS first do the analysis of the movie data via its automatic mode, called “Expert Modeler” (i.e., choose the model and its parameters and get the forecasts). Figure 4 shows superimposed screen shots of IBM SPSS’ worksheet, showing the “Non-Stop” daily domestic box-office gross and the menu system to start the automatic forecasting procedure. The program then gave its recommended model, Brown’s method for data with a linear trend, which uses one smoothing constant to estimate the intercept and slope of the fitted line (as compared to Holt’s method, which uses two independent smoothing constants). IBM SPSS’ accompanying statistics, forecast plot and additional output are shown in Figure 5.
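For readers unfamiliar with Brown’s method, the following Python sketch shows the textbook form of double exponential smoothing, in which a single constant drives both the level and the slope. It is not IBM SPSS’ implementation; the function name, the initialization (both smoothed series start at the first observation) and the test series are assumptions made for illustration.

```python
def brown_forecast(series, alpha, horizon=1):
    """Brown's double exponential smoothing with one constant, 0 < alpha < 1.

    Returns the forecast `horizon` periods past the end of `series`.
    """
    s1 = s2 = series[0]                       # assumed initialization
    for y in series[1:]:
        s1 = alpha * y + (1 - alpha) * s1     # singly smoothed series
        s2 = alpha * s1 + (1 - alpha) * s2    # doubly smoothed series
    level = 2 * s1 - s2                       # estimated intercept at the last period
    trend = alpha / (1 - alpha) * (s1 - s2)   # estimated slope
    return level + horizon * trend

# Made-up series that rises by exactly 2 units per period.
line = [2.0 * t for t in range(200)]
print(brown_forecast(line, alpha=0.5, horizon=1))
```

Holt’s method would replace the single `alpha` with separate constants for the level and the trend, which is the distinction drawn above.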
IBM SPSS does have a curve fitting feature, so I utilized it and specified three possible models to be examined – the linear, growth and logistic curves. Figures 6 and 7 give the resulting output and plots for these choices.
NCSS has, in addition to the standard forecasting procedures (Box-Jenkins and exponential smoothing models), an extensive list of more than 20 nonlinear curve models under its menu label “Growth and Other Models.” The user chooses a model, and NCSS finds the appropriate parameters for the particular data set.
For the “Non-Stop” data, I chose the “Logistic(4)” model [i.e., a logistic curve with four parameters; there is a Logistic(3) model available as well], and Figure 8 shows NCSS’ output.
Minitab is a hybrid of a semi-automatic and manual forecasting product. If you specify that a Box-Jenkins model be used, the software finds the appropriate parameters for the model. However, if you choose Winters’ method, Minitab requires that you manually enter values for the three smoothing constants. Minitab also has, under the Time Series choice on the main menu, a Trend Analysis option. Choosing that gives the user four possible curves (linear, quadratic, exponential and Pearl-Reed logistic). Figure 9 gives the results of my choice for the “Non-Stop” data, the Pearl-Reed curve (Minitab calls it the S-Curve Trend Model).
Finally, Figure 10 shows the results of one of my Excel templates, which uses the four-parameter Weibull trend curve and Solver’s nonlinear programming capability to find the optimal parameters that minimize the root mean square error for the entered data.
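The general idea of fitting a four-parameter curve by minimizing RMSE can be sketched in Python. Several assumptions apply. The functional form below, y = a − (a − b)·exp(−(c·t)^d), is one common four-parameter Weibull growth curve and may differ from the one in the Excel template. A simple compass (pattern) search stands in for Solver’s nonlinear optimizer. The data are synthetic values generated from the curve itself, not the “Non-Stop” figures, and all names and starting values are hypothetical.

```python
import math

def weibull(t, a, b, c, d):
    """Assumed growth form: starts near b and rises toward the ceiling a."""
    return a - (a - b) * math.exp(-((c * t) ** d))

def rmse(params, ts, ys):
    """Root mean square error; infinite for invalid or overflowing parameters."""
    a, b, c, d = params
    if c <= 0 or d <= 0:
        return math.inf
    try:
        sq = sum((y - weibull(t, a, b, c, d)) ** 2 for t, y in zip(ts, ys))
    except OverflowError:
        return math.inf
    return math.sqrt(sq / len(ts))

def compass_search(loss, start, step=0.5, tol=1e-8, max_iter=5000):
    """Try +/- steps on each parameter; halve the steps when nothing improves."""
    best, best_loss = list(start), loss(start)
    steps = [step * max(abs(p), 1.0) for p in start]
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for sign in (1, -1):
                trial = list(best)
                trial[i] += sign * steps[i]
                trial_loss = loss(trial)
                if trial_loss < best_loss:
                    best, best_loss, improved = trial, trial_loss, True
                    break
        if not improved:
            steps = [s / 2 for s in steps]
            if max(steps) < tol:
                break
    return best, best_loss

# Synthetic cumulative series over 17 periods, generated from known parameters.
true_params = (90.0, 0.0, 0.1, 1.2)
ts = list(range(1, 18))
ys = [weibull(t, *true_params) for t in ts]

start = [80.0, 5.0, 0.2, 1.0]                 # hypothetical starting guess
initial_loss = rmse(start, ts, ys)
fitted, fitted_loss = compass_search(lambda p: rmse(p, ts, ys), start)
print("RMSE before:", initial_loss, "after:", fitted_loss)
print("forecast for period 32:", weibull(32, *fitted))
```

As with Solver, the result depends on the starting values, and a search like this can settle on a local rather than a global minimum, so it is worth trying several starting points.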
|Figure 8: NCSS’ output. I chose the “Logistic(4)” from NCSS’ list of “Growth and Other Models.”|
|Figure 9: Minitab’s output for the Pearl-Reed logistic growth model for the Non-Stop data.|
|Figure 10: The four-parameter Weibull curve fit for the Non-Stop data.|
We e-mailed the vendors and asked them to respond to our online questionnaire so readers could see the features and capabilities of the software. The purpose of the survey is to inform the reader of a program’s forecasting capabilities and features. We tried to identify as many forecasting vendors and products as possible and contacted all the vendors that we identified and/or that responded to the last survey in 2012. For those who did not respond, we sent gentle reminders (several e-mails and some phone calls). In addition to the features and capabilities of the software, we wanted to know what techniques or enhancements have been added to the software since our previous survey. The information comes from the vendors, and we made no attempt to verify what they gave us.
If you use data to make forecasts, what should you look for in a vendor and the product? First, find out the capabilities of the software. Specifically, what forecasting methodologies can the product do? Does it find the optimal parameters of the procedure for your particular data set or must you manually enter those values? How extensive, useful and clear is the output?
Most, but not all, vendors allow you to download a time-trial version of the software that typically expires in anywhere from a week to a month. Ideally, the trial version should allow you to work with your own data and not just “canned” data that the vendor bundles with the trial software. Verify whether the trial version limits the size of the data set and, if so, whether those limits are overly restrictive.
Ask about technical support, updating to a newer version when it is released and differences (if any) depending on the operating system you are using. Contact the vendor with your specific questions. Users tell me, and I have independently found, that most vendors have good and helpful technical support before and after you buy.
Jack Yurkiewicz (email@example.com) is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. He teaches data analysis, management science and operations management. His current interests include developing and assessing the effectiveness of distance-learning courses for these topics. He is a longtime member of INFORMS.
Survey Data & Directory
To view the survey results as well as a directory of vendors who participated in the survey, click here.