Software Survey: Forecasting at Steady State?
Perhaps, but ubiquitous software continues to play a crucial role in many aspects of life.
By Jack Yurkiewicz
Many professional and casual users do explanatory and time series forecasting in medicine, business and academia. The forecast “hits” and “misses,” particularly the latter, sometimes make headlines. The large increase in college applications for the class of 2012, especially to prestigious universities, caught some provosts by surprise. Box office returns for “Juno” (on the high side) and for “Leatherheads” and “Stop-Loss” (both of which underperformed) had repercussions in the movie industry. The medical community is at once promoting and de-emphasizing the charting of homocysteine levels to help predict the risk of coronary heart disease. Tracking approval numbers over time for Clinton vs. Obama vs. McCain has become a mainstay of cable news networks. Trend curves for the price of oil, and explanatory models for those prices, are subjects of discussion in academia, on Wall Street and at many dinner tables.
While preparing this survey, I encountered an analogous question: have the study of forecasting, and the software for it, reached steady state? On the first part of that question, a search of Amazon.com turns up only two forecasting texts with copyright dates after 2006; BarnesandNoble.com shows four. On the second part, many statistical software programs have released updates since then, but few claim that the enhancements involve forecasting capabilities or features, and we identified few new dedicated forecasting programs. This survey tries to answer that second part: has the development of forecasting software reached the plateau stage of a Gompertz curve? While the answer arguably seems to be yes, what is almost surely not debatable is that forecasting plays a crucial role in many aspects of life.
Practitioners use forecasting software from two camps. The first is the stand-alone, dedicated forecasting product, such as Forecast Pro, Autobox and others, which does only forecasting, offering a variety of procedures. Typically, these include regression, Box-Jenkins models, and exponential smoothing models by Brown, Holt, Winters, etc. The second is the general statistical software product, such as SPSS, Minitab, SAS, Systat, Statgraphics, NCSS and others, which includes forecasting among the many statistical techniques available. There are two main reasons why a practitioner may want to buy and use a dedicated forecasting program rather than a general statistics product. First, some dedicated forecasting programs offer specific techniques that general statistics programs may not, such as state space smoothing algorithms, econometric models and transfer function models, among others. Second, some dedicated forecasting products offer a higher level of “automation,” which translates into ease of use, than general statistics programs do. This degree of automation has benefits and caveats for users, and we mention a few of these later.
We can divide forecasting software into three categories. We call the first automatic forecasting software. Automatic software analyzes the data, recommends a forecasting procedure or model (along with a statistical justification for the recommendation), optimizes the model’s parameters, and gives forecasts, plots and various statistical summary measures. The user can accept or reject the recommendation. If the user rejects it, he or she chooses an alternative model or technique, and the software optimizes its parameters and gives the resulting forecasts, plots, statistical summary measures, etc. For example, Forecast Pro, a dedicated forecasting product, typically starts in its default “Expert Selection” mode. Figure 1 shows the program’s analysis of an airline’s enplanement data.
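To make the idea of automatic selection concrete, here is a minimal Python sketch under stated assumptions. It is not how Forecast Pro’s Expert Selection, or any other product, actually works. The candidate set (a naive random-walk forecast, simple exponential smoothing and Holt’s linear-trend smoothing), the coarse 0.1-step grid and the least-squares AIC approximation are all illustrative choices, and the function names are hypothetical. Every candidate is scored on the same one-step-ahead errors (from the third observation on), so the AIC values are comparable.

```python
import math

# Hypothetical coarse grid for every smoothing constant: 0.1, 0.2, ..., 0.9.
GRID = [i / 10 for i in range(1, 10)]


def naive_errors(y):
    """Random walk: the forecast for y[t] is y[t-1]."""
    return [y[t] - y[t - 1] for t in range(2, len(y))]


def ses_errors(y, alpha):
    """Simple exponential smoothing, started at y[0] and updated with y[1]."""
    level = alpha * y[1] + (1 - alpha) * y[0]
    errs = []
    for t in range(2, len(y)):
        errs.append(y[t] - level)  # one-step forecast is the current level
        level = alpha * y[t] + (1 - alpha) * level
    return errs


def holt_errors(y, alpha, beta):
    """Holt's linear-trend smoothing, started from the first two observations."""
    level, trend = y[1], y[1] - y[0]
    errs = []
    for t in range(2, len(y)):
        errs.append(y[t] - (level + trend))
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return errs


def aic(errs, k):
    """Least-squares AIC approximation; k = number of smoothing constants."""
    n = len(errs)
    mse = sum(e * e for e in errs) / n
    return n * math.log(max(mse, 1e-12)) + 2 * k  # floor avoids log(0) on perfect fits


def auto_select(y):
    """Score every candidate on the same one-step errors; return (label, AIC) of the best."""
    candidates = [("naive", aic(naive_errors(y), 0))]
    for a in GRID:
        candidates.append((f"SES(alpha={a})", aic(ses_errors(y, a), 1)))
        for b in GRID:
            candidates.append((f"Holt(alpha={a}, beta={b})", aic(holt_errors(y, a, b), 2)))
    return min(candidates, key=lambda c: c[1])
```

Calling auto_select on a list of at least three observations returns the winning label and its AIC. A real product would also report the competing criterion values, which is the kind of statistical justification a user should read before accepting the recommendation.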
This high level of ease of use comes with potential pitfalls. The casual or untrained user may come to rely on the software as a forecasting black box. For example, in Figure 1, if an inexperienced user does not know what the Box-Jenkins ARIMA(0,1,1)*(0,1,1) model is or what its assumptions are, but naively accepts the forecasts because the software recommended this procedure, he or she may be inviting criticism.
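For the reader who does not recognize it, the ARIMA(0,1,1)*(0,1,1) model in Figure 1 is the classic seasonal “airline model” of Box and Jenkins. Assuming monthly data (seasonal period 12), with B the backshift operator (B y_t = y_{t-1}), θ and Θ the nonseasonal and seasonal moving-average coefficients, and ε_t white noise, one common way to write it is shown below. This uses the Box-Jenkins sign convention; some packages write the moving-average terms with plus signs instead.

```latex
(1 - B)(1 - B^{12})\, y_t = (1 - \theta B)(1 - \Theta B^{12})\, \varepsilon_t
\quad\Longleftrightarrow\quad
y_t = y_{t-1} + y_{t-12} - y_{t-13}
      + \varepsilon_t - \theta\,\varepsilon_{t-1} - \Theta\,\varepsilon_{t-12} + \theta\Theta\,\varepsilon_{t-13}
```

Expanding the products makes the model’s main assumptions visible: after one regular and one seasonal difference the series is treated as stationary, and what remains is captured by two moving-average terms and their product. A user who accepts the forecasts should at least check that the residuals look like white noise.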
Automatic forecasting is more likely found in the dedicated products. However, a few general statistical products (e.g., SPSS with its Trends add-on, Statgraphics) do forecasting in the automatic mode.
The second software category is called semiautomatic. Here the user must specify the model or technique, and the software finds the optimal parameters for that model and displays the resulting forecasts and ancillary output. Some general statistics programs (e.g., NCSS) and most dedicated forecasting products are semiautomatic. Thus, a user of semiautomatic software must know the various forecasting procedures. The calculations that find the parameters of the designated model (e.g., the three smoothing constants for Winters’ method) that minimize some statistical measure, such as Schwarz’ Bayesian Information Criterion (BIC), should be left to the software.
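As a rough guide to what such software minimizes, one common least-squares form of the AIC and BIC is shown below, where n is the number of one-step-ahead errors, SSE is their sum of squares and k is the number of estimated parameters. This form is an assumption for illustration: products differ in the constants they add and in whether initial states count toward k.

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2k,
\qquad
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + k \ln n
```

Under this form, when the model is fixed (say, Winters’ method with its three smoothing constants, so k is the same for every candidate), the penalty is constant and minimizing either criterion picks the same constants as minimizing SSE, and hence MSE or RMSE; the penalty matters when comparing models with different numbers of parameters.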
The last group is manual software. Here the user must specify both the model and the parameters for that model. Many general statistics programs fall into this category. Clearly, the major drawback in using manual software is determining the optimal parameters for the model chosen. For example, my students were working with a time series of monthly (Oct. 1997 through Sept. 2007) airline enplanement data from a particular carrier. Figure 2 shows the Excel time plot of this data.
The data exhibited monthly seasonality and a small upward trend, and my students recommended Winters’ method to make the forecasts. With manual software, finding the smoothing constants that minimize the Akaike Information Criterion (AIC) became a tedious trial-and-error process. Figure 3 shows Systat’s dialog box for Winters’ method, typical of what is seen in manual software.
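The trial and error the students faced can be automated. The Python sketch below, an illustration rather than any product’s algorithm, runs multiplicative Winters’ smoothing and searches a coarse grid of smoothing constants (0.1 to 0.9 in steps of 0.1, an arbitrary choice) for the combination with the lowest least-squares AIC. The initialization (level from the first season’s mean, trend from the change between the first two seasonal means, seasonal indices from the first season’s ratios to that level) is one textbook convention among several, and the function names are hypothetical. Products, like the Excel template with Solver mentioned below, typically replace the grid with a numerical optimizer.

```python
import math
from itertools import product


def winters_sse(y, alpha, beta, gamma, m=12):
    """Multiplicative Winters' method; returns the SSE of one-step-ahead errors.

    Initialization is one textbook convention (an assumption; products differ):
    level = mean of the first season, trend = change between the first two
    seasonal means divided by m, seasonal index i = y[i] / initial level.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) / m - level) / m
    season = [y[i] / level for i in range(m)]
    sse = 0.0
    for t in range(m, len(y)):
        s = season[t % m]  # index for the same month one season earlier
        forecast = (level + trend) * s
        sse += (y[t] - forecast) ** 2
        prev_level = level
        level = alpha * y[t] / s + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * y[t] / level + (1 - gamma) * s
    return sse


def grid_search(y, m=12, steps=10):
    """Try every (alpha, beta, gamma) on a coarse grid; return (AIC, alpha, beta, gamma)."""
    grid = [i / steps for i in range(1, steps)]
    n = len(y) - m  # number of one-step errors
    best = None
    for a, b, g in product(grid, repeat=3):
        sse = winters_sse(y, a, b, g, m)
        aic = n * math.log(max(sse / n, 1e-12)) + 2 * 3  # three smoothing constants
        if best is None or aic < best[0]:
            best = (aic, a, b, g)
    return best
```

With the default grid, grid_search evaluates 9 × 9 × 9 = 729 combinations; a finer grid grows cubically, which is why an optimizer is the usual choice.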
Current products differ in their flexibility. Some let the user withhold a portion of the data from the model fit and validate the model on the remainder. Experienced forecasters may want the software to perform statistical tests on the within-sample errors and to report various statistics for the out-of-sample errors. Certain products let the user specify which statistic (mean square error, AIC, BIC, etc.) is minimized to find the optimal parameters of the model; some don’t give the user the choice, and a few do not even tell the user explicitly (without a search through the documentation) which statistic is minimized.
The products also differ in their output. Ideally, the experienced user would like to see the model’s various summary statistics (MSE, RMSE, MPE, MAPE, AIC, BIC, Ljung-Box statistic, etc.). Some products report these and more, while others report far fewer. All products give time plots of the data, with or without the forecasts. However, if the data set is large, the resulting graph may look confusingly “squished,” and the software may not let the user adjust the view. Figure 4 shows Minitab’s default output for the enplanement data, which can, with some effort, be modified to make the display easier to interpret.
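For readers who want to check a product’s reported statistics, the Python sketch below computes several of the measures named above from actual values and one-step forecasts. Errors are defined as actual minus forecast, an assumed convention (some texts reverse the sign, which flips MPE), and the function name and the 12-lag default are illustrative choices.

```python
import math


def accuracy_summary(actual, forecast, lags=12):
    """Common error measures, with errors defined as actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    if not 0 < lags < n:
        raise ValueError("lags must be between 1 and len(errors) - 1")
    mse = sum(e * e for e in errors) / n
    mpe = 100 * sum(e / a for e, a in zip(errors, actual)) / n  # signed, so bias shows up
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # actuals must be nonzero
    # Ljung-Box Q = n(n+2) * sum_{k=1..lags} r_k^2 / (n-k), r_k = lag-k residual autocorrelation.
    mean = sum(errors) / n
    denom = sum((e - mean) ** 2 for e in errors)
    if denom == 0:
        q = float("nan")  # autocorrelations are undefined for constant errors
    else:
        q = 0.0
        for k in range(1, lags + 1):
            r_k = sum((errors[t] - mean) * (errors[t - k] - mean) for t in range(k, n)) / denom
            q += r_k * r_k / (n - k)
        q *= n * (n + 2)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MPE": mpe, "MAPE": mape, "LjungBox_Q": q}
```

Q is usually compared with a chi-squared distribution whose degrees of freedom are the number of lags minus the number of fitted parameters; a large Q suggests the residuals still contain autocorrelation the model missed.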
In informal testing, I have found that different programs routinely give different forecasts for the same data, even when using the same model and the same parameters. Using a few programs I own, I specified Winters’ method as the forecasting technique for the enplanement data. If a product was semiautomatic (Forecast Pro, NCSS, Statgraphics), I let it find the optimal smoothing constants; if a program was manual (Minitab, Systat), I specified the smoothing constants. Where possible, I indicated that RMSE should be minimized. I also used an Excel template I wrote that does Winters’ method and uses Solver to find the smoothing constants that minimize RMSE. Table 1 summarizes each product’s monthly forecasts for the subsequent year. The Auto column shows the forecasts obtained when the software found the three optimal smoothing constants itself, while the Manual column gives the forecasts using the three constants I specified (which came from my template and Solver’s answer). The table also gives the RMSE for the different models. For most products the forecast differences were not substantial, but they were real differences. Moreover, these were well-behaved data, following an almost “textbook” pattern; on messier data, the results differed more dramatically. The reasons range from which statistic the software minimizes to how it obtains the initial conditions for the recursive procedure (for Winters’ method, the initial estimates of the intercept and slope of the trend line and of the 12 seasonal indices). Unfortunately, very few forecasting products describe these initial conditions in their documentation, so the user has no idea what assumptions were made.
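The effect of initial conditions is easiest to see in the simplest smoothing recursion. The sketch below uses simple exponential smoothing as a simplified stand-in for Winters’ method (one state instead of 14), with illustrative function names. Because the recursion is linear in the starting level, two runs on the same data with the same smoothing constant α, differing only in the starting level, end with forecasts that differ by exactly (1 − α)^n times the difference in starting levels after n updates.

```python
def ses_final_level(y, alpha, level0):
    """Run simple exponential smoothing from a given initial level; the final level
    is the flat forecast for every future period."""
    level = level0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level


def init_effect(alpha, n, level0_a, level0_b):
    """Closed-form gap between two runs that differ only in their initial level:
    after n updates the initial level still carries weight (1 - alpha) ** n."""
    return (1 - alpha) ** n * (level0_a - level0_b)
```

With α = 0.1 and 24 observations, the starting level still carries a weight of 0.9^24, about 0.08. In Winters’ method each seasonal index is updated only once per season, so ten years of monthly data give each index only about ten updates; loosely, a starting index keeps a weight on the order of (1 − γ)^10, roughly 0.35 for γ = 0.1. Which initial estimates a product uses can therefore matter.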
I show this comparison not to denigrate or laud any product, but to point out that forecasts may be a function of the software, and the user should be aware of this.
We have tried to make this survey as comprehensive as possible. However, finding every forecasting program, or every statistics product that does forecasting, is problematic. We identified as many products as possible using reader and vendor feedback, advertising, displays at professional conferences, information from previous surveys, etc. We e-mailed the vendors and asked them to respond to our online questionnaire. The survey asked about the capabilities and features of the software and allowed the vendor to include additional details not addressed by the questions. If a vendor failed to respond, we followed up with additional e-mails and, at times, one or two cajoling telephone calls. To those who feel slighted by an omission, please accept our apology. Let us know of the company and product, and we will include it in the online version of this survey.
The purpose of this survey is to inform the reader of what is available; it does not purport to rate or review these products. The information comes from the manufacturers and no effort was made to verify the submissions.
If you are interested in getting a forecasting program, or want to try something different, I recommend that you first look at the techniques the software offers. Next, determine the product’s level of automation. Other issues, such as flexibility, the quality and quantity of the output, ease of learning and ease of use, are much harder to judge. The best way is to try the software, but this may be difficult. Check whether the vendor has a trial version to download; unfortunately, most do not. Sometimes vendors make a “student” version available to academics at a greatly reduced price. Finally, contact the vendor with your specific questions. Users tell us that vendors are helpful and want them to be satisfied with their selection.
Jack Yurkiewicz (email@example.com) is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. Besides management science, he teaches business statistics, operations management and forecasting. His current interests include developing distance-learning courses for these topics and assessing their effectiveness.
To view the survey data, go to: http://www.lionhrtpub.com/orms/surveys/FSS/fss-fr.html
ANALYTICAL SOFTWARE SURVEYS
OR/MS Today (www.lionhrtpub.com/ORMS.shtml), the membership magazine of the Institute for Operations Research and the Management Sciences (INFORMS), regularly conducts surveys of software of interest to a broad spectrum of analysts. Most of the surveys are updated on a biennial basis.
Each survey includes a directory of vendors and side-by-side comparisons of software packages, including such metrics as system requirements, performance capabilities, key features, technical support and vendor comments.
Several software surveys are available online, including:
Vehicle Routing (February 2010)
Simulation (October 2009)
Linear Programming (June 2009)
Statistical Analysis (February 2009)
Decision Analysis (October 2008)