Software Survey: Forecasting an upward trend?
Survey of forecasting software reveals interesting trends and new developments.
By Jack Yurkiewicz
Time series data is frequently collected and analyzed, and then forecasts are made, often with clever headlines or leads. These vary from the important to the mundane. Recently, we have seen time series data and forecasts for unemployment figures, gas prices, mortgage loan rates, stock indices, approval values for President Obama vs. Mitt Romney, high school science and math scores, television ratings for “Dancing With the Stars” and many other examples.
Television and print commentators frequently show the numbers or graphs and draw some general conclusion or prediction, but they often do not delve into underlying issues such as sample size, likely voters vs. registered voters, correlations with other factors, whether the forecasts are “long-range” and others. Many statistical and forecasting software products will automatically make forecasts for time series data, but did the forecaster know the specifics of the model chosen, examine previous forecasts made on earlier data or try alternative models? We presume the answer is “absolutely,” but sometimes when we read or hear about the forecasts, we wonder.
For the purposes of this article, I tried making some forecasts on data that is far less “weighty” than many of the examples given above. I collected monthly data on total domestic motion picture box-office grosses from January 2000 through December 2011, or 144 values, and put these into an Excel spreadsheet. The spreadsheet had just three columns: time period, date (1/1/2000, 2/1/2000, etc., which Excel then displayed as Jan-00, Feb-00, etc.) and the total gross (the numbers are in millions of dollars and have not been adjusted for inflation). The plan was to import the spreadsheet into several statistical or forecasting software products to make forecasts of the total box-office values for January through April 2012, and then compare those forecasts to the actual values for those four months, in time for the deadline for this article. A time series plot of the historical data is shown in Figure 1.
Figure 1: Total monthly domestic motion picture box-office gross sales 2000-2011.
Types of Forecasting Software
Forecasting software is generally available from two categorical groups. The first group, called dedicated software, has forecasting capabilities but does not possess additional statistical prowess. Thus, dedicated software typically can do Box-Jenkins, exponential smoothing, regression, nonlinear trend analysis and other forecasting procedures, but it cannot find a confidence interval for the population proportion or do a factor analysis. An example of dedicated software is Forecast Pro. The second group consists of general statistical analysis products that include forecasting capabilities. Some examples are IBM SPSS Statistics, SAS, Minitab, Statgraphics, NCSS and Systat. One possible advantage of dedicated software is that it may have certain procedures or capabilities (e.g., ARIMA intervention, econometric, transfer function models, etc.) that the general statistical products might not.
Another attribute of forecasting software is its level of automation; that is, the degree to which the software can specify the appropriate forecasting model to use on your data. There are three levels of automation. The first may be called automatic. Automatic software advises the “appropriate” model for the particular data set. That is, it recommends a forecasting model or procedure by minimizing some statistic (e.g., Akaike Information Criterion (AIC), Schwarz Bayesian Information Criterion (SBIC), mean square error (MSE), etc.) and then finds the optimal parameters for the model, calculates forecasts and confidence intervals, gives various summary statistics and makes graphs. The user can override the recommended procedure, specify some other forecasting technique, and the software then finds the optimal parameters for that model, gets the forecasts, etc. Most dedicated forecasting products can operate in the automatic mode; fewer general statistical products (e.g., IBM SPSS Statistics and Statgraphics) can be put into this category.
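As a rough illustration of what the automatic mode does, the sketch below scores a few candidate models by AIC and keeps the one with the smallest value. Everything in it is an assumption made for illustration: the toy series, the three deliberately simple candidate models, the function names and the Gaussian least-squares form of the AIC, n ln(SSE/n) + 2k. It does not reproduce how any product named in this article makes its choice; commercial packages search much richer model families.

```python
import math

def aic(residuals, n_params):
    """Gaussian least-squares AIC: n * ln(SSE / n) + 2k (an assumed form)."""
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return n * math.log(sse / n) + 2 * n_params

# Each candidate returns (residuals over y[start:], number of fitted parameters).
def mean_model(y, start, period):
    mu = sum(y) / len(y)                      # one fitted parameter: the mean
    return [v - mu for v in y[start:]], 1

def naive_model(y, start, period):
    # Forecast each value with the previous one; nothing is estimated.
    return [y[t] - y[t - 1] for t in range(start, len(y))], 0

def seasonal_naive_model(y, start, period):
    # Forecast each value with the value one season earlier.
    return [y[t] - y[t - period] for t in range(start, len(y))], 0

def select_model(y, period=12):
    """Score every candidate on the same observations and pick the lowest AIC."""
    candidates = {
        "mean": mean_model,
        "naive": naive_model,
        "seasonal naive": seasonal_naive_model,
    }
    scores = {}
    for name, fit in candidates.items():
        residuals, k = fit(y, period, period)  # skip the first season for all
        scores[name] = aic(residuals, k)
    return min(scores, key=scores.get), scores

# Hypothetical monthly series: linear trend plus a repeating 12-month pattern.
pattern = [0, -20, -5, 0, 25, 30, 30, 5, -20, -5, 15, 20]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]

best, scores = select_model(series)
print("recommended:", best)
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} AIC = {score:.1f}")
```

On this toy series the seasonal naive model has the lowest AIC, because the repeating monthly pattern dominates the data. The user override described above corresponds to ignoring `best` and picking another entry from `scores`.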
The next automation level is semi-automatic. Here, the user specifies the particular forecasting methodology the software should use, for example, a Box-Jenkins model, and the software proceeds to find the optimal parameters of that model, the forecasts, summary statistics, graphs, etc. Most general statistical products operate in the semi-automatic mode.
Finally, the third level of automation, dubbed manual, requires the user to specify both the model and the parameters of that model. Thus, for manual software, if the user specifies Winters’ multiplicative model, he or she must also enter the three smoothing constants. The software will then give the forecasts, plots, summary statistics, etc. After examining the summary statistics, forecasts, etc., the user manually enters new model parameters and repeats the process. Finding the “optimal” model parameters can thus become a tedious trial-and-error process. The standard advice still holds: If you frequently forecast time series data, you should consider using an automatic or semi-automatic product.
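To make the difference between manual and semi-automatic operation concrete, here is a minimal sketch of one common textbook formulation of Winters’ multiplicative method. The function name, the simple initialization from the first two seasons, the toy series and the coarse grid of trial constants are all assumptions for illustration; the products discussed here may initialize and optimize quite differently.

```python
from itertools import product

def winters_multiplicative(y, period, alpha, beta, gamma, horizon):
    """Return (forecasts, in-sample MSE) for user-supplied smoothing constants."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = sum(y[:period]) / period
    second = sum(y[period:2 * period]) / period
    level = first                              # initial level: first-season mean
    trend = (second - first) / period          # initial per-period trend
    seasonals = [v / first for v in y[:period]]  # initial seasonal indices
    sse = 0.0
    for t in range(period, len(y)):
        s_old = seasonals[t % period]
        fitted = (level + trend) * s_old       # one-step-ahead forecast
        sse += (y[t] - fitted) ** 2
        prev_level = level
        level = alpha * (y[t] / s_old) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % period] = gamma * (y[t] / level) + (1 - gamma) * s_old
    n = len(y)
    forecasts = [(level + h * trend) * seasonals[(n + h - 1) % period]
                 for h in range(1, horizon + 1)]
    return forecasts, sse / (n - period)

# Hypothetical monthly series with a trend and multiplicative seasonality.
factors = [0.9, 0.7, 0.9, 1.0, 1.3, 1.4, 1.4, 1.0, 0.7, 0.9, 1.1, 1.2]
series = [(600 + 2 * t) * factors[t % 12] for t in range(48)]

# Manual mode: the user types in three constants, inspects the MSE, tries again.
trials = {}
for constants in [(0.5, 0.5, 0.5), (0.1, 0.1, 0.1)]:
    _, mse = winters_multiplicative(series, 12, *constants, horizon=4)
    trials[constants] = mse
    print("manual try", constants, "MSE =", round(mse, 2))

# Semi-automatic mode, crudely imitated: the software searches for the user.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_constants = min(
    product(grid, repeat=3),
    key=lambda c: winters_multiplicative(series, 12, *c, horizon=4)[1],
)
forecasts, best_mse = winters_multiplicative(series, 12, *best_constants, horizon=4)
print("grid search picked", best_constants, "MSE =", round(best_mse, 2))
print("next four forecasts:", [round(f, 1) for f in forecasts])
```

The two hand-entered trials stand in for the trial-and-error loop of manual software, while the grid search stands in for the parameter optimization a semi-automatic product performs. A real product would typically use a numerical optimizer rather than a five-point grid.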
Working With the Software
This article is not meant to formally review any product, but I used the monthly box-office data in the latest versions of SPSS, Forecast Pro, Statgraphics, NCSS, Minitab and Systat, all of which I have used for some time. These products represent a cross-section of the three automation categories described. All the products read and imported my Excel spreadsheet, usually, but not always, without additional tweaking (Forecast Pro needs, in separate cells, information about time span, seasonality, etc.).
All the programs I tried were easy to use. Menu systems were clear (I adhered to the time-honored adage of not clicking on Help unless and until it was a last resort) and the output was easy to read and interpret. If the software had an automatic mode, I always used that first. Looking at the monthly box-office data in Figure 1, repetitive “peaks” and “valleys” appeared over the years. The data shows high box-office values from May through July, and secondary peaks in November and December. Low box-office months are typically in February and September. These rules, of course, do not always apply (e.g., December 2009), but there appears to be monthly seasonality and an upward trend to the numbers. Thus, a Box-Jenkins or a Winters’ exponential smoothing model could be two methodologies to apply to this data. Forecast Pro recommended Winters’ method with additive seasonality (see Figure 2). When I overruled that advice and asked for a Box-Jenkins model, it recommended an ARIMA(1,0,1)x(0,1,2) model (see Figure 3). IBM SPSS Statistics recommended an ARIMA(2,0,12) model. The third automatic product, Statgraphics, after I told it to analyze all the models available and find the one with the smallest AIC, had its StatAdvisor recommend an ARIMA(0,1,1)x(2,1,2)12 model in which “a multiplicative seasonal adjustment was applied” (see Figure 4).
Figure 2: Forecast Pro’s recommendation.
Figure 3: Overriding Forecast Pro’s recommendation and specifying a Box-Jenkins model instead.
Figure 4: Statgraphics’s StatAdvisor making recommendations for the appropriate models to use. Only the “top 2” choices are shown here. It also gives the parameters, the forecasts and some specifics of the calculations for the optimal model (not shown).
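For readers less familiar with the notation used in these recommendations, a seasonal ARIMA(p,d,q)x(P,D,Q)s model is conventionally written with the backshift operator B, where B y_t = y_{t-1}. The form below follows the usual Box-Jenkins convention; the sign convention for the moving-average terms is an assumption, since packages differ on it, and y_t stands for the series after whatever transformation or adjustment the software has applied.

```latex
\phi_p(B)\,\Phi_P(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = \theta_q(B)\,\Theta_Q(B^{s})\,\varepsilon_t

% Example: ARIMA(0,1,1)x(2,1,2) with s = 12 (monthly data)
\left(1-\Phi_1 B^{12}-\Phi_2 B^{24}\right)(1-B)\left(1-B^{12}\right)y_t
  = \left(1-\theta_1 B\right)\left(1-\Theta_1 B^{12}-\Theta_2 B^{24}\right)\varepsilon_t
```

Read this way, the first triple (p,d,q) counts the nonseasonal autoregressive terms, differences and moving-average terms, and the second triple (P,D,Q) counts the same quantities at the seasonal lag s.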
Forecast Results and a Caveat
As mentioned in previous forecasting surveys, a particular data set could result in automatic or semi-automatic software recommending different models and/or different parameters for those models. Figure 5 shows the box-office forecasts obtained from some of these products. Because SPSS and Statgraphics both recommended Box-Jenkins models, I told Forecast Pro to utilize its recommended procedure, an exponential smoothing model, and then find the appropriate Box-Jenkins model and make forecasts for both. As NCSS does not make a model recommendation, I specified it should use the Box-Jenkins procedure and find the appropriate parameters. The last column shows the actual box-office totals for the first four months of 2012. For March 2012, all the products gave forecasts that were much lower than the actual box-office figure. Perhaps “The Hunger Games,” which was released on March 23 and grossed $233 million in just those nine days of March, may have something to do with those low projections.
Figure 5: Box-office grosses; forecasts from a sample of automatic and semi-automatic software and the actual grosses, in millions of dollars.
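A comparison like the one in Figure 5 can be summarized with a few lines of code. The sketch below computes signed percentage errors by month and the mean absolute percentage error (MAPE) for each product. The numbers and the product labels are hypothetical placeholders, not the values from Figure 5, and MAPE is just one of several reasonable accuracy measures.

```python
def percentage_errors(forecasts, actuals):
    """Signed percentage error per period; negative means the forecast was low."""
    return [100.0 * (f - a) / a for f, a in zip(forecasts, actuals)]

def mape(forecasts, actuals):
    """Mean absolute percentage error across all periods."""
    errors = percentage_errors(forecasts, actuals)
    return sum(abs(e) for e in errors) / len(errors)

months = ["Jan", "Feb", "Mar", "Apr"]

# Hypothetical placeholder values in millions of dollars (NOT from Figure 5).
actuals = [800.0, 750.0, 950.0, 700.0]
forecasts_by_product = {
    "Product A": [780.0, 760.0, 700.0, 720.0],
    "Product B": [820.0, 700.0, 720.0, 690.0],
}

summary = {}
for name, forecasts in forecasts_by_product.items():
    errors = percentage_errors(forecasts, actuals)
    # Month with the largest absolute miss.
    worst = max(range(len(errors)), key=lambda i: abs(errors[i]))
    summary[name] = (mape(forecasts, actuals), months[worst])
    print(f"{name}: MAPE = {summary[name][0]:.2f}%, worst month = {months[worst]}")
```

With placeholder data shaped like the situation described above, one month with a large miss dominates the error summary for every product, which is why it is worth looking at the per-month errors and not only at the average.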
On May 15, ABC News reported, “Today, just over a third of U.S. adults are obese. By 2030, 42 percent will be, says a forecast released Monday. That’s not nearly as many as experts had predicted before the once-rapid rises in obesity rates began leveling off.” The article intimated that the Centers for Disease Control and Prevention had made the forecast. Thus, we are getting a forecast of the fraction of Americans who will be considered obese 18 years from now, and that’s a lower fraction than was forecast who knows when. In my box-office example, all I wanted to do was make box-office forecasts four months into the future, and our forecasting software substantially underestimated the box-office grosses that would occur in month three. Perhaps when we read about such long-range (and even short-range) forecasts, or contemplate making them ourselves, we might pause and think “The Hunger Games.”
For this year’s forecasting software survey, as in the past, we tried to identify as many forecasting products as possible. We e-mailed the vendors and asked them to respond to our online questionnaire so readers could see the features and capabilities of the software. The purpose of the survey is to inform the reader of what is available. The information comes from the vendors, and no attempt was made to verify the information they gave us. Inevitably, after the results are published, we hear, “How could they have left out (my) product X!” Thus, if we did make an omission, we ask you to please accept our apology for the oversight. Let us know of the company and product and we will add it to the online survey results.
If you are interested in getting a new forecasting program, or just want to try some product other than the one you have, you should first look at the techniques the software offers and compare those with your needs. Most, but not all, vendors allow you to download a time-trial version of the software, which typically expires in anywhere from a week to a month. Make sure the trial version allows you to work with your own data and check if any forecasting features or niceties (typically the data size is one) are omitted in the trial version. Contact the vendor with your specific questions. Users tell us that they found the vendors to be extremely helpful.
Jack Yurkiewicz (firstname.lastname@example.org) is a professor of management science in the MBA program at the Lubin School of Business, Pace University, New York. He teaches data analysis, management science and operations management. His current interests include developing and assessing the effectiveness of distance-learning courses for these topics. He is a senior INFORMS member.
- The authoritative Boxoffice.com website, followed by many in and out of the motion picture industry: www.boxoffice.com/statistics/yearly.
Survey Results & Directory
The results of the 2012 forecasting software survey and a directory of forecasting software vendors are available online.