

Analytics Magazine

Statistical Analysis Software Survey: The joys and perils of statistics

Trends, developments and what the past year of sports and politics taught us about variability and statistical predictions.

By James J. Swain

“It is difficult to make predictions, especially about the future.”

– Danish saying, variously attributed to Niels Bohr or Yogi Berra

We were reminded repeatedly last year that variability can confound statistical predictions and that unlikely events do occur. Upsets in sports and politics are always news, since watching the underdog beat the “sure thing” is surprising and noteworthy. What is exciting in sports is unsettling in politics, where we expect our predictions to do better because the business is serious. We certainly don’t expect to see another “Dewey Defeats Truman!” headline, but both the Brexit vote and Trump’s election clearly confounded consensus predictions. In the latter case, the actual margins in several key states were very small – but in politics as in sports a win is a win.

It was also noteworthy that while data-savvy campaign teams seemed to be the story of the previous election cycle, Trump’s campaign demonstrated that they aren’t essential. The savvy predictions may even have been correct: an 80 percent chance of winning is not a certainty, and the less likely outcome is still possible.

Upsets in statistical prediction were not the only big story in statistics this year. The inability of researchers to replicate published experiments in several fields, such as psychology, has called published experimental results into question. It has also led to revised thinking about the old standby, the p-value. For instance, in one study of 100 articles in top psychology journals, only about 36 percent of the significant results were successfully replicated. Last May the American Statistical Association issued a statement condemning the use of any single measure, such as the p-value, as a substitute for scientific reasoning. One journal, Basic and Applied Social Psychology, has eliminated its use altogether.

Problems with overreliance on the p-value have been known for years. In traditional hypothesis testing, the p-value is the probability, computed under the null hypothesis, of observing a statistic at least as extreme as the one actually observed. The null hypothesis is rejected when the p-value is sufficiently small, on the assumption that the alternative is then the more likely explanation. Of course, across any large number of experiments a “significant” result (i.e., one with a low p-value) becomes increasingly likely to occur by chance alone, as quantified by the Bonferroni inequality. That is why running many experiments and reporting only the “significant” ones distorts the actual p-value.
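The arithmetic behind this inflation is easy to sketch. The short script below (an illustration, not from the article) computes the chance of at least one false positive among m independent tests at level alpha, alongside the Bonferroni upper bound of m × alpha:

```python
def prob_false_positive(m, alpha=0.05):
    """Chance of at least one false positive in m independent tests,
    each run at significance level alpha under a true null."""
    return 1 - (1 - alpha) ** m

def bonferroni_bound(m, alpha=0.05):
    """Bonferroni upper bound on the family-wise error rate."""
    return min(1.0, m * alpha)

# With 20 independent tests at the 5% level, the chance of at least
# one spurious "significant" result is already about 64 percent.
for m in (1, 10, 20, 50):
    print(m, round(prob_false_positive(m), 3), bonferroni_bound(m))
```

Reporting only the tests that came out “significant” therefore makes the nominal 5 percent level meaningless.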


The goal in any statistical investigation is to bring forth some insight from the data.
Photo courtesy of Thananit Suntiviriyanon

One way to deal with uncertainty about what a p-value means is experimental replication, which can either confirm the noteworthy result or fail to do so. In the latter case, the lack of a significant result in the replication suggests that the first was simply a “false positive.” But since journals generally prefer novel results to replications of existing ones, there is little incentive for independent replication.

Software for Statistics

The goal in any statistical investigation is to bring forth some insight from the data, whether confirmation of a research hypothesis, reassurance that some process is still ticking along with the proper precision and regularity, or a usable model. To obtain these results, software must perform a variety of functions, including data acquisition and editing, presentation of results and relations among variables, transformations as needed, and the computations that support the analysis.

Computers were once human, as the recent hit film “Hidden Figures” illustrates. At Langley, the best computers were prized for their insight into the underlying analysis and physical processes as well as for their computations [1]. The best modern software should provide the same assistance: performing the computations we choose, while offering further tools for the follow-on analyses those results suggest. The investigation is usually iterative, using one result to suggest alternative approaches and further experiments.

Software will also include the ability to compute critical values from reference sampling distributions such as the normal, t and F, from which p-values (for instance) can be computed. In fact, many of our critical mathematical and statistical tables were first computed by human computers in the early part of the last century, as noted in another book about them, “When Computers Were Human” [2].
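What once took a table lookup is now a one-line function call. As a minimal sketch using only Python’s standard library (statistics.NormalDist; packages such as R or SciPy expose the same machinery for the t and F families), here is a critical value from the inverse CDF and a two-sided p-value from the tail probability:

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, std. dev. 1

# Two-sided 5% critical value: the 97.5th percentile of N(0, 1)
z_crit = std_normal.inv_cdf(0.975)

# Two-sided p-value for an observed z statistic of 2.5
z_obs = 2.5
p_value = 2 * (1 - std_normal.cdf(z_obs))

print(round(z_crit, 2), round(p_value, 4))  # 1.96 0.0124
```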

Software offers more than computation. Exploratory analysis was designed in part to generate quick pictures of the data that could be assembled by hand – dot plots, stem-and-leaf displays and box plots, for instance – trading computational complexity for insight. Increasingly, multiple plots are provided in arrays or at the margins of other plots: box plots or histograms display the marginal distributions while the central panel provides the scatter plot. In multivariate investigations, a two-dimensional array of two-dimensional scatter plots helps the analyst visualize higher-dimensional relationships. The best software lets the analyst manipulate plots interactively to identify noteworthy points or sets of points (e.g., outliers) or to transform the variables within a graph. This is a particular strength of the JMP software.
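The summaries behind these quick pictures are themselves simple. As a rough sketch (the helper function is illustrative, not from any package), the five numbers a box plot displays can be computed with the standard library:

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) - the values a box plot draws."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # observations below the median
    upper = xs[(n + 1) // 2 :]  # observations above the median
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# The isolated maximum (40) is exactly the kind of point a box plot
# flags as a potential outlier.
print(five_number_summary([1, 3, 3, 4, 7, 8, 9, 12, 40]))
```

(Quartile conventions vary across packages; this version takes medians of the lower and upper halves.)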

Software also provides a greatly enhanced range of graphical displays. Graphics are an excellent way to visualize data – to see distributions and commonalities across variables or locations. Data can also be summarized geographically. A recent popular-interest article in The New York Times is representative of the possibilities: in the 2016 presidential election, county-level support for Donald Trump was more strongly correlated with the popularity of certain television shows than with the previous election’s presidential vote. The cultural divide remarked upon during the election was traced through viewing patterns for 50 television shows across the counties of the United States and then correlated with election results. The correlation is more easily understood graphically than numerically [3].

Finally, good statistical software can assist in the design of experiments. A good analysis, often framed by the old PDCA cycle of “plan, do, check and act,” begins with a question and a plan for collecting experimental data. Software can assist in sample-size computations through power analysis, or generate specialized experimental designs in one or more factors.
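As a hedged sketch of what a power-analysis routine does, the function below uses the normal approximation for a one-sample, two-sided test of a mean (dedicated packages refine this with the noncentral t distribution); effect is the effect size in standard-deviation units:

```python
from math import ceil
from statistics import NormalDist

def sample_size(effect, alpha=0.05, power=0.80):
    """Approximate n needed to detect a shift of `effect` standard
    deviations with the given power at significance level alpha."""
    z = NormalDist().inv_cdf
    n = ((z(1 - alpha / 2) + z(power)) / effect) ** 2
    return ceil(n)

# Detecting a half-standard-deviation shift with 80% power at the
# 5% level requires roughly 32 observations.
print(sample_size(0.5))
```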

Modern software has the additional advantage of opening analysis to a wider circle of individuals who could not carry out the computations themselves. Since computation is less of a barrier, introductions to statistics are available to a wide array of individuals. The American Statistical Association sponsors teacher clinics and poster competitions at the K-12 level, and AP statistics courses are growing quickly as well.

Software Survey Products

This year’s biennial statistical software survey provides capsule information about 19 products from 13 vendors. The tools range from general packages that cover the important techniques of inference and estimation to specialized products for activities such as nonlinear regression, forecasting and design of experiments. The product information in the survey was obtained from the vendors and is summarized in tables that highlight general features, capabilities and computing requirements, and that provide contact information. Many vendors maintain websites with further, detailed information, and many provide demonstration programs that can be downloaded from those sites. No attempt was made to evaluate or rank the products; the information provided comes from the vendors themselves. The survey data is available online (see Editor’s Note). Vendors that were unable to make the original publishing deadline are added to the online survey as they complete the online questionnaire.

Products that provide statistical add-ins for spreadsheets remain popular. The spreadsheet is the primary computational tool in a wide variety of settings, familiar and accessible to all, and many procedures for data summarization, estimation, inference, basic graphics and even regression modeling can be added to it in this way. An example is the Unistat add-in for Excel. The functionality of spreadsheet add-ins continues to grow, extending to risk analysis and Monte Carlo sampling in products such as Oracle Crystal Ball.

Dedicated general- and special-purpose statistical software generally offers a wider variety and greater depth of analysis than the add-ins. For specialized techniques such as forecasting and design of experiments, a statistical package is appropriate. In general, statistical software plays a distinct role on the analyst’s desktop, and provided that data can be freely exchanged among applications, each part of an analysis can be done with the most appropriate (or convenient) software tool.

An important feature of statistical programs is the ability to import data from as many sources as possible, eliminating the need for data entry when the data is already available elsewhere. Most programs can read spreadsheets and selected data storage formats. The survey also includes several specialized products, such as STAT::FIT, that focus narrowly on distribution fitting rather than general statistics but are of particular use to developers of models for stochastic systems, reliability and risk.
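A toy sketch conveys what distribution-fitting tools of this kind automate. For the exponential distribution – a common model for interarrival times in stochastic systems – the maximum-likelihood estimate of the rate is simply the reciprocal of the sample mean (the function name and data here are illustrative, not from any product):

```python
from statistics import fmean

def fit_exponential_rate(data):
    """Maximum-likelihood estimate of the exponential rate parameter
    lambda: the reciprocal of the sample mean."""
    return 1 / fmean(data)

# Hypothetical interarrival times, in minutes
interarrival_times = [0.8, 1.1, 2.3, 0.5, 1.6, 0.9, 1.4]
rate = fit_exponential_rate(interarrival_times)
print(round(rate, 3))  # estimated arrivals per minute
```

Products such as STAT::FIT extend this idea across many candidate distributions and add goodness-of-fit tests to choose among them.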

James J. Swain is professor in the Department of Industrial and Systems Engineering and Engineering Management at the University of Alabama in Huntsville. He is a longtime member of INFORMS, as well as ASA, IIE and ASEE.

Editor’s note:
Survey Directory & Data
To view the statistical software survey products and results, along with a directory of statistical software vendors, click here.


  1. Margot Lee Shetterly, 2016, “Hidden Figures,” William Morrow.
  2. David Alan Grier, 2005, “When Computers Were Human,” Princeton University Press.
  3. Josh Katz, 2016, “‘Duck Dynasty’ vs. ‘Modern Family’: 50 Maps of the U.S. Cultural Divide,” The New York Times, The Upshot, Dec. 27. Available online at:
