“Kill the Quants?”: Why risk analysis fails
Catastrophes as diverse as Hurricane Katrina and the financial crisis point to a series of common mistakes in utilizing risk analysis and risk management.
By Douglas A. Samuelson
Should we “kill all the quants?”
The financial crisis, Hurricane Katrina and the BP oil spill have prompted many claims that quantitative risk analysis failed. Even fairly sober scientific reviews seriously criticized models in general [Scientific American, 2008] or called for a major change in their use [Lo, 2009]. Douglas W. Hubbard’s book on failures of risk management [Hubbard, 2009] has been a popular hit, raising the intriguing question of why modelers don’t usually apply quantitative measures to how well their models worked. (He and this reporter followed up on his main points in an article for OR/MS Today [Hubbard and Samuelson, 2010].) Even more popular and more trenchant is Nassim Taleb’s “The Black Swan,” based on the assertion that the most important events are the “unknown unknown” ones, beyond any modeling method’s scope [Taleb, 2007]. These critiques raise major questions about the uses and limitations of risk analyses, and about the ways in which managers expect to use them.
First of all, we can surely agree that the examples cited here, among others, do indicate major deficiencies in predicting and managing risk. The BP spill left many people asking, “Why didn’t we know what could happen?” The responsible state and federal agencies lacked in-house expertise and apparently ignored a pattern of risky practice. BP underestimated the risks and under-prepared. There were some genuine modeling errors and questionable assessment of predictions the models did make.
Katrina and its aftermath showed a similar pattern. The state and federal governments repeatedly postponed repairs to the levees around New Orleans, overruling strong recommendations from their engineers. After 9/11, the federal government shifted much of its in-house expertise from natural disasters to terrorism. The analyses that were conducted focused more on the danger from a storm surge up the Mississippi River than on the danger of a “back-side” surge from Lake Pontchartrain, on the other side of the city – neglecting the critical fact that repair ships can navigate the river but could not traverse the lake, rendering the main repair plans ineffective for the actual event. The problems were greatly compounded by uncertainty about who was responsible for what.
The financial crisis featured both models that were wrong and models that were right but were ignored by managers. Modeling errors vied with deliberate misstatements. Federal agencies deferred to the expertise and presumed non-malevolence of private firms.
In all these cases and a number of others, some common elements emerge:
- unduly limiting assumptions in the analyses;
- over-reliance on inappropriate theory;
- excessive trust in markets;
- underestimating probabilities and effects of rare events;
- insufficient empiricism about assessing quantitative methods;
- insufficient attention to availability and quality of data;
- over-specialization and resistance among disciplines; and
- unclear responsibility and accountability for failures.
1. Unduly limiting assumptions. Most creditors have relied for many years on statistical credit scores, most famously the FICO score almost universally used for residential mortgages. For obvious reasons, credit scores rely entirely on borrowers’ recent experience. Hence, in normal pre-2008 times, which saw no downturns, a borrower with high utilization of available credit probably was someone who tended to spend too much and kept too little reserve. As the housing market spiraled downward, however, people who had used consumer credit and home equity lines as ready reserve for small businesses got squeezed and delayed payments to other small businesses, which in turn got squeezed – and suddenly a large proportion of the populace looked high-risk to the scoring models.
This is one example of a general principle: When circumstances change substantially, models are liable to deteriorate not only because variables take on values different from the observations on which the model was based, but also because the relationships among variables change. For instance, a model of crowd behavior that works very well in normal situations may fail badly when, in a crisis, everyone wants to rush to a limited set of exits. In a crisis, usually uncorrelated behaviors become correlated – in the case of the financial system, urgent selling.
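The point about relationships changing can be illustrated with a small simulation. This is only a sketch under invented assumptions: two series that are each driven by their own noise plus a shared shock, where the hypothetical parameter `common_weight` is zero in calm times and large in a crisis. The function names are illustrative and do not come from any model discussed here.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n, common_weight, rng):
    """Two series: each is its own noise plus common_weight times a shared shock.

    common_weight is an assumed, illustrative knob: 0 means the two
    behaviors are independent; a large value means a common driver
    (for example, urgent selling) dominates both.
    """
    xs, ys = [], []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        xs.append(rng.gauss(0, 1) + common_weight * shared)
        ys.append(rng.gauss(0, 1) + common_weight * shared)
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    calm = pearson(*simulate(5000, 0.0, rng))
    crisis = pearson(*simulate(5000, 3.0, rng))
    print(f"calm-period correlation:   {calm:.2f}")
    print(f"crisis-period correlation: {crisis:.2f}")
```

A model calibrated only on the calm-period data would see a correlation near zero and treat the two behaviors as diversifying each other; in the simulated crisis the correlation is high (in theory 0.9 for this setup), so that assumption fails exactly when it matters.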
In the BP case, it is not clear that anyone potentially responsible for problems asked whether the geologic structure was unusual or how much difference it made that the well was 5,000 feet under water. In the planning for Katrina, evidently no one considered how people without cars would heed an order to evacuate after the buses stopped running.
2. Over-reliance on inappropriate theory. Some years ago, as a relatively junior federal policy analyst, this reporter was one of several reviewers of a study on estimating oil spill risks from tankers. The researchers who had done the study recommended having the Coast Guard require reports of all oil spills, no matter how small, for three years, to enable more precise estimation of the distribution. (As it was, spills under five gallons were exempted.) This reporter pointed out that spills of different sizes have different causes and asked, “How much do small spills tell you about large spills?” The lead researcher replied in a patronizing tone, “You don’t understand statistics.” Obviously he was oblivious to the assumptions implicit in treating the data as part of one data set – and prepared to follow those assumptions to any conclusion whatsoever.
The researcher in this story was a biologist, but economists are prone to the same fallacious thinking. Alan Greenspan admitted that one cause of the financial crisis was his assumption that private firms would not sacrifice their own long-term interests and those of the country for short-term profits. It would not be surprising if federal regulatory officials made similar assumptions about BP’s likelihood of sacrificing safety for short-term profit. Some litigation now in process over the West Virginia coal mine collapse early last year alleges similar regulatory lapses, in that case reliance on mine operators’ desire to keep their mines operating long-term.
3. Excessive trust in markets. A special and important case of over-reliance on theory is over-reliance on markets. William Kahn, who was director of the risk management modeling group at FNMA (Fannie Mae) in 2008, told of how the CEO believed the market price, not the model’s, until FNMA was $50 billion “underwater” in its risk pricing of residential mortgages. Economists readily concede that markets are myopic and tend not to give sufficient weight to sketchy information. Also, the assumption of approximately equal information among parties, critical to market equilibrium theory, may not hold in practice, and it is hard to pin down whether it does.
4. Underestimating probabilities and effects of rare events. It is well known that rare events, even known ones, tend either to be overlooked entirely or to be under-predicted when they are considered. Some locations have had three or four “hundred-year floods” in the past century, and some financial markets have experienced “once-in-a-generation” swoops and drops every five to 10 years or so. In models heavily reliant on data, the temptation to downplay rare events is, if anything, accentuated, as the presence of some solid data discourages speculation about the unknown. Computing confidence intervals usually relies on distributional assumptions that may not be applicable and definitely relies on having accounted for all sources of variation – which usually has not happened. Confidence intervals computed from sampling variation alone, ignoring other sources of randomness, overstate precision and can easily mislead even the knowledgeable about the risk from rare events.
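A quick calculation shows why repeated “hundred-year” events deserve a second look at the estimate rather than a shrug. The sketch below assumes, purely for illustration, that years are independent and that the true annual probability is exactly 1 percent; the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, years=100, annual_p=0.01):
    """Binomial probability of k or more events in `years` independent years.

    Assumes each year is an independent trial with the same probability
    annual_p, which is an illustrative simplification.
    """
    prob_fewer = sum(
        comb(years, i) * annual_p**i * (1.0 - annual_p) ** (years - i)
        for i in range(k)
    )
    return 1.0 - prob_fewer


if __name__ == "__main__":
    for k in (1, 3, 4):
        print(f"P(at least {k} events in 100 years) = {prob_at_least(k):.3f}")
```

Under these assumptions the probabilities come out to about 0.634, 0.079 and 0.018. A single site seeing three or four such floods in a century would therefore be fairly unlikely if the 1 percent figure were right, which argues for re-examining the estimate; across many sites, of course, a few will see such clusters by chance alone.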
Hubbard, in a survey of 60 firms that used some form of quantitative models, found that most experts asked to produce 90 percent confidence intervals came nowhere near getting 90 percent of the correct values within these intervals. Training and feedback can greatly improve these experts’ assessments of uncertainty – particularly when the training includes requiring the experts to bet, with real money, on their predictions. “If you won’t bet on it at 9-to-1 odds, it’s not a 90 percent interval” has a profound educational effect.
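Checking calibration of this kind takes very little machinery: count how often the stated intervals actually contained the realized values, and compare that rate with the stated confidence. The following is a minimal sketch with made-up numbers; the function names and data are illustrative and are not taken from Hubbard’s survey.

```python
def interval_hit_rate(intervals, actuals):
    """Fraction of realized values that fall inside the stated (low, high) intervals."""
    if len(intervals) != len(actuals):
        raise ValueError("need exactly one actual value per interval")
    hits = sum(low <= x <= high for (low, high), x in zip(intervals, actuals))
    return hits / len(actuals)


def fair_odds_against(confidence):
    """Odds (as x-to-1) at which a bet on the interval breaks even
    if the stated confidence is accurate."""
    return confidence / (1.0 - confidence)


if __name__ == "__main__":
    # Hypothetical "90 percent" intervals from one expert, and what happened.
    stated = [(10, 20), (5, 8), (100, 150), (0, 1), (30, 40),
              (2, 4), (50, 60), (7, 9), (15, 25), (1, 3)]
    realized = [18, 9, 120, 2, 35, 3, 75, 8, 40, 0.5]
    print(f"hit rate: {interval_hit_rate(stated, realized):.0%} (stated: 90%)")
    print(f"break-even odds at 90% confidence: {fair_odds_against(0.90):.0f}-to-1")
```

In this invented example only half of the “90 percent” intervals contain the realized value, the kind of overconfidence the survey describes. The second function simply restates the betting rule: an honest 90 percent interval breaks even at 9-to-1.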
5. Insufficient empiricism about assessing quantitative methods. In his survey of users of risk analysis methods, Hubbard found that several popular techniques increase comfort far more than they improve actual results. Among the more prominent examples, he cited balanced scorecards and the analytical hierarchy process (AHP).
This is not the occasion for arguments about whether these methods were applied as designed and still produced the disappointing results he found. In any event, he urges persuasively, both producers and users of models should insist on assessing how accurate the models’ predictions were and evaluating subsequent efforts by the same modelers accordingly.
6. Insufficient attention to availability and quality of data. Good models require good data. Nevertheless, it is not at all difficult to find instances in which the data required were either unavailable or not good. One example from this reporter’s experience is hospital emergency department diagnostic codes, which tended to have little association with the patients’ eventual diagnosis and treatment.
In assessing a model, it is useful to ask how much effort was spent on data quality and whose responsibility that was. Even when good data were available for developing the model, it is also important to ascertain whether key data series are available sufficiently ahead of time to make forecasts. This reporter remembers with regret a contractor’s fine model of the price and supply of chromium that turned out to be critically dependent on the amounts purchased under a few federal contracts. These purchase figures were readily available for past years, but future such purchases were classified, as they indicated intended construction of certain high-performance military aircraft.
Modeling often demonstrates a need for more data, but additional data collection after model development is underway seems to be done rather infrequently. When it is done, modelers tend to collect more of the kinds of data they can get easily, which often means more of what they already have. Hubbard suggests that collection should be guided by the expected value of perfect information (EVPI): how much better could we do if we had this measurement precisely? Often the most useful data to collect will be a very few sketchy observations about a variable we know next to nothing about.
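For a decision with a handful of possible states, the expected value of perfect information is the expected payoff when the state is learned before acting, minus the best expected payoff when one must commit beforehand. The sketch below works this out for a hypothetical two-action, two-state decision; all names, probabilities and payoffs are invented for illustration.

```python
def evpi(probabilities, payoffs):
    """Expected value of perfect information for a discrete decision.

    probabilities: probability of each state (must sum to 1).
    payoffs: dict mapping action name -> list of payoffs, one per state.
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    # Best expected payoff when committing to one action before the state is known.
    best_without_info = max(
        sum(p * v for p, v in zip(probabilities, values))
        for values in payoffs.values()
    )
    # Expected payoff when the state is revealed first and the best action chosen.
    best_with_info = sum(
        p * max(values[i] for values in payoffs.values())
        for i, p in enumerate(probabilities)
    )
    return best_with_info - best_without_info


if __name__ == "__main__":
    # Hypothetical decision: states are (bad, good) with probabilities 0.3 and 0.7.
    state_probs = [0.3, 0.7]
    options = {"act": [-100, 200], "wait": [0, 0]}
    print(f"EVPI = {evpi(state_probs, options):.1f}")
```

Here acting is worth 110 in expectation, while knowing the state in advance would be worth 140, so resolving the uncertainty is worth at most 30. A variable whose EVPI is near zero is not worth measuring more precisely, however easy the data are to collect.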
7. Over-specialization and resistance among disciplines. Scientific specialization has become a major and growing problem. In risk analysis, it appears to be the driving factor behind many of the other shortcomings.
In the FNMA example, for instance, the CEO trusted conventional economics over risk models. The BP crisis highlighted the long-established reluctance of economists, geologists and risk analysts to talk to each other, and the propensity to talk past each other when conversations do take place. This is not new. The Forrester-Meadows World III model, the basis for the book “The Limits to Growth” that ignited a hot policy debate in the mid-1970s, omitted all price effects, so economists simply dismissed it. Geologists discussing resource limitations, for their part, dismiss econometric models that assume prices dominate.
In the early days of credit scoring, in the 1960s, pioneering Fair, Isaac and Company (now FICO) frequently encountered resistance to letting models override loan officers’ judgment. Before that, they first tried offering, free of charge, to model risks of medical ailments, such as heart attacks, and got no takers at all from the doctors they approached. Most OR/MS practitioners have run into this kind of interdisciplinary resistance many times; some, regrettably, provide such resistance themselves.
8. Unclear responsibility and accountability for failures. Central to all these stories of shortcomings, as well, is lack of clarity about who is accountable for failures, whether of analysis, planning, reaction to changing events or carrying out needed actions. Both managers and analysts could have done better. Redesigning organizational structures, incentives and information handling to do better in crises is another topic for another occasion, but it is critical to future improvement.
Everyone has blind spots, from individuals to large organizations and entire professions. Over-specialization exacerbates the problem. In trying to anticipate “unknown unknowns,” breadth of vision and consideration becomes critical. One expert group [NRC, 2008] suggested that all models of complex social phenomena should be checked by scenario experts, domain experts, modelers and users to minimize the chance of something known to anyone available being overlooked by the other people involved in making the decisions. Strategic gaming [Samuelson, December 2009] is one of the most effective ways to elicit alternative assumptions and sketch their likely effects. More empiricism in evaluating models and the information on which they are based is also clearly warranted. Both analysts and managers need to challenge modeling assumptions much more diligently. In short, risk analyses remain useful, but we can all do better, and the best way forward is to help each other improve.
Douglas A. Samuelson (firstname.lastname@example.org) is president and chief scientist of InfoLogix, Inc., an R&D and consulting company in Annandale, Va. He is a frequent contributor to Analytics and OR/MS Today.
- Douglas W. Hubbard, “The Failure of Risk Management: Why It’s Broken and How to Fix It,” John Wiley & Sons, 2009.
- Douglas W. Hubbard, “How to Measure Anything: Finding the Value of Intangibles in Business,” John Wiley & Sons, 2007.
- Douglas W. Hubbard and Douglas A. Samuelson, “Modeling Without Measurements: How the Decision Analysis Culture’s Lack of Empiricism Reduces its Effectiveness,” OR/MS Today, October 2010.
- William Kahn, press conference at SAS Global Forum, 2009.
- S. Lichtenstein, B. Fischhoff, and L. Phillips, “Calibration of Probabilities: The State of the Art to 1980,” in “Judgment under Uncertainty: Heuristics and Biases,” eds. D. Kahneman, P. Slovic, and A. Tversky, Cambridge University Press, Cambridge, 1982, pp. 306-334.
- Andrew Lo, “Kill All the Quants?: Models vs. Mania in the Current Financial Crisis,” NYU Stern School Conference on Volatilities and Correlations in Stressed Markets, April 3, 2009.
- Donella Meadows et al., “The Limits to Growth,” Universe Books, New York, 1972.
- National Research Council, “Behavioral Modeling and Simulation,” 2008.
- Douglas A. Samuelson, “Playing for High Stakes: Wargamers and Cognitive Scientists Seek to Avoid ‘Strategic Surprise,’ ” OR/MS Today, December 2009.
- “SciAm Perspective: The Quants Did It,” Scientific American, December 2008.
- Nassim Taleb, “The Black Swan: The Impact of the Highly Improbable,” Random House, 2007; 2nd ed., 2010.