I wanted to kick off the course blog by talking about two different Wired articles written eight months apart. They present divergent perspectives on understanding and trusting models.
The first article (The End of Theory by Chris Anderson) takes the position that large data triumphs over everything. It talks about how petabyte-scale data and applied statistics help Google target ads, translate documents, and trash spam. The article states that at this scale data is impossible to visualize and impossible to understand. As long as the products seem to do the right thing, we shouldn’t worry about understanding the data or the models.
The argument is summed up well near the end of the article: “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.” Oh, by the way, The End of Theory was written on 6/23/2008, months before Lehman Brothers filed for bankruptcy.
The second article (Recipe for Disaster by Felix Salmon) was written exactly eight months later. A product of the financial flux after the stock market crash, the article tries to understand why Wall Street models failed. Specifically, it focuses on the role of one function, the Gaussian copula function. The article has a few take-home messages. Here are three that stick out:
- When you don’t understand how a model works, you can’t predict how it will fail.
- When a model is making you billions of dollars, you tend not to ask questions.
- When there is a poor communication channel between the people who handle the numbers and the people who make decisions, you don’t have a foundation for asking questions.
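For context, the function at the center of Salmon’s article is David X. Li’s Gaussian copula for joint default probability. As a sketch (from Li’s published formula; notation may differ slightly from how the Wired article renders it):

```latex
% Probability that two assets A and B both default within one year:
%   T_A, T_B  : times to default of A and B
%   F_A, F_B  : their marginal distribution functions
%   \Phi^{-1} : inverse of the standard normal CDF
%   \Phi_2    : bivariate normal CDF with correlation parameter \gamma
\Pr[T_A < 1,\, T_B < 1]
  = \Phi_2\!\left( \Phi^{-1}(F_A(1)),\; \Phi^{-1}(F_B(1));\; \gamma \right)
```

The point the article hammers on is visible right in the formula: all of the dependence between the two assets is compressed into the single correlation number γ, typically estimated from a few years of market data, which is exactly the kind of modeling shortcut the three bullets above warn about.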
The second to last paragraph talks about the disconnect between the numbers people and the decision people:
> In the world of finance, too many quants see only the numbers before them and forget about the concrete reality the figures are supposed to represent. They think they can model just a few years’ worth of data and come up with probabilities for things that may happen only once every 10,000 years. Then people invest on the basis of those probabilities, without stopping to wonder whether the numbers make any sense at all.
In hindsight everything is clear.
These articles cannot be divorced from the time in which they were written. They straddle the start of the largest economic downturn since the Great Depression. And in some ways they are a reflection of that time. The first article embodies hope and faith in data and statistics, and the second recaps the effect of misplaced hope and blind, unquestioning faith.
However, even now there is a belief that numbers speak for themselves. And to be fair, some data products, such as the Google products mentioned by Chris Anderson, may not need close scrutiny. But not all products are as safe as a spam filter, and algorithms cannot distinguish between a piece of spam and a subprime mortgage. To know what the numbers are saying, you need to know how they were generated and modeled. Poorly understood models can be dangerous. A lack of understanding can bring down global financial systems. A lack of understanding has consequences.