Each week Ethan Rouen, a student in the class, will post on a topic of his interest based on class lectures. Ethan is a Ph.D. student in accounting at Columbia Business School and a columnist for Fortune.com.
A wise man once said, “Oh, people can come up with statistics to prove anything. Fourteen percent of people know that.”
That wise man was actually a cartoon. That cartoon, Homer Simpson.
Mind blown? Not yet? Well then, meet the Central Limit Theorem.
Rachel offered us a brief glimpse into the greatness of the CLT in class on Monday.
As we learned, the role of the statistician is to study a sample of a population and form some conclusions about the population based on that study. But unless you live in Magic Town, the Jimmy Stewart movie about a pollster who discovers a town that perfectly mirrors the American population (P(that movie being crappy)=87%), there will never be an exact match between what your sample tells you and what actually happens in the real world.
Statistics, Rachel said, “is the study of uncertainty.” Uncertainty can come from how we gather our data, where we gather our data, how we measure our results, and a host of other places.
The puzzle for statisticians is as follows: We know that if we took multiple samples, we’d get different answers every time. When we take one sample, how can we express our uncertainty about the estimate?
How certain is Homer that 14 percent of people know the secret to statistics? If he has a margin of error of +86%/-14%, he is dead on.
The CLT states that when you have a large enough sample, the sample mean (the average of the observations in the sample) is a random variable whose distribution is approximately Normal, regardless of the shape of the population’s distribution (as long as the population has a finite variance). Put simply, it’s not a big deal that you don’t know much about your population, because if you get enough observations, you can assume that your sample mean came from a nice bell curve that will allow you not only to gather insights into the population, but also to understand how accurate those insights are likely to be.
Even better, the usual rule of thumb for the number of observations it takes to get there is only 30! (Heavily skewed populations can need more.)
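If you'd rather see it than take it on faith, here is a minimal simulation sketch. Everything in it is my own assumption for illustration, not something from the lecture: the population is a lopsided Exponential distribution with mean 1, the helper name `sample_means` is made up, and the sample size is the rule-of-thumb 30.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from a skewed Exponential(1)
    population (an assumed example) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

n = 30
means = sample_means(n=n, trials=10_000)

# The population has mean 1 and standard deviation 1, so the CLT says the
# sample means should center near 1 with spread near 1 / sqrt(n).
center = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_spread = 1.0 / n ** 0.5

# For a Normal curve, about 95% of values land within 1.96 spreads of center.
within = sum(abs(m - 1.0) <= 1.96 * predicted_spread for m in means) / len(means)

print(f"center of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f} (CLT predicts {predicted_spread:.3f})")
print(f"share within 1.96 spreads: {within:.1%}")
```

The population looks nothing like a bell curve, yet the sample means should behave roughly like one. Try changing `n` to 5 and then to 200 to watch the approximation get worse and better.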
The Central Limit Theorem allows us to estimate uncertainty about the data set by using the data set, Rachel said. It is a profound and freeing thought considering how overwhelming the size of a population can be.
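To make "using the data set" concrete, here is a rough sketch of the usual recipe: take the one sample you have, compute its mean, and use its own standard deviation to put a 95% interval around that mean. The function name `mean_with_95ci` and the 30 made-up observations are assumptions for illustration, and the 1.96 multiplier leans on the Normal approximation the CLT provides.

```python
import math
import random
import statistics

def mean_with_95ci(sample):
    """Return (mean, low, high): the sample mean and a CLT-based
    approximate 95% interval around it."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error: the sample's own spread, shrunk by sqrt(n).
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = 1.96 * se
    return mean, mean - margin, mean + margin

# Hypothetical data: 30 made-up observations from a skewed population.
rng = random.Random(42)
sample = [rng.expovariate(1.0) for _ in range(30)]

mean, low, high = mean_with_95ci(sample)
print(f"estimate: {mean:.2f}, 95% interval: ({low:.2f}, {high:.2f})")
```

Notice that nothing outside the sample went into the interval. With a sample this small, a careful statistician might swap 1.96 for a slightly larger t-distribution value, but the idea is the same.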
If you weren’t already having trouble sleeping because you’re worrying about HW1 (it’s nice to meet you, too, SQL), thinking too much about this statement will certainly keep you awake.