Deep Thoughts with the Central Limit Theorem

Each week Ethan Rouen, a student in the class, will post on a topic of his interest based on class lectures. Ethan is a Ph.D. student in accounting at Columbia Business School and a columnist for Fortune.com.

A wise man once said, “Oh, people can come up with statistics to prove anything. Fourteen percent of people know that.”

That wise man was actually a cartoon. That cartoon, Homer Simpson.

Mind blown? Not yet? Well then, meet the Central Limit Theorem.

Rachel offered us a brief glimpse into the greatness of the CLT in class on Monday.

As we learned, the role of the statistician is to study a sample of a population and draw conclusions about the population from that sample. But unless you live in Magic Town, the Jimmy Stewart movie about a pollster who discovers a town that perfectly mirrors the American population (P(that movie being crappy) = 87%), there will never be an exact match between what your sample tells you and what actually happens in the real world.

Statistics, Rachel said, “is the study of uncertainty.” Uncertainty can come from how we gather our data, where we gather our data, how we measure our results, and a host of other places.

The puzzle for statisticians is as follows: we know that if we took multiple samples, we'd get a different answer every time. So when we take only one sample, how can we express our uncertainty about the estimate?

How certain is Homer that 14 percent of people know the secret to statistics? If his margin of error is +86/-14 percentage points, he is dead on, since that covers every possible answer from 0 to 100 percent.

The CLT states that when you have a large enough sample of independent observations, the sample mean (the average of the observations in the sample) is a random variable whose distribution is approximately Normal, regardless of the shape of the population's distribution, as long as the population has a finite variance. Put simply, it's not a big deal that you don't know much about your population: if you get enough observations, you can treat your sample mean as a draw from a nice bell curve centered on the true population mean, which lets you not only gather insights into the population but also understand how accurate those insights are likely to be.
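To see this in action, here is a minimal simulation sketch. It assumes a population that looks nothing like a bell curve: an exponential distribution with mean 1 and standard deviation 1 (chosen purely for illustration; nothing in class specified it). We draw 10,000 samples of 30 observations each, take each sample's mean, and check the CLT's predictions: the means should center on 1, spread out by about 1/sqrt(30), and roughly 95% of them should land within 1.96 of those spreads of the center, just as a Normal curve would say.

```python
import math
import random
import statistics

def sample_means(n, trials, rng):
    """Draw `trials` samples of size n from an Exponential(1) population
    (mean 1, standard deviation 1, strongly right-skewed) and return
    the mean of each sample."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

rng = random.Random(42)  # fixed seed so the run is reproducible
n, trials = 30, 10_000
means = sample_means(n, trials, rng)

# CLT prediction: the sample means are roughly Normal(1, 1 / sqrt(n))
predicted_se = 1.0 / math.sqrt(n)
observed_se = statistics.stdev(means)

# If the means really are close to Normal, about 95% of them
# should fall within 1.96 standard errors of the true mean (1.0)
within = sum(abs(m - 1.0) <= 1.96 * predicted_se for m in means) / trials

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"predicted SE: {predicted_se:.3f}, observed SE: {observed_se:.3f}")
print(f"share within 1.96 SE: {within:.3f}")
```

The population is lopsided, yet the averages behave the way a bell curve predicts, which is the whole point of the theorem.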

Even better, the usual rule of thumb is that you only need about 30 observations to get there! (Heavily skewed populations can need more.)
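How good is "about 30"? One rough way to check, sketched below under the same illustrative assumption of an Exponential(1) population, is to measure how lopsided (skewed) the distribution of sample means still is at different sample sizes. A perfect bell curve has skewness 0; for this particular population, theory says the skewness of the mean is 2/sqrt(n), so it fades as n grows but never quite vanishes. The sample sizes 5, 30, and 200 are arbitrary picks for comparison.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: the average cubed z-score (0 for a symmetric bell curve)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(0)  # fixed seed for reproducibility
trials = 5_000
results = {}
for n in (5, 30, 200):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(trials)]
    results[n] = skewness(means)
    # Theory for an Exponential(1) population: skewness of the mean is 2 / sqrt(n)
    print(f"n={n:>3}: skewness of sample means = {results[n]:.2f}")
```

At n = 30 the means are noticeably closer to symmetric than at n = 5, but still not perfectly so, which is why 30 is a rule of thumb rather than a law.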

The Central Limit Theorem allows us to use the data set itself to estimate how uncertain our conclusions from that data set are, Rachel said. It is a profound and freeing thought, considering how overwhelmingly large a population can be.
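Here is what "using the data set to estimate uncertainty" can look like in practice: a minimal sketch of a CLT-based 95% confidence interval, mean ± 1.96 × (sample standard deviation / sqrt(n)), computed from a single sample. The data are simulated from the same illustrative Exponential(1) population (not real data), and the helper name `normal_ci` is hypothetical. Because we simulate, we know the true mean is 1, so we can also repeat the exercise thousands of times and count how often the interval actually catches it.

```python
import math
import random
import statistics

def normal_ci(sample, z=1.96):
    """Hypothetical helper: approximate 95% confidence interval for the
    population mean, built only from the sample via the CLT:
    mean +/- z * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    return mean - z * se, mean + z * se

rng = random.Random(7)  # fixed seed for reproducibility
true_mean = 1.0  # known only because we simulate an Exponential(1) population

# One sample, one interval: this is all a real analyst would have
one_sample = [rng.expovariate(1.0) for _ in range(30)]
low, high = normal_ci(one_sample)
print(f"95% CI from one sample of 30: ({low:.2f}, {high:.2f})")

# Repeat the whole exercise many times to see how often the
# interval actually contains the true mean
trials = 5_000
hits = 0
for _ in range(trials):
    lo_i, hi_i = normal_ci([rng.expovariate(1.0) for _ in range(30)])
    hits += lo_i <= true_mean <= hi_i
coverage = hits / trials
print(f"coverage over {trials} repeats: {coverage:.3f}")
```

With a skewed population and only 30 observations, the coverage tends to come in a bit under the advertised 95%, another reminder that the CLT delivers an approximation, not a guarantee.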

If you weren’t already having trouble sleeping because you’re worrying about HW1 (it’s nice to meet you, too, SQL), thinking too much about this statement will certainly keep you awake.
