The central limit theorem is a result from probability theory that shows up in a number of places in the field of statistics. Although the central limit theorem can seem abstract and devoid of any application, it is actually quite important to the practice of statistics. Why is the central limit theorem so important? It all has to do with the distribution of our population. As we will see, this theorem lets us work with a distribution that is approximately normal, even when we know little about the shape of the population itself.
Statement of the Theorem
The statement of the central limit theorem can seem quite technical, but it can be understood if we think through the following steps. We begin with a simple random sample of size n from a population of interest. From this sample we can easily compute a sample mean, which estimates the population mean of the quantity we are interested in. A sampling distribution for the sample mean is produced by repeatedly selecting simple random samples of the same size from the same population and computing the sample mean of each one. These samples are assumed to be independent of one another.
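The steps above can be sketched as a short simulation in Python. This is a minimal sketch under stated assumptions: the population is represented by a random number generator, so each observation is an independent draw (a reasonable stand-in for a simple random sample from a large population). The helper name `sampling_distribution` and its `draw` parameter are hypothetical, introduced only for illustration.

```python
import random
import statistics


def sampling_distribution(draw, n, reps, seed=0):
    """Return `reps` sample means, each from an independent sample of size n.

    `draw` is a hypothetical function that takes a random.Random instance and
    returns one observation from the population (an assumption standing in for
    sampling from a large real population).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]  # one random sample of size n
        means.append(statistics.fmean(sample))  # record its sample mean
    return means


# Example: a population that is uniform on [0, 10], so its mean is 5.
means = sampling_distribution(lambda rng: rng.uniform(0, 10), n=25, reps=2000)
print(len(means), statistics.fmean(means))
```

Each entry of `means` is one sample mean; collecting many of them gives an empirical picture of the sampling distribution, centered near the population mean.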
The central limit theorem concerns the sampling distribution of the sample mean. We may ask about the overall shape of this sampling distribution. The central limit theorem says that it is approximately normal, that is, shaped like the familiar bell curve. The approximation improves as we increase the size of the simple random samples used to produce the sampling distribution.
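One standard way to write this statement precisely is shown below. It assumes independent observations from a single population with mean μ and finite, nonzero standard deviation σ; the symbols are introduced here for illustration. It also makes precise what "improves with sample size" refers to: the standardized sample mean approaches a standard normal distribution as n grows.

```latex
% X_1, \ldots, X_n independent, each with mean \mu and finite variance \sigma^2 > 0
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} N(0, 1)
\quad \text{as } n \to \infty,
% so, for large n, approximately:
\qquad
\bar{X}_n \overset{\text{approx}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right)
```

The σ/√n term is why the sampling distribution also becomes narrower as samples get larger, in addition to becoming more bell-shaped.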
There is a very surprising feature of the central limit theorem: a normal distribution arises regardless of the shape of the population distribution, provided only that the population has a finite variance. Even if our population has a skewed distribution, as happens when we examine things such as incomes or people’s weights, the sampling distribution of the sample mean will be approximately normal once the sample size is sufficiently large.
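A quick simulation can illustrate this. The sketch below assumes an exponential population with mean 1, a strongly right-skewed distribution used here as a hypothetical stand-in for something like incomes, and measures how skewed the sampling distribution of the mean is for several sample sizes. The helper names are hypothetical. For this particular population the skewness of the sample mean is known to be 2/√n, so the printed values should shrink toward 0 (the skewness of a normal distribution) as n grows, up to simulation noise.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def means_from_exponential(n, reps, seed=0):
    """Sample means of `reps` samples of size n from an exponential population (mean 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


# Skewness of the sampling distribution of the mean for increasing sample sizes.
for n in (1, 5, 30, 100):
    print(n, round(skewness(means_from_exponential(n, reps=5000)), 2))
```

With n = 1 the "sample mean" is just a single observation, so the first line shows the skewness of the population itself.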
Central Limit Theorem in Practice
The unexpected appearance of a normal distribution from a population distribution that is skewed (even quite heavily skewed) has some very important applications in statistical practice. Many statistical procedures, such as hypothesis tests and confidence intervals, make assumptions about the population from which the data were obtained. One assumption commonly made early in a statistics course is that the populations we work with are normally distributed.
The assumption that the data come from a normal distribution simplifies matters, but it is somewhat unrealistic. Even a little work with real-world data shows that outliers, skewness, multiple peaks and asymmetry appear quite routinely. An appropriate sample size, together with the central limit theorem, lets us get around the problem of data from populations that are not normal.
Thus, even though we might not know the shape of the distribution our data come from, the central limit theorem says that we can treat the sampling distribution of the sample mean as approximately normal. Of course, for this approximation to be accurate, we need a sample size that is large enough. Exploratory data analysis can help us determine how large a sample is necessary in a given situation.
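One place this pays off is the confidence interval for a population mean. The following Python sketch builds the usual large-sample interval, sample mean ± z·s/√n, and then checks by simulation how often it captures the true mean when the data come from a skewed population. The exponential population, the sample size of 100, and the function name `normal_approx_ci` are assumptions for illustration; for small samples a t critical value is typically used in place of z.

```python
import random
import statistics


def normal_approx_ci(data, confidence=0.95):
    """Large-sample confidence interval for the population mean: mean ± z * s / sqrt(n).

    Relies on the central limit theorem (the sample mean is approximately
    normal for large n), not on the data themselves being normal.
    """
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / n ** 0.5                         # estimated standard error
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)      # about 1.96 for 95%
    return mean - z * se, mean + z * se


# Check coverage when sampling from a skewed (exponential, mean 1) population.
rng = random.Random(1)
trials, n, hits = 2000, 100, 0
for _ in range(trials):
    low, high = normal_approx_ci([rng.expovariate(1.0) for _ in range(n)])
    hits += (low <= 1.0 <= high)  # did this interval capture the true mean?
print(f"coverage: {hits / trials:.3f}")
```

If the normal approximation is adequate at this sample size, the reported coverage should land near the nominal 95%; rerunning with a much smaller n is one way to watch the approximation degrade.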