Researchers often want answers to very large questions. These questions may or may not be profound, but they are large in scope. What did everyone in a particular country watch on television last night? Who does an electorate intend to vote for in an upcoming election? How many birds return from migration to a certain location? What percentage of the workforce is unemployed? These questions are huge in the sense that they require us to keep track of millions of individuals.
Statistics simplifies these problems by using a technique called sampling. By drawing a statistical sample, we can cut our workload down immensely. Rather than tracking the behaviors of millions or billions, we only need to examine those of hundreds or thousands. As we will see, this simplification comes at a price.
Populations and Censuses
The population of a statistical study is the group we are trying to learn something about. It consists of all of the individuals who are being examined. A population can be almost anything. Californians, caribous, computers, cars or counties could all be considered populations, depending on the statistical question. Although most populations being researched are large, they do not have to be.
One strategy for researching a population is to conduct a census. In a census we examine each and every member of the population in our study. A prime example of this is the U.S. Census. Every ten years the Census Bureau sends a questionnaire to every household in the country. Households that do not return the form are visited by census workers.
Censuses are fraught with difficulties. They are typically expensive in terms of time and resources. In addition, it is difficult to guarantee that everyone in the population has been reached. For some populations a census is even harder to conduct. If we wanted to study the habits of stray dogs in the state of New York, good luck rounding up all of those transient canines.
Since it is normally either impossible or impractical to track down every member of a population, the next option available is to sample the population. A sample is any subset of a population, so its size can be small or large. We want a sample small enough to be manageable with our time and computing power, yet large enough to give us reliable results.
If a polling firm is trying to determine voter satisfaction with Congress, and its sample size is one, then the results are going to be meaningless (but easy to obtain). On the other hand, asking millions of people is going to consume too many resources. To strike a balance, polls of this type typically have sample sizes of around 1000.
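The trade-off between sample size and reliability can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true satisfaction rate of 30% is invented, and the function names are hypothetical. It repeats a simulated poll many times at each sample size and reports how much the results vary from one poll to the next.

```python
import random
import statistics


def simulate_poll(true_share, sample_size, rng):
    """Return the share of 'satisfied' answers in one simulated poll."""
    satisfied = sum(rng.random() < true_share for _ in range(sample_size))
    return satisfied / sample_size


def spread_of_polls(true_share, sample_size, repeats=200, seed=0):
    """Standard deviation of poll results over many repeated polls."""
    rng = random.Random(seed)
    results = [simulate_poll(true_share, sample_size, rng) for _ in range(repeats)]
    return statistics.pstdev(results)


if __name__ == "__main__":
    # Assumption for illustration only: 30% of voters are satisfied.
    for n in (1, 10, 100, 1000):
        print(f"sample size {n:>4}: spread {spread_of_polls(0.3, n):.3f}")
```

The spread shrinks as the sample grows, but each further reduction costs many more respondents, which is one way to see why sample sizes of around 1000 are a common compromise.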
But having the right sample size is not enough to ensure good results. We want a sample that is representative of the population. Suppose we want to find out how many books the average American reads annually. We ask 2000 college students to keep track of what they read over the year, then check back with them after a year has gone by. We find the mean number of books read is 12, and then conclude that the average American reads 12 books a year.
The problem with this scenario lies in the sample. A majority of college students are between 18 and 25 years old, and are required by their instructors to read textbooks and novels. They are a poor representation of the average American. A good sample would contain people of different ages, from all walks of life, and from different regions of the country. To acquire such a sample we would need to compose it randomly, so that every American has an equal probability of being in the sample.
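The effect of an unrepresentative sample can be sketched with a simulation. Everything in the example below is an assumption made for illustration: the population is synthetic, the share of students (6%) and the reading habits (about 12 books a year for students, about 4 for everyone else) are invented, and the function name is hypothetical. It compares a sample of 2000 students with a random sample of 2000 drawn from the whole population.

```python
import random
import statistics


def compare_samples(seed=42, population_size=100_000, sample_size=2000):
    """Return (true mean, students-only sample mean, random sample mean)."""
    rng = random.Random(seed)

    # Synthetic population of (is_student, books_per_year) pairs.
    # All numbers here are invented for illustration.
    population = []
    for _ in range(population_size):
        is_student = rng.random() < 0.06
        typical = 12 if is_student else 4
        books = max(0, round(rng.gauss(typical, 3)))
        population.append((is_student, books))

    true_mean = statistics.mean(books for _, books in population)

    # Biased sample: only college students are asked.
    students = [person for person in population if person[0]]
    biased = rng.sample(students, sample_size)
    biased_mean = statistics.mean(books for _, books in biased)

    # Random sample: every individual is equally likely to be chosen.
    randomized = rng.sample(population, sample_size)
    random_mean = statistics.mean(books for _, books in randomized)

    return true_mean, biased_mean, random_mean


if __name__ == "__main__":
    true_mean, biased_mean, random_mean = compare_samples()
    print(f"population mean:      {true_mean:.2f}")
    print(f"students-only sample: {biased_mean:.2f}")
    print(f"random sample:        {random_mean:.2f}")
```

Both samples have the same size, yet only the randomly composed one lands near the population mean. A larger students-only sample would not fix the problem.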
Types of Samples
The gold standard of statistical sampling is the simple random sample. In a simple random sample of n individuals, every member of the population has the same likelihood of being selected, and every group of n individuals has the same likelihood of being selected. There are many other ways to sample a population. Some of the most common types of samples are:
- Random sample
- Simple random sample
- Voluntary response sample
- Convenience sample
- Systematic sample
- Cluster sample
- Stratified sample
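Several of the sample types listed above can be sketched in a few lines of code. The example below is a minimal sketch, not a definitive implementation: it follows the usual textbook definitions of simple random, systematic, stratified and cluster samples, and the population of 100 people spread over four regions is hypothetical. Voluntary response and convenience samples are left out because they depend on who chooses to respond or who is easiest to reach rather than on a random procedure.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, rng):
    """Every group of n individuals is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random starting point, then take every k-th individual."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def group_by(population, key):
    """Split the population into groups according to key(individual)."""
    groups = defaultdict(list)
    for individual in population:
        groups[key(individual)].append(individual)
    return groups


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample from within every stratum."""
    sample = []
    for members in group_by(population, stratum_of).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Choose whole clusters at random and keep everyone in them."""
    clusters = group_by(population, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population: (id, region) pairs, 25 people per region.
    people = [(i, "NSEW"[i % 4]) for i in range(100)]
    region = lambda person: person[1]

    print(simple_random_sample(people, 5, rng))
    print(systematic_sample(people, 5, rng))
    print(stratified_sample(people, region, 2, rng))
    print(len(cluster_sample(people, region, 2, rng)))
```

The stratified sample guarantees that every region is represented, while the cluster sample examines only the regions that happen to be chosen. Which design is appropriate depends on the population and on how costly it is to reach its members.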
Some Words of Advice
As the saying goes, “Well begun is half done.” To ensure that our statistical studies and experiments have good results, we need to plan and start them carefully. It’s easy to come up with bad statistical samples. Good simple random samples require some work to obtain. If our data has been obtained haphazardly or carelessly, then no matter how sophisticated our analysis, statistical techniques will not give us any worthwhile conclusions.