Probability and statistics are two closely related mathematical subjects. Both use much of the same terminology, and there are many points of contact between the two. The distinction between probability concepts and statistical concepts is often ignored, and material from both subjects is frequently lumped under the heading “probability and statistics,” with no attempt to separate which topics belong to which discipline. Despite these practices and the common ground of the subjects, they are distinct. What is the difference between probability and statistics?
What Is Known
The main difference between probability and statistics has to do with knowledge: the facts we already know when we approach a problem. Inherent in both probability and statistics is a population, consisting of every individual we are interested in studying, and a sample, consisting of the individuals that are selected from the population.
A problem in probability starts with us knowing everything about the composition of a population, and then asks, “What is the likelihood that a selection, or sample, from the population has certain characteristics?” A problem in statistics runs in the opposite direction: we know only the sample, and ask what it tells us about the population.
We can see the difference between probability and statistics by thinking about a drawer of socks. Perhaps we have a drawer with 100 socks. Depending upon our knowledge of the socks, we could have either a statistics problem or a probability problem.
If we know that there are 30 red socks, 20 blue socks and 50 black socks, then we can use probability to answer questions about the makeup of a random sample of these socks. Questions of this type would be:
- “What is the probability that we draw two blue socks and two red socks from the drawer?”
- “What is the probability that we pull out 3 socks and have a matching pair?”
- “What is the probability that we draw five socks, with replacement, and they are all black?”
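The three questions above can be worked out directly once the composition of the drawer is known. The following is a minimal Python sketch, not a definitive treatment. It reads the first question as a draw of exactly four socks without replacement, and it counts any two socks of the same color as a matching pair in the second question. Both readings are assumptions, and the variable names are only illustrative.

```python
from fractions import Fraction
from math import comb

# Known composition of the drawer (from the example in the text).
RED, BLUE, BLACK = 30, 20, 50
TOTAL = RED + BLUE + BLACK

# Question 1 (assumption: exactly four socks, drawn without replacement).
# Ways to pick 2 blue and 2 red, divided by all ways to pick 4 socks.
p_two_blue_two_red = Fraction(comb(BLUE, 2) * comb(RED, 2), comb(TOTAL, 4))

# Question 2: three socks without replacement, at least two share a color.
# With three colors, the only way to have no pair is one sock of each color.
p_all_different = Fraction(RED * BLUE * BLACK, comb(TOTAL, 3))
p_matching_pair = 1 - p_all_different

# Question 3: five draws with replacement, so each draw is black
# with the same probability, independently of the others.
p_five_black = Fraction(BLACK, TOTAL) ** 5

print("two blue, two red:", float(p_two_blue_two_red))
print("matching pair in three:", float(p_matching_pair))
print("five black with replacement:", float(p_five_black))
```

Using `Fraction` keeps the answers exact. As decimals they come out to roughly 0.021, 0.814 and 0.031 for the three questions.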
If instead we have no knowledge about the types of socks in the drawer, then we enter the realm of statistics. Statistics helps us to infer properties of the population on the basis of a random sample. Questions that are statistical in nature would be:
- A random sample of ten socks from the drawer produced one blue sock, four red socks and five black socks. What are the proportions of black, blue and red socks in the drawer?
- We randomly sample ten socks from the drawer, write down the number of black socks, and then return the socks to the drawer. This process is done five times. The mean number of black socks across these trials is 7. What is the true number of black socks in the drawer?
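With no knowledge of the drawer, the best we can do is estimate. As a rough sketch, the sample proportions can serve as point estimates for the first question, and the mean count per trial can be scaled up for the second. The scaling assumes the drawer holds 100 socks, as in the example. The standard error formula used here is a simple approximation that ignores the fact that the socks are drawn without replacement.

```python
from math import sqrt

# Question 1: one random sample of ten socks.
sample_counts = {"blue": 1, "red": 4, "black": 5}
n = sum(sample_counts.values())

# Point estimate of each color's share of the drawer: the sample proportion.
estimates = {color: count / n for color, count in sample_counts.items()}

# Rough standard error of each estimate (simple binomial approximation,
# an assumption; it ignores sampling without replacement).
std_errors = {color: sqrt(p * (1 - p) / n) for color, p in estimates.items()}

# Question 2: five trials of ten socks, mean of 7 black socks per trial.
mean_black_per_trial = 7
socks_per_trial = 10
drawer_size = 100  # assumption taken from the example in the text

estimated_black_share = mean_black_per_trial / socks_per_trial
estimated_black_socks = estimated_black_share * drawer_size

for color in estimates:
    print(color, estimates[color], "+/-", round(std_errors[color], 3))
print("estimated black socks in drawer:", estimated_black_socks)
```

These are estimates, not certainties. A sample of ten socks leaves a lot of room for error, which is what the standard errors are meant to suggest.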
Of course, probability and statistics do have much in common. This is because statistics is built upon the foundation of probability. Although we typically do not have complete information about a population, we can use theorems and results from probability to arrive at statistical results. These results inform us about the population.
Underlying all of this is the assumption that we are dealing with random processes. This is why we stressed that the sampling procedure we used with the sock drawer was random. If we do not have a random sample, then the assumptions that the results from probability rely on no longer hold, and the statistical conclusions built on them are not justified.
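A small simulation can illustrate why randomness matters. The sketch below assumes a drawer whose true makeup is the 30 red, 20 blue and 50 black socks from the earlier example, stored sorted by color. It compares random samples of ten socks with a non-random grab of the first ten socks. The sorted layout, the number of trials and the fixed seed are assumptions made only for illustration.

```python
import random

# Hypothetical drawer, sorted by color: red on top, then blue, then black.
drawer = ["red"] * 30 + ["blue"] * 20 + ["black"] * 50


def proportion(sample, color):
    """Share of the sample that has the given color."""
    return sample.count(color) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Random sampling: estimate the share of red socks many times and average.
trials = 2000
random_estimates = [
    proportion(rng.sample(drawer, 10), "red") for _ in range(trials)
]
mean_random_estimate = sum(random_estimates) / trials

# Non-random sampling: always take the ten socks on top of the drawer.
top_ten = drawer[:10]
biased_estimate = proportion(top_ten, "red")

print("true share of red socks:", proportion(drawer, "red"))
print("average estimate from random samples:", mean_random_estimate)
print("estimate from the top ten socks:", biased_estimate)
```

The random samples average out close to the true share of red socks, while the top-of-the-drawer sample suggests that every sock is red.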
Probability and statistics are closely linked, but there are differences. If you need to know what methods are appropriate, just ask yourself what it is that you know.