If you spend much time at all dealing with statistics, pretty soon you run into the phrase “probability distribution.” It is here that we really get to see how much the areas of probability and statistics overlap. Although it may sound technical, a probability distribution is really just a way of organizing a list of probabilities. More precisely, it is a function or rule that assigns a probability to each value of a random variable (or, for a continuous variable, to each range of values). In some cases the distribution is given as a list. In other cases it is presented as a graph.
Suppose that we roll two fair six-sided dice and then record the sum of the dice. Sums anywhere from 2 to 12 are possible. There are 36 equally likely ways for the two dice to land, so the probability of each sum is the number of ways to roll it divided by 36. We can simply list these as follows:
- The sum of 2 has a probability of 1/36
- The sum of 3 has a probability of 2/36
- The sum of 4 has a probability of 3/36
- The sum of 5 has a probability of 4/36
- The sum of 6 has a probability of 5/36
- The sum of 7 has a probability of 6/36
- The sum of 8 has a probability of 5/36
- The sum of 9 has a probability of 4/36
- The sum of 10 has a probability of 3/36
- The sum of 11 has a probability of 2/36
- The sum of 12 has a probability of 1/36
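The list above can be reproduced by brute force. The following is a minimal sketch, assuming two fair six-sided dice; the names `counts` and `dice_sum_pmf` are illustrative choices, not standard terminology.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes and tally each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Keep the common denominator of 36 so the output matches the list above.
for total in sorted(counts):
    print(f"The sum of {total} has a probability of {counts[total]}/36")

# Exact probabilities as fractions; together they cover every outcome.
dice_sum_pmf = {total: Fraction(n, 36) for total, n in counts.items()}
assert sum(dice_sum_pmf.values()) == 1
```

Using `Fraction` rather than floating-point numbers keeps the probabilities exact, so the check that they add up to one holds without any rounding tolerance.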
Graph of a Probability Distribution
A probability distribution can be graphed, and sometimes this helps to show us features of the distribution that were not apparent from just reading the list of probabilities. The values of the random variable are plotted along the x-axis, and the corresponding probabilities are plotted along the y-axis.
- For a discrete random variable, we will have a histogram
- For a continuous random variable, we will have the region under a smooth curve
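As a rough illustration of graphing a discrete distribution, the sketch below draws a text version of the histogram for the two-dice sums. It is rotated a quarter turn compared with the description above (values run down the page, and bar length stands in for height), and the name `histogram_lines` is just an illustrative choice.

```python
from collections import Counter
from itertools import product

# Number of ways to roll each sum with two fair six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# One line per sum: the bar has one '#' for each of the 36 outcomes
# that produces that sum, so bar length is proportional to probability.
histogram_lines = [
    f"{total:>2} | {'#' * counts[total]} {counts[total]}/36"
    for total in sorted(counts)
]

print("\n".join(histogram_lines))
```

Even in this crude form, the symmetric triangular shape peaking at 7 is easier to see than in the list of fractions.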
The rules of probability are still in effect, and they manifest themselves in a few ways. Since probabilities are greater than or equal to zero, the graph of a probability distribution must have y-coordinates that are nonnegative. Another feature of probabilities, namely that one is the maximum that the probability of an event can be, shows up as a constraint on area rather than on height: the total area of the graph equals one.
Area = Probability
The graph of a probability distribution is constructed in such a way that areas represent probabilities. For a discrete probability distribution, we are really just calculating the areas of rectangles. In the histogram for the sum of two dice, each bar is one unit wide, so its area equals its height. The combined area of the three bars for four, five and six is therefore the probability that the sum of our dice is four, five or six. The areas of all of the bars add up to a total of one.
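The arithmetic behind this can be checked directly. The sketch below assumes two fair dice and bars that are one unit wide (the `BAR_WIDTH` constant and the `bar_area` helper are illustrative names), and adds up the areas of the relevant bars.

```python
from fractions import Fraction
from itertools import product

# Probability of each sum of two fair dice, as exact fractions.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

BAR_WIDTH = 1  # assumption: each bar is one unit wide, so area equals height

def bar_area(total):
    """Area of the histogram bar sitting over the given sum."""
    return BAR_WIDTH * pmf[total]

area_4_to_6 = sum(bar_area(t) for t in (4, 5, 6))
total_area = sum(bar_area(t) for t in pmf)

print(area_4_to_6)  # 1/3, that is 3/36 + 4/36 + 5/36
print(total_area)   # 1
```

So the probability that the sum is four, five or six is 12/36, or one third, and every bar height is nonnegative while the total area is exactly one.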
In the standard normal distribution, or bell curve, we have a similar situation. The area under the curve between two z values corresponds to the probability that our variable falls between those two values. For example, the area under the bell curve for -1 < z < 1 accounts for approximately 68% of the total area. The region here is much more complicated than a rectangle. That is why calculus and other advanced mathematics are necessary in order to use most continuous probability distributions.
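The 68% figure can be checked numerically. The sketch below computes the area two ways: from the standard normal cumulative distribution function written in terms of the error function, and by adding up many thin rectangles under the curve. The function names and the choice of 1,000 rectangles are assumptions made for illustration.

```python
import math

def standard_normal_pdf(z):
    """Height of the bell curve at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(a, b):
    """Exact area under the bell curve between a and b."""
    return standard_normal_cdf(b) - standard_normal_cdf(a)

def midpoint_area(a, b, rectangles=1000):
    """Approximate the same area with thin rectangles (midpoint rule)."""
    width = (b - a) / rectangles
    return sum(
        standard_normal_pdf(a + (i + 0.5) * width) * width
        for i in range(rectangles)
    )

print(round(area_between(-1, 1), 4))   # 0.6827
print(round(midpoint_area(-1, 1), 4))  # 0.6827
```

The rectangle version is the same idea as the histogram, only with bars so thin that their tops trace out the curve; calculus makes that limiting process exact.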
A List of Probability Distributions
There are literally infinitely many probability distributions. A list of some of the more important distributions follows:
- Binomial Distribution – this gives the number of successes in a fixed number of independent trials, each with two possible outcomes and the same probability of success
- Chi-Square Distribution – this is used to determine how closely observed quantities fit a proposed model
- F-Distribution – this is used in analysis of variance (ANOVA)
- Normal Distribution – this is called the bell curve and is found throughout statistics
- Student’s t Distribution – this is for use with small sample sizes from a normal distribution
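As one concrete instance from this list, the binomial distribution can be written out in a few lines. This is a minimal sketch using the standard binomial formula; the helper name `binomial_pmf` and the example values of 10 trials with success probability 0.5 (think of ten fair coin flips) are arbitrary choices for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5  # example values only
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(pmf[5])    # 0.24609375, the chance of exactly 5 successes
print(sum(pmf))  # the probabilities over k = 0..10 add up to one
```

Just as with the dice, every probability is nonnegative and the full list adds up to one.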