It’s the midnight showing of the newest hit movie. People are lined up outside the theater waiting to get in. Suppose you’re asked to find the center of the line. How would you do this?
There are a couple of different ways to go about solving this problem, but in the end you have to figure out how many people are in the line and then take half of that number. If the total number is even, then the center of the line is between two people. If the total number is odd, then the center is a single person.
You may ask, "What does finding the center of a line have to do with statistics?" This idea of finding the center is exactly what is used when calculating the median of a set of data.
What Is the Median?
The median is one of the three primary ways to find the average of statistical data. It is harder to calculate than the mode, but not as labor intensive as calculating the mean. It is the center in much the same way as the center of a line of people. Once the data values are listed in ascending order, the median is the value with the same number of data values above it as below it.
Case One: An Odd Number of Values
Eleven batteries are tested to see how long they last. Their lifetimes, in hours, are given by 10, 99, 100, 103, 103, 105, 110, 111, 115, 130, 131. What is the median lifetime? Since there is an odd number of data values, this corresponds to a line with an odd number of people. The center will be the middle value.
There are eleven data values, so the sixth one is in the center, with five values below it and five above it. Therefore the median battery life is the sixth value in this list, or 105 hours. Note that the median is one of the data values.
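The odd case can be sketched in a few lines of Python. This is only an illustration of the steps described above; the variable names are made up for this example, and the data is the battery list from the text.

```python
# Battery lifetimes in hours, copied from the example above.
lifetimes = [10, 99, 100, 103, 103, 105, 110, 111, 115, 130, 131]

ordered = sorted(lifetimes)   # the median is defined on values in ascending order
n = len(ordered)              # 11 values, an odd count
middle_index = n // 2         # zero-based index 5, which is the sixth value
median_lifetime = ordered[middle_index]

print(median_lifetime)  # 105
```

Because Python counts positions from zero, the sixth value sits at index 5, which is what integer division of 11 by 2 gives.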
Case Two: An Even Number of Values
Twenty cats are weighed. Their weights, in pounds, are given by 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 11, 12, 12, 13. What is the median feline weight? Since there is an even number of data values, this corresponds to a line with an even number of people. The center is between the two middle values.
In this case the center is between the tenth and eleventh data values. To find the median we calculate the mean of these two values, and obtain (7+8)/2 = 7.5. Here the median is not one of the data values.
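For an even number of values, the same idea can be sketched by averaging the two values on either side of the center. Again, this is a minimal illustration with made-up variable names, using the cat weights from the text.

```python
# Cat weights in pounds, copied from the example above.
weights = [4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 11, 12, 12, 13]

ordered = sorted(weights)
n = len(ordered)                  # 20 values, an even count
lower_middle = ordered[n // 2 - 1]  # tenth value (zero-based index 9)
upper_middle = ordered[n // 2]      # eleventh value (zero-based index 10)
median_weight = (lower_middle + upper_middle) / 2

print(median_weight)  # 7.5
```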
Any Other Cases?
A data set has either an even or an odd number of values, so the two examples above cover every case. Either the median is the middle value, or it is the mean of the two middle values. Typically data sets are much larger than the ones that we looked at above, but the process of finding the median is the same as in these two examples.
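Since there are only two cases, both can be folded into one short function. The sketch below is one possible way to write it; the name `median_of` is hypothetical and chosen just for this example. Python's standard library already provides `statistics.median`, which is used here only to check the result.

```python
import statistics


def median_of(values):
    """Return the median of a non-empty collection of numbers.

    `median_of` is a hypothetical helper written for illustration.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is the single middle value.
        return ordered[mid]
    # Even count: the median is the mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


batteries = [10, 99, 100, 103, 103, 105, 110, 111, 115, 130, 131]
cats = [4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 11, 12, 12, 13]

# The hand-written version agrees with the standard library on both examples.
assert median_of(batteries) == statistics.median(batteries) == 105
assert median_of(cats) == statistics.median(cats) == 7.5
```

In practice, calling `statistics.median` directly is usually the simpler choice; the hand-written version is shown only to make the two cases visible.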
The Effect of Outliers
The mean is highly sensitive to outliers. What this means is that the presence of an outlier can dramatically change this measure of the center. One advantage of the median is that it is not influenced as much by an outlier.
To see this, consider the data set 3, 4, 5, 5, 6. The mean is (3+4+5+5+6)/5 = 4.6, and the median is 5. Now keep the same data set, but add the value 100: 3, 4, 5, 5, 6, 100. Clearly 100 is an outlier, as it is much greater than all of the other values. The mean of the new set is now (3+4+5+5+6+100)/6 = 20.5. However, the median of the new set is still 5, since the two middle values are both 5. Although the mean jumps from 4.6 to 20.5, the median does not change.
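The comparison above is easy to reproduce. The following sketch uses `statistics.mean` and `statistics.median` from Python's standard library on the two data sets from the text; the variable names are assumptions made for this example.

```python
import statistics

original = [3, 4, 5, 5, 6]
with_outlier = original + [100]   # same data plus the outlier 100

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean moves by a large amount; the median does not move at all.
print("change in mean:  ", mean_after - mean_before)
print("change in median:", median_after - median_before)
```

Note that with six values the median is the mean of the two middle values, which are both 5 here, so it comes out equal to the median of the original five values.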
Application of the Median
Because it resists outliers, the median is the preferred measure of average when the data contains outliers. When incomes are reported, a typical approach is to report the median income. This is done because the mean income is skewed by a small number of people with very high incomes (think Bill Gates and Oprah).