Some distributions of data, such as the bell curve, are symmetric. This means that the left and right halves of the distribution are mirror images of one another. But not every distribution of data is symmetric. Sets of data that are not symmetric are said to be asymmetric. A measure of how asymmetric a distribution is, and in which direction, is called skewness. As we will see, data can be skewed either to the right or to the left.
The mean, median and mode are all measures of the center of a set of data. The skewness of the data can be determined by how these quantities are related to one another.
Skewed to the Right
Data that are skewed to the right have a long tail that extends to the right. An alternative way of describing a data set skewed to the right is to say that it is positively skewed. In this situation the mean and the median are typically both greater than the mode. As a general rule, the mean will also be greater than the median, because the large values in the tail pull the mean further to the right than they pull the median. These orderings are rules of thumb rather than guarantees; exceptions occur, particularly for discrete or multimodal data. In summary, for a data set skewed to the right:
- Typically: mode < mean
- Typically: mode < median
- Often, though less reliably: mode < median < mean
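The ordering above can be checked on a small example. The following is a minimal sketch using Python's standard `statistics` module; the data set is made up for illustration and is not taken from any real source.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]

right_mode = mode(right_skewed)      # most frequent value
right_median = median(right_skewed)  # middle of the sorted values
right_mean = mean(right_skewed)      # arithmetic average

print("mode   =", right_mode)
print("median =", right_median)
print("mean   =", right_mean)

# For this particular data set the usual ordering holds.
print("mode < median < mean:", right_mode < right_median < right_mean)
```

The single large value of 20 pulls the mean well above the median, while the mode stays at the most common small value.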
Skewed to the Left
The situation reverses when we deal with data skewed to the left. Data that are skewed to the left have a long tail that extends to the left. An alternative way of describing a data set skewed to the left is to say that it is negatively skewed. In this situation the mean and the median are typically both less than the mode. As a general rule, the mean will also be less than the median, because the small values in the tail pull the mean further to the left than they pull the median. As before, these orderings are rules of thumb rather than guarantees. In summary, for a data set skewed to the left:
- Typically: mean < mode
- Typically: median < mode
- Often, though less reliably: mean < median < mode
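The reversed ordering can be illustrated in the same way. This sketch assumes a hypothetical set of exam scores in which most students score high and a few score very low; the numbers are invented for the example.

```python
from statistics import mean, median, mode

# Hypothetical left-skewed data: most scores are high, a few are very low.
left_skewed = [40, 72, 85, 88, 90, 90, 92, 95, 95, 95]

left_mode = mode(left_skewed)      # most frequent value
left_median = median(left_skewed)  # middle of the sorted values
left_mean = mean(left_skewed)      # arithmetic average

print("mean   =", left_mean)
print("median =", left_median)
print("mode   =", left_mode)

# For this particular data set the usual ordering holds.
print("mean < median < mode:", left_mean < left_median < left_mode)
```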
Measures of Skewness
It’s one thing to look at two sets of data and determine that one is symmetric while the other is asymmetric. It’s another to look at two sets of asymmetric data and say that one is more skewed than the other. Judging which is more skewed simply by looking at graphs of the distributions can be very subjective. This is why there are ways to calculate skewness numerically.
One measure of skewness, called Pearson’s first coefficient of skewness, is to subtract the mode from the mean, and then divide this difference by the standard deviation of the data. Dividing by the standard deviation gives a dimensionless quantity, so the result does not depend on the units of the data. The order of the subtraction explains why data skewed to the right have positive skewness. If the data set is skewed to the right, the mean is generally greater than the mode, and so subtracting the mode from the mean gives a positive number. A similar argument explains why data skewed to the left have negative skewness.
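As a rough illustration, Pearson’s first coefficient can be sketched in a few lines of Python. The function name `pearson_first_skewness` and the sample data are hypothetical. The text does not say whether the sample or the population standard deviation is meant; this sketch assumes the sample version (`statistics.stdev`).

```python
from statistics import mean, mode, stdev

def pearson_first_skewness(data):
    """Return (mean - mode) / standard deviation for a list of numbers.

    Assumption: the sample standard deviation is used. Swap in
    statistics.pstdev for the population version.
    """
    return (mean(data) - mode(data)) / stdev(data)

# Hypothetical data sets for illustration.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
left_skewed = [-x for x in right_skewed]  # mirror image of the data above
symmetric = [1, 2, 3, 3, 3, 4, 5]

print("right-skewed:", pearson_first_skewness(right_skewed))  # positive
print("left-skewed: ", pearson_first_skewness(left_skewed))   # negative
print("symmetric:   ", pearson_first_skewness(symmetric))     # zero
```

Because the coefficient depends on the mode, it is only meaningful when the data have a single, well-defined mode.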
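Pearson’s second coefficient of skewness is also used to measure the asymmetry of a data set. For this quantity we subtract the median from the mean, multiply this difference by three and then divide by the standard deviation. Because it uses the median rather than the mode, it can be calculated even when the data have no clear mode.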
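A comparable sketch for Pearson’s second coefficient is shown here. As before, the function name `pearson_second_skewness` and the data are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, median, stdev

def pearson_second_skewness(data):
    """Return 3 * (mean - median) / standard deviation.

    Assumption: the sample standard deviation is used. Swap in
    statistics.pstdev for the population version.
    """
    return 3 * (mean(data) - median(data)) / stdev(data)

# Hypothetical data sets for illustration.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
left_skewed = [-x for x in right_skewed]  # mirror image of the data above
symmetric = [1, 2, 3, 3, 3, 4, 5]

print("right-skewed:", pearson_second_skewness(right_skewed))  # positive
print("left-skewed: ", pearson_second_skewness(left_skewed))   # negative
print("symmetric:   ", pearson_second_skewness(symmetric))     # zero
```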
Applications of Skewed Data
Skewed data arise quite naturally in various situations. Incomes are skewed to the right because even a few individuals who earn millions of dollars can greatly affect the mean, and there are no negative incomes. Similarly, data involving the lifetime of a product, such as a brand of light bulb, are skewed to the right. Here the smallest that a lifetime can be is zero, and long-lasting light bulbs impart a positive skewness to the data.
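The light bulb example can be imitated with simulated data. The sketch below assumes, purely for illustration, that lifetimes follow an exponential distribution with a mean of 1000 hours. Neither the model nor the figure comes from real light bulb data.

```python
import random
from statistics import mean, median

rng = random.Random(42)  # fixed seed so that the run is repeatable

# Assumption: exponential lifetimes with a mean of 1000 hours.
# Both the model and the number are illustrative only.
MEAN_LIFETIME_HOURS = 1000.0
lifetimes = [rng.expovariate(1.0 / MEAN_LIFETIME_HOURS) for _ in range(10_000)]

lifetime_mean = mean(lifetimes)
lifetime_median = median(lifetimes)

print(f"shortest lifetime: {min(lifetimes):.1f} hours")
print(f"median lifetime:   {lifetime_median:.1f} hours")
print(f"mean lifetime:     {lifetime_mean:.1f} hours")
print(f"longest lifetime:  {max(lifetimes):.1f} hours")
```

No simulated lifetime falls below zero, while a few bulbs last far longer than the rest. Those long-lived bulbs pull the mean above the median, which is the pattern expected for data skewed to the right.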