Key Takeaways
- The interquartile range (IQR) rule helps find outliers in a data set.
- You find the interquartile range by subtracting the first quartile from the third quartile.
- The IQR rule flags a value as a suspected outlier if it lies more than 1.5 times the IQR below the first quartile or above the third quartile.
The interquartile range (IQR) rule is a common way to detect outliers. Outliers are individual values that fall outside the overall pattern of a data set. That definition is somewhat vague and subjective, so it helps to have a concrete rule for deciding whether a data point should be treated as an outlier. This is where the interquartile range rule comes in.
What Is the Interquartile Range?
Any set of data can be described by its five-number summary. These five numbers, which give you the information you need to find patterns and outliers, consist of (in ascending order):
- The minimum, or lowest value of the data set
- The first quartile Q1, the value a quarter of the way through the sorted list of all data
- The median of the data set, the value at the midpoint of the sorted list of all data
- The third quartile Q3, the value three-quarters of the way through the sorted list of all data
- The maximum, or highest value of the data set
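The five numbers above can be computed with a short Python sketch. It assumes one common quartile convention that the text does not spell out: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. The function name `five_number_summary` and the sample values are made up for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is left out of both halves when
    the number of values is odd. Other conventions exist and can give
    slightly different quartiles.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    # Skip the middle element for odd n so it belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# Illustrative values only, not taken from the article.
sample = [14, 2, 10, 6, 8, 4, 12]
print(five_number_summary(sample))
```

Sorting first means the input can arrive in any order. Statistical packages often interpolate between neighboring values instead, so their quartiles may not match this sketch exactly.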
The five-number summary condenses a data set into a form that is much easier to interpret than the full list of values.
For example, the range, which is the maximum minus the minimum, is one indicator of how spread out the data is. It is easy to read off the five-number summary but harder to pick out of a long list of raw values. Note that the range is highly sensitive to outliers: if the minimum or maximum is itself an outlier, the range will not accurately represent the spread of the rest of the data.
The interquartile range is similar to the range but less sensitive to outliers. You calculate the interquartile range in much the same way as the range. All you need to do is subtract the first quartile from the third quartile:
IQR = Q3 – Q1.
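The interquartile range shows how the middle half of the data is spread around the median. Because it ignores the lowest and highest quarters of the data, extreme values have little effect on it, which often makes it a more helpful measure of spread than the range.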
Using the Interquartile Rule to Find Outliers
Although the interquartile range itself is largely unaffected by outliers, it can be used to detect them. You would do this through the following steps:
- Calculate the interquartile range for the data.
- Multiply the interquartile range (IQR) by 1.5 (a constant used to discern outliers).
- Add 1.5 × IQR to the third quartile. Any number greater than this is a suspected outlier.
- Subtract 1.5 × IQR from the first quartile. Any number less than this is a suspected outlier.
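The steps above can be sketched in Python as follows. This is a minimal illustration that assumes the quartiles have already been computed; the function names `iqr_fences` and `suspected_outliers`, the parameter `k`, and the sample readings are hypothetical choices rather than part of any standard library.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs given the first and third quartiles.

    k is the multiplier from the rule; 1.5 is the usual constant.
    """
    iqr = q3 - q1            # step 1: interquartile range
    margin = k * iqr         # step 2: multiply by 1.5
    return q1 - margin, q3 + margin  # steps 3 and 4


def suspected_outliers(values, q1, q3, k=1.5):
    """Return the values that fall strictly outside the two cutoffs."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Illustrative values only: with Q1 = 12 and Q3 = 15 the IQR is 3,
# so the cutoffs are 12 - 4.5 = 7.5 and 15 + 4.5 = 19.5.
readings = [10, 12, 13, 14, 15, 40]
print(iqr_fences(12, 15))
print(suspected_outliers(readings, 12, 15))
```

Keeping the multiplier as a parameter makes it easy to experiment with stricter or looser cutoffs, though 1.5 is the value the rule specifies.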
Remember that the interquartile rule is only a rule of thumb: it works well in many cases but does not settle the question on its own. Follow up by examining each suspected outlier in the context of the entire data set to see whether flagging it makes sense.
Interquartile Rule Example Problem
See the interquartile range rule at work with the following example.
Suppose you have the following set of data: 1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17. The five-number summary for this data set is minimum = 1, first quartile = 4, median = 7, third quartile = 10, and maximum = 17. (Here the quartiles are the medians of the lower half 1, 3, 4, 6, 7 and the upper half 8, 8, 10, 12, 17, leaving out the overall median.) You may look at the data and immediately suspect that 17 is an outlier, but what does the interquartile range rule say?
If you were to calculate the interquartile range for this data, you would find it to be:
Q3 – Q1 = 10 – 4 = 6
Now multiply your answer by 1.5 to get 1.5 × 6 = 9. Nine less than the first quartile is 4 – 9 = -5. No data value is less than this. Nine more than the third quartile is 10 + 9 = 19. No data value is greater than this. Even though the maximum value, 17, is five more than the nearest data point, the interquartile range rule indicates that it should probably not be considered an outlier for this data set.
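The arithmetic in this example can be checked with a few lines of Python. The sketch below uses `statistics.quantiles` from the standard library; its default `method="exclusive"` happens to reproduce the quartiles quoted above (4 and 10) for this data set, which is an assumption about convention rather than something the example states. The helper name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles

data = [1, 3, 4, 6, 7, 7, 8, 8, 10, 12, 17]


def iqr_outliers(values, method="exclusive"):
    """Return ((lower, upper), outliers) under the 1.5 * IQR rule.

    Assumption: quartiles come from statistics.quantiles with the given
    method; "exclusive" matches the quartiles used in the example above.
    """
    q1, _, q3 = quantiles(values, n=4, method=method)
    margin = 1.5 * (q3 - q1)
    lower, upper = q1 - margin, q3 + margin
    flagged = [v for v in values if v < lower or v > upper]
    return (lower, upper), flagged


# Same convention as the worked example.
print(iqr_outliers(data))

# A different quartile convention, for comparison.
print(iqr_outliers(data, method="inclusive"))
```

With the default method the cutoffs come out as -5 and 19 and nothing is flagged, matching the calculation above. With `method="inclusive"` the quartiles become 5 and 9, the cutoffs tighten to -1 and 15, and 17 is flagged. A borderline value like this one can therefore change status depending on how the quartiles are computed, which is one more reason to treat the rule as a rule of thumb.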