February 18, 2020
Measures of Variability Part 2
In the previous sections, you learned the basics of variance and standard deviation. However, measures of variability do more than tell us about our own data set: they can also help us compare our data set to others. In this section, we will expand upon measures of variability, introducing the concepts of average deviation, the interquartile range and normal distributions.
Average Deviation
Average deviation, or mean absolute deviation, is another measure of variability. In order to understand average deviation, let’s first review how to calculate the standard deviation:
 Find the mean of the data set
 Subtract the mean from each observed value in the data set
 Square these differences
 Find the average of these squared differences
 Take the square root of this average
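Taking the "average" in step 4 as division by the number of observations, these steps can be written as the following formula, where n is the number of observations, x_i is each observed value and \bar{x} is the mean:
$$\text{SD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$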
Take, for example, the heights (in cm) of eight plants found in a nursery, shown below.
Plant   Height (cm)
#1      12
#2      6
#3      7
#4      3
#5      15
#6      10
#7      18
#8      5
Following the steps above, we can find the mean and standard deviation, rounded to the first decimal place.
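Step  Result
1 Find the mean  (12 + 6 + 7 + 3 + 15 + 10 + 18 + 5) / 8 = 76 / 8 = 9.5
2 Subtract the mean from each value  2.5, -3.5, -2.5, -6.5, 5.5, 0.5, 8.5, -4.5
3 Square these differences  6.25, 12.25, 6.25, 42.25, 30.25, 0.25, 72.25, 20.25
4 Find the average of the squares in step #3  190 / 8 = 23.75 ≈ 23.8
5 Take the square root of step #4  √23.75 ≈ 4.9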
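As a minimal sketch, the five steps above can be reproduced in Python with the plant heights from the example. The variable names are illustrative, and dividing by the number of observations in step 4 is an assumption that takes "average" literally (the population form). The standard library's `statistics.stdev` divides by n - 1 instead and gives a slightly larger value.

```python
from math import sqrt
from statistics import pstdev, stdev

# Plant heights in cm, taken from the example table.
heights_cm = [12, 6, 7, 3, 15, 10, 18, 5]

# Step 1: find the mean of the data set.
mean = sum(heights_cm) / len(heights_cm)

# Step 2: subtract the mean from each observed value.
deviations = [x - mean for x in heights_cm]

# Step 3: square these differences.
squared = [d ** 2 for d in deviations]

# Step 4: average the squared differences (assumption: divide by n).
variance = sum(squared) / len(squared)

# Step 5: take the square root of that average.
std_dev = sqrt(variance)

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", round(std_dev, 1))

# Cross-check against the standard library.
print("pstdev (divides by n):", round(pstdev(heights_cm), 1))
print("stdev (divides by n - 1):", round(stdev(heights_cm), 1))
```

Which divisor is appropriate depends on whether the eight plants are treated as the whole population or as a sample from a larger one.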

Finding the average deviation, while similar to finding the standard deviation, follows a shorter process. Here are the steps for calculating the average deviation:
 Find the mean of the data set
 Subtract the mean from each observed value in the data set
 Take the absolute value of these differences
 Find the average of these absolute differences
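These steps can be written as the following formula, which can be applied to both samples and populations, where n is the number of observations, x_i is each observed value and \bar{x} is the mean:
$$\text{AD} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i - \bar{x}\right|$$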
Using the same example, we can follow these steps to get the average deviation, rounded to the first decimal place.
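Step  Result
1 Find the mean  76 / 8 = 9.5
2 Subtract the mean from each value  2.5, -3.5, -2.5, -6.5, 5.5, 0.5, 8.5, -4.5
3 Take the absolute value of these differences  2.5, 3.5, 2.5, 6.5, 5.5, 0.5, 8.5, 4.5
4 Take the sum of all these absolute values  34
5 Divide the sum by the sample size  34 / 8 = 4.25 ≈ 4.3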
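The same steps can be sketched in a few lines of Python, again using the plant heights from the example. The function name `average_deviation` is hypothetical and chosen only for illustration.

```python
def average_deviation(values):
    """Mean absolute deviation: the average distance of each value from the mean."""
    mean = sum(values) / len(values)                 # Step 1: find the mean
    abs_diffs = [abs(x - mean) for x in values]      # Steps 2-3: differences, then absolute values
    return sum(abs_diffs) / len(abs_diffs)           # Steps 4-5: sum, then divide by the size


# Plant heights in cm, taken from the example table.
heights_cm = [12, 6, 7, 3, 15, 10, 18, 5]

total_abs = sum(abs(x - 9.5) for x in heights_cm)    # 9.5 is the mean found in step 1
avg_dev = average_deviation(heights_cm)

print("sum of absolute differences:", total_abs)
print("average deviation:", avg_dev)
```

Because the differences are never squared, no square root is needed at the end, which is why the process is shorter than for the standard deviation.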

While you’re less likely to hear about the mean absolute deviation, also known as “MAD,” it was originally proposed as a substitute for the standard deviation because it is argued to reflect the reality of the data set better than the standard deviation does. However, it is not as popular a measure of variability as the standard deviation.
Interquartile Range
In other sections of this guide to descriptive statistics, you have learned about the basics of percentiles, deciles and quartiles. One of the most important applications of quartiles is a concept called the interquartile range. This is yet another measure of variability.
To recap, finding the interquartile range starts with finding the quartiles of a data set. To do this, you can follow these steps:
 Order your data set from least to greatest
 Divide your data into quarters, or fourths
 Find each quartile
Below, you’ll find the heights used in our previous examples, this time rearranged from least to greatest.
Observation Number  Height (cm)
1  3 
2  5 
3  6 
4  7 
5  10 
6  12 
7  15 
8  18 
Here, it is easier to visualize dividing our data into quarters of two values each: 3, 5 | 6, 7 | 10, 12 | 15, 18.
Now that we’ve divided our data, we can find each quartile.
Quartile  Calculation
Q1: Lower quartile  (5 + 6) / 2 = 5.5
Q2: Median  (7 + 10) / 2 = 8.5
Q3: Upper quartile  (12 + 15) / 2 = 13.5
Q4: Maximum  18
Now that we’ve calculated each quartile, we can find the interquartile range, or IQR. The interquartile range is defined as the range spanned by the middle 50% of a data set. The IQR is sometimes preferred over other measures of variability because it shows us where the central half of the values lie. It is calculated as the upper quartile minus the lower quartile.
In our example, this would be:
IQR = Q3 - Q1 = 13.5 - 5.5 = 8 cm
The interpretation of the interquartile range doesn’t rely only on the result of this calculation, however. Here are three important facts the quartiles tell us about the data used in this example:
 25% of plants have heights below 5.5 cm
 25% of plants have heights above 13.5 cm
 50% of plants have heights between 5.5 cm and 13.5 cm, the span measured by the IQR
What the actual value of the IQR tells us is that there is quite a large variation in the middle 50% of plants. Say, for example, that the IQR of the data set was 2 cm instead of 8 cm: that would tell us that the middle 50% of plants were much more similar in height than those in our actual data set.
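The quartile steps described in this section can be sketched in Python as follows. The helper name `quartiles` is hypothetical, and the sketch assumes the convention used in the example: each quartile is the median of the lower or upper half of the ordered data.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    if len(values) < 2:
        raise ValueError("need at least two values to split the data into halves")
    ordered = sorted(values)                  # Order the data from least to greatest
    half = len(ordered) // 2
    lower = ordered[:half]                    # Lower half (excludes the middle value if n is odd)
    upper = ordered[len(ordered) - half:]     # Upper half (excludes the middle value if n is odd)
    return median(lower), median(ordered), median(upper)


# Plant heights in cm, taken from the example table.
heights_cm = [12, 6, 7, 3, 15, 10, 18, 5]

q1, q2, q3 = quartiles(heights_cm)
iqr = q3 - q1

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr)
```

Library quantile functions often interpolate between values using other conventions, so they may return slightly different quartiles for the same data.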
Normal Distribution
Here, we’ll introduce the notion of a normal distribution, to be expanded upon in later sections. A distribution, also known as a probability distribution, is a function that describes the values a variable can take and how likely each of them is to occur.
A normal distribution, also known as the “Bell Curve,” is one of the most basic distributions in statistics and is one of the fundamental concepts of inferential statistics. In order for a data set to follow a normal distribution, also called being “normal,” there are a couple of characteristics it must possess.
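Let’s start with a data set, illustrated in the graph below.
[Figure: frequency distribution of the data set]
We can also illustrate the data in terms of standard deviation, which looks like this:
[Figure: the same distribution marked in standard deviations from the mean]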
This data set is normally distributed. One way we can determine this is to look at the shape of the frequency distribution. Because the data are shaped like a bell, we can guess that they follow a normal distribution.
There are various formal tests that we can use to assess whether the data are normal. However, a simple first check is to see if the data have these characteristics:
 The mean, median and mode are all equal to each other
 The data set is symmetric about the centre, which simply means that, drawing a line down the middle of the curve, each side is an approximate mirror image of the other
 50% of values lie below the mean and 50% lie above
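As a rough illustration of these characteristics, the sketch below draws simulated values from a normal distribution and checks two of them. The data are not the data set from this guide: the mean of 10, the standard deviation of 2, the sample size and the random seed are all assumptions chosen only for the example. The mode is left out because repeated values are rare in simulated continuous data.

```python
import random
from statistics import mean, median

random.seed(0)  # Fixed seed so the sketch is repeatable

# Simulated, hypothetical data: 10,000 draws from a normal distribution.
sample = [random.gauss(10, 2) for _ in range(10_000)]

sample_mean = mean(sample)
sample_median = median(sample)

# Share of values that fall below the mean; close to 0.5 for symmetric data.
share_below_mean = sum(x < sample_mean for x in sample) / len(sample)

print("mean:", round(sample_mean, 2))
print("median:", round(sample_median, 2))
print("share below the mean:", round(share_below_mean, 2))
```

With real data, the mean and median will rarely match exactly, so these checks are a first look rather than proof that the data are normal.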