Measures of Variability Part 2

In the previous sections, you learned the basics of variance and standard deviation. However, measures of variability do more than describe our data set: they can also help us compare it to others. In this section, we will expand upon measures of variability, introducing the concepts of average deviation, the interquartile range and normal distributions.

Average Deviation

Average deviation, or mean absolute deviation, is another measure of variability. In order to understand average deviation, let’s review how to calculate the standard deviation.

Find the mean of the data set
Subtract the mean from each observed value in the data set
Square these differences
Find the average of these squared differences
Take the square root of this average

These steps can be written as the following formulas, the first for a population and the second for a sample:

\[
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N}{(x_i-\mu)^2}}
\]

\[
s = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N}{(x_i-\bar{x})^2}}
\]

Take, for example, the heights (in cm) of plants found in a nursery, shown below.

Observed Value	Height (cm)
Plant #1	12
Plant #2	6
Plant #3	7
Plant #4	3
Plant #5	15
Plant #6	10
Plant #7	18
Plant #8	5

Following the steps above, we can find the mean and standard deviation, rounded to the first decimal place.

Step	Result
1 Find the mean	\[ 9.5 \]
2 Subtract the mean from each value
3 Square these differences
4 Find the average of the squares in step #3	\[ 23.8 \]
5 Take the square root of step #4	\[ 4.9 \]
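
The five steps can also be sketched in Python. This is only an illustrative sketch: the variable names are assumptions rather than part of the lesson, and it uses the population formula (dividing by N), which is what the values 23.8 and 4.9 in the table correspond to.

```python
import math
import statistics

# Plant heights in cm, taken from the nursery example.
heights = [12, 6, 7, 3, 15, 10, 18, 5]

# Step 1: find the mean.
mean = sum(heights) / len(heights)

# Steps 2 and 3: subtract the mean from each value, then square the differences.
squared_diffs = [(h - mean) ** 2 for h in heights]

# Step 4: average the squared differences (population variance, dividing by N).
variance = sum(squared_diffs) / len(heights)

# Step 5: take the square root of that average.
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 1))

# Cross-check against the standard library's population standard deviation.
assert math.isclose(std_dev, statistics.pstdev(heights))
```

Dividing by N - 1 instead, as `statistics.stdev` does, gives the sample standard deviation, which is about 5.2 for these heights.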

Finding the average deviation, while similar to finding the standard deviation, follows a shorter process. Here are the steps for calculating the average deviation:

Find the mean of the data set
Subtract the mean from each observed value in the data set
Take the absolute value of these differences
Find the average of these absolute differences

These steps can be written as the following formula, which can be applied to both samples and populations:

\[
\text{Mean Deviation} = \frac{\sum | x - \mu |}{N}
\]

Using the same example, we can follow these steps to get the average deviation, rounded to the first decimal place.

Step	Result
1 Find the mean	\[ 9.5 \]
2 Subtract the mean from each value
3 Take the absolute value of these differences
4 Take the sum of all these absolute values	\[ 34 \]
5 Divide the sum by the sample size	\[ \dfrac{34}{8} \approx 4.3 \]
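
As a rough check, the same calculation can be sketched in a few lines of Python. The variable names here are illustrative assumptions, not part of the original lesson.

```python
# Plant heights in cm, taken from the nursery example.
heights = [12, 6, 7, 3, 15, 10, 18, 5]

# Find the mean of the data set.
mean = sum(heights) / len(heights)

# Subtract the mean from each value and take the absolute value.
abs_diffs = [abs(h - mean) for h in heights]

# Sum the absolute differences, then divide by the number of values.
total = sum(abs_diffs)
average_deviation = total / len(heights)

print(total, average_deviation)
```

The unrounded result is 4.25, which the table reports as 4.3 after rounding to one decimal place.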

While you’re less likely to hear about the mean absolute deviation, also known as “MAD,” it has been proposed as a substitute for standard deviation because it is argued to reflect the reality of the data set better. However, it is not as popular a measure of variability as the standard deviation.

Interquartile Range

In other sections of this guide to descriptive statistics, you have learned about the basics of percentiles, deciles and quartiles. One of the most important applications of quartiles is a concept called the interquartile range. This is yet another measure of variability.

To recap, finding the interquartile range starts with finding the quartiles of a data set. To do this, you can follow these steps:

Order your data set from least to greatest
Divide your data into quarters, or fourths
Find each quartile

Below, you’ll find the heights used in our previous examples, this time arranged from least to greatest.

Observation Number	Height
1	3
2	5
3	6
4	7
5	10
6	12
7	15
8	18

Here, it is easier to visualize dividing our data into quarters.

Now that we’ve divided our data, we can find each quartile.

Quartile	Calculation
Q1: Lower quartile	\[ \dfrac{(5+6)}{2} = 5.5 \]
Q2: Median	\[ \dfrac{(7+10)}{2} = 8.5 \]
Q3: Upper Quartile	\[ \dfrac{(12+15)}{2} = 13.5 \]
Q4: Maximum	\[ 18 \]
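
The quartile calculation can be sketched in Python as follows. This is a minimal sketch that assumes the "median of each half" method used in the table; the helper name `quartiles` is hypothetical and introduced only for illustration.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd number of values, leave the middle value out of both halves.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return (
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
    )


# Plant heights in cm, taken from the nursery example.
heights = [12, 6, 7, 3, 15, 10, 18, 5]
q1, q2, q3 = quartiles(heights)
print(q1, q2, q3)
```

Other quartile conventions exist. For example, `statistics.quantiles` with its default settings interpolates differently and will not reproduce these exact values for such a small data set.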

Now that we’ve calculated each quartile, we can find the interquartile range, or IQR. The interquartile range is the width of the interval that contains the middle 50% of a data set. The IQR is sometimes preferred over other measures of variability because it shows us how spread out the central values are without being affected by extreme values at either end. It is calculated as the upper quartile minus the lower quartile.

\[
Interquartile \; Range = Q_3 - Q_1
\]

In our example, this would be:

\[
IQR = 13.5 - 5.5 = 8 \text{ cm}
\]

Interpreting the interquartile range relies less on the single number we calculated than on the quartiles behind it. Here are three important facts the quartiles tell us about the data used in this example:

25% of plants have heights below 5.5 cm
25% of plants have heights above 13.5 cm
50% of plants have heights between 5.5 cm and 13.5 cm, the interval spanned by the IQR

What the actual value of the IQR tells us is that there is quite a large variation in the middle 50% of plants. Say, for example, that the IQR of the data set was 2 cm instead of 8 cm - that would tell us that the plants in that data set were much more similar in height to one another than the ones found in our actual data set.
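
To see where the 25% and 50% figures come from, the short Python sketch below counts the share of plants below the lower quartile, above the upper quartile, and in between. The variable names are assumptions made for this illustration.

```python
# Ordered plant heights in cm, with the quartiles found for this example.
heights = [3, 5, 6, 7, 10, 12, 15, 18]
q1, q3 = 5.5, 13.5

n = len(heights)

# Share of plants shorter than the lower quartile.
below_q1 = sum(h < q1 for h in heights) / n

# Share of plants taller than the upper quartile.
above_q3 = sum(h > q3 for h in heights) / n

# Share of plants between the two quartiles (the middle of the data).
within_iqr = sum(q1 <= h <= q3 for h in heights) / n

print(below_q1, above_q3, within_iqr)
print("IQR:", q3 - q1)
```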

Normal Distribution

Here, we’ll introduce the notion of a normal distribution, to be expanded upon in later sections. A distribution is a function that describes the values in a data set and how likely each is to occur. When the likelihoods are expressed as probabilities, it is also known as a probability distribution.

A normal distribution, also known as the “Bell Curve,” is one of the most basic distributions in statistics and is one of the fundamental concepts of inferential statistics. In order for a data set to follow a normal distribution, also called being “normal,” there are a couple of characteristics it must possess.

Let’s start with a data set, illustrated in the graph below.

Where,

\[
\bar{x} = 35
\]

\[
\sigma = 3
\]

We can also illustrate the data in terms of standard deviations from the mean.

This data set is normally distributed. One way we can determine this is to look at the shape of the frequency distribution. Because the data are shaped like a bell, we can guess that they follow a normal distribution.

There are various formal tests that we can use to check whether the data are normal. However, a simple first check is to see whether the data have these characteristics:

The mean, median and mode are all equal to each other
The data set is symmetric about the centre - which simply means that, drawing a line in the middle of the curve, each side is an approximate mirror image of the other
50% of the values lie below the mean and 50% lie above it
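
These checks can be tried out on simulated data. The Python sketch below is an illustration only: it assumes a mean of 35 and a standard deviation of 3, matching the example, and draws 10,000 values with the standard library's `random.gauss`. The sample size and seed are arbitrary choices. The mode is left out because simulated continuous values rarely repeat.

```python
import random
import statistics

# Fixed seed so that the simulation gives the same values on every run.
rng = random.Random(0)

# Simulated data: 10,000 draws from a normal distribution (mean 35, sd 3).
sample = [rng.gauss(35, 3) for _ in range(10_000)]

mean = statistics.mean(sample)
median = statistics.median(sample)

# Share of values that fall below the mean; close to 0.5 for symmetric data.
share_below_mean = sum(x < mean for x in sample) / len(sample)

print(round(mean, 2), round(median, 2), round(share_below_mean, 3))
```

For data like these, the mean and median should come out very close to each other, with roughly half of the values on each side of the mean. Passing these checks suggests the data may be normal but does not prove it, since other symmetric distributions pass them too.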

Emma