January 31, 2021
Measures of Central Tendency
As you can see in the image above, measures of central tendency are those that involve the centre of the data. It is important to remember that there is not one single number that can capture the centre perfectly, especially in cases of skewed distributions.
Measures of Spread
Measures of spread are distinct from measures of central tendency. While measures of central tendency seek to describe the centre, measures of spread aim to describe the distribution of the data around the centre.
As you can see, there are three common measures of spread: standard deviation, variance and range. Measures of spread are just as important as measures of the centre and should be reported alongside metrics like the mean or median, because understanding how the data are distributed is vital to any analysis.
Standard Deviation Definition
The standard deviation is defined as the spread of the data relative to the data’s mean. Standard deviation is one of the most powerful tools in statistics, especially when it comes to normal distributions. Take a look at a standard normal distribution below.
As you can see, the mean has been standardised and is located at zero. This makes sense, as a mean is zero standard deviations from itself. Once every other number in the dataset is converted to a standardised number, we can easily see how many standard deviations that number is from the mean. In the image, as in real life, the more standard deviations a number is from the mean, the further out in the tails of the distribution it sits.
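To make the idea of standardised numbers concrete, here is a minimal Python sketch. The scores are hypothetical values chosen for illustration, and `z_scores` is a helper name introduced here, not something from the original article. It uses the sample standard deviation from Python's built-in `statistics` module.

```python
import statistics


def z_scores(values):
    """Return how many sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - mean) / sd for v in values]


# Hypothetical test scores, for illustration only.
scores = [60, 70, 80, 90, 100]
standardised = z_scores(scores)

for score, z in zip(scores, standardised):
    print(f"score {score}: {z:+.2f} standard deviations from the mean")
```

A score equal to the mean is standardised to zero, and scores below the mean come out negative.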
Standard Deviation Formula
There are two ways you can calculate the standard deviation. The first is the standard deviation formula.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
|$x_i$||The ith point of the data set|
|$\bar{x}$||The sample mean|
|$n$||The sample size|
While you can calculate it this way, you can also take a shortcut if you happen to have the variance. The standard deviation is the square root of the variance, so simply take the square root of the variance if you have it.
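As a quick check of the shortcut, the sketch below computes the sample standard deviation both ways using Python's built-in `statistics` module. The scores are hypothetical values chosen for illustration.

```python
import math
import statistics

# Hypothetical test scores, for illustration only.
scores = [72, 85, 90, 66, 87]

# Route 1: compute the sample standard deviation directly.
sd_direct = statistics.stdev(scores)

# Route 2: take the square root of the sample variance.
variance = statistics.variance(scores)
sd_from_variance = math.sqrt(variance)

print(sd_direct, sd_from_variance)
```

Both routes should agree up to floating-point rounding.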
Standard Deviation Interpretation
As you can see in the formula, we subtract the sample mean from every single value in the data set. This gives us, in raw numbers, how far each observation is from the mean. Next, these values are squared in order to get rid of the effect of negative numbers.
Think about it - say you have a mean test score of 80 and someone scores 60 points. They are 20 points below, or -20 points away from, the mean. If we added this to the difference of someone who scored 100 points, which is 20 points above the mean, they would simply cancel each other out. Instead, we square the differences in order to preserve the magnitude.
Next, we divide the sum of the squared differences by the sample size minus one, which gives us approximately the average squared deviation from the mean. Taking the square root lets us get approximately the average spread from the mean. The higher the standard deviation, the farther the data points are spread from the mean.
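The steps described above can be sketched in Python as follows. The function name `sample_std_dev` is introduced here for illustration. The two scores are the ones from the example above (60 and 100, with a mean of 80); with only two observations this is a toy case rather than a realistic dataset.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation, following the steps described in the text."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Step 1: how far each observation is from the mean.
    deviations = [x - mean for x in values]
    # Step 2: square the deviations so negatives and positives do not cancel.
    squared = [d ** 2 for d in deviations]
    # Step 3: divide the sum of squares by the sample size minus one.
    variance = sum(squared) / (n - 1)
    # Step 4: take the square root to return to the original units.
    return math.sqrt(variance)


scores = [60, 100]
mean = sum(scores) / len(scores)
raw_deviation_sum = sum(x - mean for x in scores)  # -20 + 20 cancels out
sd = sample_std_dev(scores)

print(raw_deviation_sum, sd)
```

The raw deviations sum to zero, which is why they are squared before averaging.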
You’re interested in calculating the standard deviation of the exam scores of a national standardised test to see if many people scored close to the mean or not. Use the following dataset.
In order to solve for the standard deviation, we have to follow the formula given earlier. Take a look at the solution below.
Next, we simply put it into the formula.
Let’s take the same idea as the previous example. However, this time, let’s compare it to last year’s test scores. Calculate the standard deviation and state whether the data is more or less spread than this year’s test scores.
|Last Year’s Test Taker||Score|
Next, we simply put it into the formula.
This standard deviation is lower, which means the data are less spread around the mean. The image below illustrates this.
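A comparison like this one can be sketched in Python. The scores below are hypothetical stand-ins, not the article's dataset. They were chosen so that both years share the same mean but differ in spread.

```python
import statistics

# Hypothetical scores, for illustration only (not the article's data).
this_year = [55, 70, 80, 90, 105]
last_year = [72, 76, 80, 84, 88]

sd_this = statistics.stdev(this_year)
sd_last = statistics.stdev(last_year)

# The year with the smaller standard deviation is less spread around its mean.
less_spread = "last year" if sd_last < sd_this else "this year"

print(f"this year: {sd_this:.2f}, last year: {sd_last:.2f}")
print(f"{less_spread} is less spread around the mean")
```

Because the two means are equal here, the difference in standard deviation reflects spread alone.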
You’re given the following information about the average salaries of people working in different fields. Calculate the standard deviation, then interpret what the standard deviation means in terms of each field.
To calculate the standard deviation, you simply need to take the square root of the variance. This gives us the following.
As we can see, while banking has the highest mean salary, there is a lot of variation between salaries. On the other hand, while education has a lower mean salary, there is not a lot of variation between salaries. Keep in mind that this is a fictitious data set.
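The square-root step for several groups at once might look like the sketch below. The variances are hypothetical placeholders, not the article's figures, and the field names are simply the two mentioned above.

```python
import math

# Hypothetical salary variances per field, for illustration only.
variances = {
    "banking": 400_000_000,
    "education": 25_000_000,
}

# The standard deviation of each field is the square root of its variance.
std_devs = {field: math.sqrt(var) for field, var in variances.items()}

for field, sd in std_devs.items():
    print(f"{field}: standard deviation of about {sd:,.0f}")
```

Unlike the variance, the standard deviation is in the same units as the salaries themselves, which makes it easier to interpret.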