March 26, 2020
In this section, you will find the definitions of variance and standard deviation, as well as their formulas. We’ll also provide you with practice problems so you can either perfect or refresh your knowledge of these powerful measures of variability. In other sections of this guide on descriptive statistics, you will find how these measures are applied at a more advanced level, such as how to interpret each measure and in what situations you should use the variance over the standard deviation.
Measures of Variability
As we’ve discussed before, descriptive statistics can be broken down into two distinct kinds of measures: those of central tendency and those of variability. While measures of central tendency attempt to capture the centre of the data, measures of variability seek to identify the level of variation in the data, otherwise known as the spread. The spread of the data is an easy concept to remember, because the statistical term means just what the everyday word suggests: how far the values are spread out.
The spread of the data, quite simply, lets us know whether the data points are huddled close together or spread far apart, whether they are centred around a single value or several, and so on. From the images below, you can get a better idea of what this looks like.
As you can see, the data points in the first image are located close to each other. Because all the values are located somewhere between 5 and 25, there’s not much variation in the values of the data points. The second image, however, illustrates data points with quite a large spread. This time, the values range from about 5 to 95, so the variation is higher because the data points take on a wide variety of values.
The third image is another example of a high degree of variability. In this scenario, however, the majority of the values are located around 10. This is the main reason why measures of central tendency are often reported together with measures of variability: between them, they give a more complete picture of how the data are spread and around which central values. They can stand in for visualizations like the ones above, but more often they accompany them.
Recall that statistics also employs different formulas and practices when it comes to samples and populations. While a population contains all the elements we want to study, such as all the schools in a country, a sample contains a portion of those elements, such as one hundred schools in a country. Because we rarely ever get a chance to measure entire populations, the “true” measures are rarely known and are called parameters. On the other hand, because measures calculated from a sample aren’t the true population measure, but rather estimations of those true figures, they are called statistics.
What is Variance?
The variance of a variable or data set measures the spread of its data points. This is similar to what we discussed previously, because the variance can be thought of as the level of variation in a data set. The formulas for the sample and population variance can be seen in the table below.
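|Sample Variance||Population Variance|
|$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$||$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$|
Here $x_i$ is a data point, $\bar{x}$ and $n$ are the sample mean and sample size, and $\mu$ and $N$ are the population mean and population size.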
We’ll break down step by step what this means. Because the variance is an attempt to measure the variability of the data set, it compares every data point to the mean of all the data points; the squared differences are then summed and divided by the sample size minus 1 to get an average of sorts.
|1. First, you compute the mean so that you have a basis against which to compare all values.|| |
|2. Next, you calculate the difference between each data point and the mean. Think about it - if that difference is big for the majority of the data, this means a lot of points are located far from the centre, and vice versa.|| |
|3. This difference is squared to deal with negative values. If we were to leave these negative values, the sum of the differences would be underestimated. For example, say you have a mean of 50 and a data point of 4, which would give a difference of -46. What matters here is the magnitude of that difference, not whether it’s positive or negative. However, if we keep the negative sign, it artificially lowers the sum. Squaring it is an easy way of dealing with negative numbers in general.|| |
|4. The sum of these squared differences is taken. This reflects the total magnitude of the differences away from the mean.|| |
|5. Dividing the sum by the sample size is a natural way to estimate the amount of variance per data point, similar to finding the mean. However, we divide by n - 1 in order to arrive at an unbiased sample variance (the operation of subtracting 1 is known as Bessel’s Correction).|| |
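The five steps above can be sketched in a few lines of Python. This is only an illustration: the function name `variance`, its `sample` flag, and the eight data values are assumptions made for the example and do not come from this guide.

```python
def variance(values, sample=True):
    """Variance of a list of numbers, following the five steps above.

    With sample=True the sum is divided by n - 1 (Bessel's correction);
    with sample=False it is divided by n (population variance).
    """
    n = len(values)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough data points")

    mean = sum(values) / n                      # step 1: the mean
    differences = [x - mean for x in values]    # step 2: distance from the mean
    squared = [d ** 2 for d in differences]     # step 3: square to drop the sign
    total = sum(squared)                        # step 4: sum of squared differences
    return total / divisor                      # step 5: divide by n - 1 (or n)


# Made-up illustrative values, not taken from this guide.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sample_var = variance(data)
population_var = variance(data, sample=False)
print(sample_var, population_var)
```

Because the only difference between the two versions is the divisor, the sample variance always comes out slightly larger than the population variance computed on the same values.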
Following these steps, we use the data table below as an example to find the variance.
|Observation||Value||Difference from the mean||Squared difference|
|1||45||45 - 41.2 = 3.8||14.4|
|2||32||32 - 41.2 = -9.2||84.6|
|3||29||29 - 41.2 = -12.2||148.8|
|4||56||56 - 41.2 = 14.8||219.0|
|5||44||44 - 41.2 = 2.8||7.8|
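Where the mean is calculated as

$\bar{x} = \frac{45 + 32 + 29 + 56 + 44}{5} = 41.2$

and, summing the unrounded squared differences to get 474.8, the sample variance is

$s^2 = \frac{474.8}{5 - 1} = 118.7$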
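If you would like to check the table by machine, the short sketch below repeats the calculation on the same five values, once by hand and once with Python's standard `statistics` module. The variable names are just illustrative choices.

```python
import statistics

values = [45, 32, 29, 56, 44]  # the five data points from the table above

mean = statistics.mean(values)
squared_diffs = [(x - mean) ** 2 for x in values]

# Sample variance by hand: sum of squared differences divided by n - 1.
manual_variance = sum(squared_diffs) / (len(values) - 1)

# The same quantity from the standard library (also divides by n - 1).
library_variance = statistics.variance(values)

print(f"mean = {mean:.1f}")
print(f"sum of squared differences = {sum(squared_diffs):.1f}")
print(f"sample variance (by hand) = {manual_variance:.1f}")
print(f"sample variance (statistics) = {library_variance:.1f}")
```

Both routes should agree with the mean of 41.2 and the variance of roughly 118.7 used in this example.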
What is Standard Deviation?
As you may have noticed, while understanding the variance may be easy, interpreting it can get tricky. While it’s helpful to think of the variance as each data point’s average distance from the mean, you should remember not only that you are dividing the sum of squared differences by n - 1 rather than n, but also that this sum is in squared units, which means the variance is in squared units as well. Translating the example above literally, the variance signals a squared difference from the mean of approximately 118.7 per data point.
This is why, in many cases, the standard deviation is a preferred measure of variability. Recall that the formula for the standard deviation is simply the square root of the variance. This is so that the units go from squared units to the original units of the data. The formula for standard deviation can be found below.
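|Sample SD||Population SD|
|$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$||$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$|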
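As a rough sketch of the relationship between the two measures, the snippet below takes the square root of the variance for the five values used in the variance example (45, 32, 29, 56, 44). It relies on Python's standard `math` and `statistics` modules; the variable names are illustrative.

```python
import math
import statistics

values = [45, 32, 29, 56, 44]  # same five data points as the variance example

# Standard deviation = square root of the variance, back in the original units.
sample_sd = math.sqrt(statistics.variance(values))       # divides by n - 1
population_sd = math.sqrt(statistics.pvariance(values))  # divides by n

print(f"sample SD = {sample_sd:.2f}")
print(f"population SD = {population_sd:.2f}")
```

The library functions `statistics.stdev` and `statistics.pstdev` return these two values directly, so taking the square root by hand is only needed to make the connection to the variance explicit.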
Find the standard deviation of the following data set.
Solution Problem 1
In order to find the standard deviation, you must first find the mean.
The next step is to subtract the mean from every value in the data set, square those differences, and then add them all together. After that, simply plug the sum into the standard deviation formula.
Given the following information, find the variance.
|Sample Size||3 000|
Solution Problem 2
Recall that the standard deviation is simply the square root of the variance.
So, to find the variance, simply take the square of the standard deviation.
Interpret the standard deviation and variance of the following table.
Solution Problem 3
While the variance tells us about the spread, the standard deviation tells us how typical a value is given the mean. Although the variance is quite high in absolute terms, it is not high relative to the mean of 10,000, meaning the data has very low variability.
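One way to make "high relative to the mean" concrete is to express the standard deviation as a fraction of the mean. The sketch below does this with a hypothetical variance of 250,000; only the mean of 10,000 comes from the solution above, and the variance figure is invented purely for illustration.

```python
import math

mean = 10_000.0       # mean quoted in the solution above
variance = 250_000.0  # hypothetical value, chosen only for illustration

# Convert the variance back to the original units, then compare with the mean.
standard_deviation = math.sqrt(variance)
relative_spread = standard_deviation / mean  # SD as a fraction of the mean

print(f"standard deviation = {standard_deviation:.0f}")
print(f"relative spread = {relative_spread:.1%}")
```

With these assumed figures, a variance that looks large on its own corresponds to a standard deviation of only a few percent of the mean, which is the sense in which the variability is low.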