February 29, 2020
A Guide to Variance
In previous sections of this guide on descriptive statistics, you learned the fundamentals of variance. Specifically, we taught you what variance is, its important role in statistics and how to calculate it. Here, we’ll give a brief overview of all these things and compare variance to other measures of variability.
What is Variance
Like all measures of variability, variance strives to capture the dispersion of a variable. Many people fall into the trap of associating variability with undesirability simply because, in the real world, variability is something we try to fix.
After all, it wouldn’t be too pleasant if your coffee tasted wildly different from the norm every time you bought it. While we’ll go over the specifics of interpretation later in this overview, it’s important to understand variance as a simple measure of dispersion as you dive into learning how to use it.
A basic definition of variance is that it captures how far data points are spread from their mean. A bigger number, or a high variance, suggests that the data are spread further from the centre point of the data set. This might be easier to grasp through an example. In the table below, you’ll find data on two different bags of marbles bought from a toy store.
|Marble Colour||Bag 1||Bag 2|
The variance of each bag is as follows. Don’t worry about the calculation, which we’ll show you in the next section. Here, focus on understanding what the variance is and why it is important in statistics.
|Bag 1||Bag 2|
Here, the bigger variance means that the number of marbles per colour is spread further from that bag’s average. This can easily be seen by looking at the data set and noticing that the first bag has a more uniform number of marbles per colour than the second one.
As you can imagine, variance is a concept with a wide range of applications in statistics and beyond.
How to Calculate Variance
Now that we’ve shown you what variance is, we’ll guide you through how to calculate it. As you know from our lessons on populations and samples, measures are calculated differently for parameters and statistics. As a brief recap, parameters are calculated from the population while statistics are calculated from samples.
In the table below, you’ll find the formulas for variance for the population and for a sample.
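| ||Population||Sample|
|Variance Notation||$\sigma^2$||$s^2$|
|Variance Formula||$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$||$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$|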
As you can see from the formula for a sample, you must first calculate the mean and subtract it from each individual observation in the data. Next, you square each of those differences, sum the squared values and divide the sum by the sample size minus one. This can sound arbitrary, so we’ll take the example above and break down how to calculate the variance step by step.
|Step||Calculation for Bag 1|
|1. Calculate the mean||$\bar{x} = (5 + 7 + 8 + 6) / 4 = 6.5$|
|2. Subtract the mean from every observation||$-1.5,\ 0.5,\ 1.5,\ -0.5$|
|3. Square each subtracted value||$2.25,\ 0.25,\ 2.25,\ 0.25$|
|4. Sum all squared values||$5$|
|5. Divide the sum by the sample size minus 1||$5 / (4 - 1) \approx 1.67$|
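The five steps can also be sketched in Python. This is a minimal illustration rather than a reference implementation: it assumes the Bag 1 counts are 5, 7, 8 and 6 (the values listed for Bag 1 in the section on changing units), and the function name `sample_variance` is our own.

```python
def sample_variance(observations):
    """Sample variance, following the five steps from the table."""
    n = len(observations)
    if n < 2:
        raise ValueError("the sample variance needs at least two observations")

    mean = sum(observations) / n                    # 1. calculate the mean
    deviations = [x - mean for x in observations]   # 2. subtract the mean
    squared = [d ** 2 for d in deviations]          # 3. square each difference
    total = sum(squared)                            # 4. sum the squared values
    return total / (n - 1)                          # 5. divide by n - 1


# Assumed Bag 1 counts: blue, red, orange, yellow.
bag_1 = [5, 7, 8, 6]
bag_1_variance = sample_variance(bag_1)
print(round(bag_1_variance, 2))  # 1.67
```

To get the population variance instead, the last step would divide by `n` rather than `n - 1`.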
As we can see, we’ve arrived at the same answer for each variance as in the previous section. While it’s unlikely you’ll have to calculate variance by hand, given how much software is available to do the job for you, it’s helpful to understand the process behind the formula.
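As one example of letting software do the work, Python’s standard `statistics` module provides `variance` for a sample and `pvariance` for a population. The sketch below assumes the Bag 1 counts are 5, 7, 8 and 6, as listed in the section on changing units.

```python
import statistics

# Assumed Bag 1 counts: blue, red, orange, yellow.
bag_1 = [5, 7, 8, 6]

sample_var = statistics.variance(bag_1)       # divides by n - 1
population_var = statistics.pvariance(bag_1)  # divides by n

print(round(sample_var, 2))      # 1.67
print(round(population_var, 2))  # 1.25
```

The two results differ only in the denominator, which is why the sample variance is always the larger of the two for the same data.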
Effect of Changing Units
As with all measures of central tendency and variability, the question of what happens when you change the units of the data can come up for variance. Changing units simply means transforming the data points in your data set by performing common operations such as subtraction, addition, multiplication and division. More advanced transformations involve taking the logarithm or power of each data point.
There are many reasons why someone may want to transform their data. Some reasons include:
- Transforming the data to fit a more convenient distribution
- Wanting to display the data in more understandable units
- Needing to change the data to convert it into a new variable
In the table below, you’ll find an example of each of the aforementioned scenarios.
|Reason||Example|
|Transforming the data to fit a more convenient distribution||Changing the data to fit a normal distribution|
|Wanting to display the data in more understandable units||Receiving or measuring data in imperial units and needing to convert it to metric units|
|Needing to change the data to convert it into a new variable||Combining height and weight data to form a new variable, body mass index (BMI), which is weight divided by height squared|
If you perform basic operations, such as addition, subtraction, multiplication and division, there are a couple of shortcuts to keep in mind if you merely want to know the new measures of central tendency and variability. For addition and subtraction, the rules are recorded in the table below.
|When adding or subtracting a constant||Effect on the Measure|
|Mean, Median, Mode||Add or subtract that constant|
|Standard Deviation, Variance, Average Deviation, IQR||No effect, they stay the same|
For multiplication and division, these changes can be found in the table below.
|When multiplying or dividing by a constant||Effect on the Measure|
|Mean, Median, Mode, Standard Deviation, Average Deviation, IQR||Multiply or divide by that constant (the measures of variability use its absolute value, since they cannot be negative)|
|Variance||Multiply or divide by the square of that constant|
Notice that when adding or subtracting a constant, measures of variability don’t change. On the other hand, when multiplying or dividing by a constant, the measures of variability do change, and the variance is the only one that changes by the square of the constant. Let’s take the example used in the previous section and add 3 to every data point.
|Marble Colour||Bag 1|
|Blue||5+3 = 8|
|Red||7+3 = 10|
|Orange||8+3 = 11|
|Yellow||6+3 = 9|
To calculate the variance, we follow the usual process of finding the mean, summing all the squared differences and dividing by the sample size minus 1. The new mean is 9.5, the squared differences are 2.25, 0.25, 2.25 and 0.25, their sum is 5 and the variance is 5 / 3 ≈ 1.67.
As you can see, the variance didn’t change from the previous example: the original counts of 5, 7, 8 and 6 also have a variance of 5 / 3 ≈ 1.67. Instead of adding three to each observation and then recalculating, we could have used the rules above and stated straight away that the variance is unchanged.
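The addition rule can be checked with a few lines of Python. This is only a quick sketch using the Bag 1 counts from the table above and the standard `statistics` module; the variable names are our own.

```python
import math
import statistics

bag_1 = [5, 7, 8, 6]
shifted = [x + 3 for x in bag_1]  # 8, 10, 11, 9

mean_before = statistics.mean(bag_1)
mean_after = statistics.mean(shifted)
var_before = statistics.variance(bag_1)
var_after = statistics.variance(shifted)

print(mean_after - mean_before)             # 3.0, the mean moves by the constant
print(math.isclose(var_before, var_after))  # True, the variance is unchanged
```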
Using the rules above, we can find the variance of the same data set if each data point were multiplied or divided by 3. Instead of performing these operations and calculating the variance again, we simply take the original variance of 5 / 3 ≈ 1.67 and do the following.
| ||Multiply by 3||Divide by 3|
|Multiply or divide the variance by the square of that constant||$\frac{5}{3} \times 3^2 = 15$||$\frac{5}{3} \div 3^2 \approx 0.19$|
As you can imagine, remembering these rules can save you a lot of time. If you’re sceptical, calculate the new variances by hand and compare your answer to the ones above.
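If you would rather let a computer do the checking, the sketch below compares the shortcut with a full recalculation. It assumes the Bag 1 counts of 5, 7, 8 and 6 and uses Python’s standard `statistics` module; the variable names are our own.

```python
import math
import statistics

bag_1 = [5, 7, 8, 6]
c = 3  # the constant used to rescale every data point

multiplied = [x * c for x in bag_1]
divided = [x / c for x in bag_1]

base_var = statistics.variance(bag_1)
base_sd = statistics.stdev(bag_1)

# Variance scales by the square of the constant.
print(math.isclose(statistics.variance(multiplied), base_var * c ** 2))  # True
print(math.isclose(statistics.variance(divided), base_var / c ** 2))     # True

# Standard deviation scales by the constant itself.
print(math.isclose(statistics.stdev(multiplied), base_sd * c))           # True
```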
How to Interpret Variance
Interpreting the variance is all about context. While we might be tempted to generalize and say that big variances mean bigger spread, this rule only makes sense when we take a look at our data set.
For example, the number 1 000 might seem like a high number for a variance. However, a variance of 1 000 corresponds to a standard deviation of only about 32, so if the mean is in the millions, it doesn’t seem so abnormal anymore. In this case, our variance would be pretty small and indicate that the values are spread closely around the mean.
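One way to make this comparison concrete is to convert the variance back to the units of the data and set it against the mean. The sketch below uses a hypothetical mean of exactly one million, which is an assumption for illustration only.

```python
import math

variance = 1_000
mean = 1_000_000  # hypothetical mean "in the millions"

# The standard deviation is in the same units as the mean,
# so the two can be compared directly.
std_dev = math.sqrt(variance)
relative_spread = std_dev / mean

print(round(std_dev, 1))         # 31.6
print(f"{relative_spread:.4%}")  # 0.0032%
```

A typical deviation of roughly 32 around a mean of one million is a tiny fraction of the mean, which is why this variance counts as small in context.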
Covariance, unlike variance, tells us about the joint variability of a pair of variables. In other words, it describes how the spread of one variable moves with the spread of another. This gives us a hint as to how the variables are related and change with one another.
For example, if we were to take the covariance of weight and height, we would most likely find a positive value, which means the variables tend to change in the same direction. A covariance that is large relative to the spread of each variable also points to a strong relationship. Together, this would mean that:
- The higher the height, the higher the weight
- Height has a strong relationship with weight
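You can find the formulas for covariance below.
| ||Population||Sample|
|Covariance Formula||$\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$||$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$|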
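As a rough sketch of the sample covariance calculation, the Python below follows the same pattern as the variance steps, except that each difference from the mean is multiplied by the matching difference in the other variable. The height and weight values are made up for illustration, and the function name `sample_covariance` is our own.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equally long lists of observations."""
    if len(xs) != len(ys):
        raise ValueError("both variables need the same number of observations")
    n = len(xs)
    if n < 2:
        raise ValueError("the sample covariance needs at least two pairs")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each pair of differences from the means, then sum.
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data, not taken from any real study.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 60, 65, 85]

cov = sample_covariance(heights_cm, weights_kg)
print(round(cov, 2))  # 183.33
print(cov > 0)        # True, the variables move in the same direction
```

Note that the covariance of a variable with itself is simply its variance.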
Variance Versus Other Measures of Variability
Variance is most closely related to standard deviation, which is the square root of the variance. Each measure tells us something about how the data are spread but with slight differences, which are summarized below.
| ||Variance||Standard Deviation|
|Goal||Describes the variability of observations within a data set||Describes the spread around the centre point of the data set|
|Units||Squared units of the data set||The same units as the data set|
|Interpretation||How far the data points are spread from the mean||Tells us how typical values are given the mean|
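The difference in units is easy to see in a short Python sketch. It assumes the Bag 1 counts of 5, 7, 8 and 6 from the marble example and uses the standard `statistics` module.

```python
import math
import statistics

# Assumed Bag 1 counts: blue, red, orange, yellow.
bag_1 = [5, 7, 8, 6]

variance = statistics.variance(bag_1)  # in squared marbles
std_dev = statistics.stdev(bag_1)      # in marbles, like the data

print(round(variance, 2))                    # 1.67
print(round(std_dev, 2))                     # 1.29
print(math.isclose(std_dev ** 2, variance))  # True
```

Because the standard deviation is in the same units as the data, it is usually the easier of the two to read against the mean.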