February 29, 2020
A Guide to Variance
In previous sections of this guide on descriptive statistics, you learned the fundamentals of variance. Specifically, we taught you what variance is, its important role in statistics and how to calculate it. Here, we’ll give a brief overview of all these things, as well as compare variance to other measures of variability.
What is Variance
Like all measures of variability, variance strives to capture the dispersion of a variable. Many people fall into the trap of associating variability with undesirability simply because, in the real world, variability is something we try to fix.
After all, it wouldn’t be too pleasant if your coffee tasted wildly different from the norm every time you bought it. While we’ll go over the specifics of interpretation later in this overview, it’s important to understand variance as a simple measure of dispersion as you dive into learning how to use it.
A basic definition of variance is that it captures how far data points are spread from their mean. A bigger number, or a high variance, suggests that the data are spread further from the centre point of the data set. This might be easier to grasp through an example. In the table below, you’ll find data on two different bags of marbles bought from a toy store.
Marble Colour  Bag 1  Bag 2 
Blue  5  6 
Red  7  4 
Orange  8  15 
Yellow  6  1 
The variance of each bag is as follows. Don’t worry about the calculation, which we’ll show you in the next section. Here, focus on understanding what the variance is and why it is important in statistics.
Measure  Bag 1  Bag 2
Variance  1.7  36.3
Here, the bigger variance means that there is a greater spread in the number of marbles per colour. This can easily be seen by looking at the data set and noticing that the first bag has a more uniform number of marbles per colour than the second one.
As you can imagine, variance is a concept with a wide range of application in statistics and beyond.
How to Calculate Variance
Now that we’ve shown you what variance is, we’ll guide you through how to calculate it. As you know from our lessons on populations and samples, measures are calculated differently for parameters and statistics. As a brief recap, parameters are calculated from the population while statistics are calculated from samples.
In the table below, you’ll find the formulas for variance for the population and for a sample.
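Measure  Population  Sample
Variance Notation  $\sigma^2$  $s^2$
Variance Formula  $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Mean  $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$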
As you can see from the formula, you must first calculate the mean and subtract the mean from each individual observation in the data. Next, you square each of those differences, sum the squared values and then divide the sum by the sample size minus one (or, for a population, by the population size). This can sound arbitrary, so we’ll take the example above and break down how to calculate the sample variance step by step.
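Step  Calculation for Bag 1  Calculation for Bag 2
1. Calculate the mean  (5 + 7 + 8 + 6) / 4 = 6.5  (6 + 4 + 15 + 1) / 4 = 6.5
2. Subtract the mean from every observation  -1.5, 0.5, 1.5, -0.5  -0.5, -2.5, 8.5, -5.5
3. Square each subtracted value  2.25, 0.25, 2.25, 0.25  0.25, 6.25, 72.25, 30.25
4. Sum all squared values  5  109
5. Divide the sum by the sample size minus 1  5 / 3 ≈ 1.7  109 / 3 ≈ 36.3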
As you can see, we’ve arrived at the same variances that were given in the previous section. While it’s unlikely you’ll have to calculate variance by hand, given how much software is available to do the job for you, it’s helpful to understand the process behind the formula.
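As a rough sketch, the five steps can also be written out in Python. The function name `variance_by_hand` and its `sample` flag are made up for this illustration; the final lines compare the result with the `variance` and `pvariance` functions from Python’s standard `statistics` module. Worked this way, Bag 1 comes out at 5 / 3 (about 1.7) and Bag 2 at 109 / 3 (about 36.3).

```python
import statistics


def variance_by_hand(values, sample=True):
    """Follow the five steps; `sample` picks n - 1 or N as the divisor."""
    n = len(values)
    mean = sum(values) / n                      # 1. calculate the mean
    deviations = [x - mean for x in values]     # 2. subtract the mean
    squared = [d ** 2 for d in deviations]      # 3. square each difference
    total = sum(squared)                        # 4. sum the squared values
    return total / (n - 1 if sample else n)     # 5. divide by n - 1 (or N)


bag_1 = [5, 7, 8, 6]
bag_2 = [6, 4, 15, 1]

print(round(variance_by_hand(bag_1), 1))  # 1.7
print(round(variance_by_hand(bag_2), 1))  # 36.3

# The standard library agrees: variance() divides by n - 1, pvariance() by N.
print(round(statistics.variance(bag_1), 1))  # 1.7
print(statistics.pvariance(bag_1))           # 1.25
```

Passing `sample=False` treats the four counts as the whole population, which is why the last value is smaller than the sample variance.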
Effect of Changing Units
As with all measures of central tendency and variability, the question of what happens when you change the units of the data can come up. Changing units simply means transforming the data points in your data set by performing common operations such as subtraction, addition, multiplication and division. More advanced transformations involve taking the logarithm or power of each data point.
There are many reasons why someone may want to transform their data. Some reasons include:
 Transforming the data to fit a more convenient distribution
 Wanting to display the data in more understandable units
 Needing to change the data to convert it into a new variable
In the table below, you’ll find an example of each of the aforementioned scenarios.
Reason  Example 
Transforming the data to fit a more convenient distribution  Changing the data to fit a normal distribution 
Wanting to display the data in more understandable units  Receiving or measuring data in imperial units and needing it in metric units 
Needing to change the data to convert it into a new variable  Combining height and weight data to form a new variable, body mass index (BMI), which is weight divided by height squared 
If you perform basic operations, such as addition, subtraction, multiplication and division, there are a couple of shortcuts to keep in mind if you merely want to know the new measures of central tendency and variability. For addition and subtraction, the rules are recorded in the table below.
When adding or subtracting a constant  Effect on the Measure 
Mean, Median, Mode  Add or subtract that constant 
Standard Deviation, Variance, Average Deviation, IQR  No effect, they stay the same 
For multiplication and division, these changes can be found in the table below.
When multiplying or dividing by a constant  Effect on the Measure 
Mean, Median, Mode, Standard Deviation, Average Deviation, IQR  Multiply or divide by that constant 
Variance  Multiply or divide by the square of that constant 
Notice that when adding and subtracting a constant, measures of variability don’t change. On the other hand, when multiplying or dividing by a constant, the measures of variability do change, with the variance changing in a unique way. Let’s take the example used in the previous section and add 3 to every data point.
Marble Colour  Bag 1 
Blue  5+3 = 8 
Red  7+3 = 10 
Orange  8+3 = 11 
Yellow  6+3 = 9 
To calculate the variance, we would normally follow the process of finding the mean, summing all the squared differences and dividing by the sample size minus 1. Here the new mean is (8 + 10 + 11 + 9) / 4 = 9.5, the differences from the mean are again -1.5, 0.5, 1.5 and -0.5, the squared differences sum to 5, and the variance is 5 / 3 ≈ 1.7.
As you can see, the variance didn’t change from the previous example. Instead of adding three to each observation and then calculating the new variance, we could have used the rules above and simply stated that the variance does not change.
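The addition rule can be checked in a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names `bag_1` and `shifted` are ours.

```python
import statistics

bag_1 = [5, 7, 8, 6]
shifted = [x + 3 for x in bag_1]  # add 3 to every data point

# The mean moves by the constant...
print(statistics.mean(bag_1), statistics.mean(shifted))  # 6.5 9.5

# ...but the variance stays exactly the same.
print(statistics.variance(bag_1) == statistics.variance(shifted))  # True
```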
Using the rules above, we can find the variance of the same data set if each data point were multiplied or divided by 3. Instead of performing these operations and calculating the variance again, we simply take the original variance of Bag 1, 5 / 3 ≈ 1.7, and do the following.
Operation  Multiply by 3  Divide by 3
Multiply or divide the variance by the square of that constant  (5 / 3) × 3² = 15  (5 / 3) ÷ 3² ≈ 0.19
As you can imagine, remembering these rules can save you a lot of time. If you’re sceptical, calculate the new variances by hand and compare your answer to the ones above.
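If you would rather let a computer do the checking, the following sketch compares the shortcut with a full recalculation. It assumes Python’s standard `statistics` module; the names `tripled` and `thirds` are illustrative only.

```python
import math
import statistics

bag_1 = [5, 7, 8, 6]
original = statistics.variance(bag_1)  # 5 / 3, roughly 1.7

tripled = [x * 3 for x in bag_1]
thirds = [x / 3 for x in bag_1]

# Recalculate from the transformed data.
print(f"{statistics.variance(tripled):.2f}")  # 15.00
print(f"{statistics.variance(thirds):.2f}")   # 0.19

# Shortcut: scale the original variance by the square of the constant.
print(math.isclose(statistics.variance(tripled), original * 3 ** 2))  # True
print(math.isclose(statistics.variance(thirds), original / 3 ** 2))   # True
```

`math.isclose` is used instead of `==` because dividing by 3 introduces small floating-point rounding errors.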
How to Interpret Variance
Interpreting the variance is all about context. While it is tempting to generalize and say that a big variance means a big spread, whether a variance counts as big depends on the scale of the data set.
For example, 1 000 might seem like a high number for a variance. However, if the mean is in the millions, it no longer seems so large. In this case, the variance is small relative to the scale of the data and indicates that the values are spread closely around the mean.
Covariance
Covariance, unlike variance, tells us the joint variability of a pair of variables. In other words, it describes how one variable moves as the other one changes. This gives us a hint as to how the variables are related and change with one another.
For example, if we were to take the covariance of weight and height, we would most likely find that the variables change in the same direction with a high degree of relation. This means both that:
 The higher the height, the higher the weight
 Height has a strong relationship with weight
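You can find the formulas for covariance below.
Measure  Sample  Population
Covariance  $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$  $\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$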
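To make the idea concrete, here is a minimal Python sketch of the sample covariance, which divides the summed products of deviations by the sample size minus 1. The function name `sample_covariance` and the height and weight figures are invented purely for illustration and are not real measurements.

```python
def sample_covariance(xs, ys):
    """Sum of the products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data: taller people in this made-up sample are also heavier.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 58, 66, 80]

# A positive result means the two variables tend to move in the same direction.
print(sample_covariance(heights_cm, weights_kg) > 0)  # True
```

A handy sanity check is that the covariance of a variable with itself is just its variance.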
Variance Versus Other Measures of Variability
Variance is most closely related to standard deviation, which is the square root of the variance. Each measure tells us something about how the data are spread but with slight differences, which are summarized below.
Aspect  Variance  Standard Deviation
Goal  Describes the variability of observations within a data set  Describes the spread around the centre point of the data set
Units  Squared units of the data set  The same units as the data set
Interpretation  How far the data points are spread from the mean  How far a typical value is from the mean
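Because the standard deviation is the square root of the variance, the difference in units is easy to see in code. The sketch below uses Python’s standard `statistics` module on the Bag 1 data; the variable names are ours.

```python
import math
import statistics

bag_1 = [5, 7, 8, 6]

variance = statistics.variance(bag_1)  # in squared marbles
std_dev = statistics.stdev(bag_1)      # in marbles, like the data itself

print(round(variance, 2))  # 1.67
print(round(std_dev, 2))   # 1.29
print(math.isclose(std_dev, math.sqrt(variance)))  # True
```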