A Guide to Variance

In previous sections of this guide on descriptive statistics, you learned the fundamentals of variance. Specifically, we taught you what variance is, its important role in statistics and how to calculate it. Here, we’ll give a brief overview of all these things, as well as compare variance to other measures of variability.

 

What is Variance

Like all measures of variability, variance strives to capture the dispersion of a variable. Many people fall into the trap of associating variability with undesirability simply because, in the real world, variability is something we try to fix.

After all, it wouldn’t be very pleasant if your coffee tasted wildly different every time you bought it. While we’ll go over the specifics of interpretation later in this overview, it’s important to understand variance as a simple measure of dispersion as you dive into learning how to use it.

A basic definition of variance is that it captures how far the data points are spread from their mean. A bigger number, or a high variance, suggests that the data are spread further from the centre point of the data set. This might be easier to grasp through an example. In the table below, you’ll find data on two different bags of marbles bought from a toy store.

 

| Marble Colour | Bag 1 | Bag 2 |
|---|---|---|
| Blue | 5 | 6 |
| Red | 7 | 4 |
| Orange | 8 | 15 |
| Yellow | 6 | 1 |

 

The variance of each bag is as follows. Don’t worry about the calculation, which we’ll show you in the next section. Here, focus on understanding what the variance is and why it is important in statistics.

 

| | Bag 1 | Bag 2 |
|---|---|---|
| Variance | 1.7 | 36.3 |

 

Here, the bigger variance of Bag 2 means that the number of marbles per colour is more spread out. You can see this by looking at the data set: the first bag has a more uniform number of marbles per colour than the second one.

As you can imagine, variance is a concept with a wide range of applications in statistics and beyond.

 


How to Calculate Variance

Now that we’ve shown you what variance is, we’ll guide you through how to calculate it. As you know from our lessons on populations and samples, measures are calculated differently for parameters and statistics. As a brief recap, parameters are calculated from the population while statistics are calculated from samples.

In the table below, you’ll find the formulas for variance for the population and for a sample.

 

| | Population | Sample |
|---|---|---|
| Variance notation | \( \sigma^2 \) | \( s^2 \) |
| Variance formula | \( \frac{\Sigma(x_{i}-\mu)^2}{N} \) | \( \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} \) |
| Mean | \( \mu = \frac{\Sigma x_{i}}{N} \) | \( \bar{x} = \frac{\Sigma x_{i}}{n} \) |

 

As you can see from the formulas, you must first calculate the mean and subtract it from each individual observation in the data. Next, you square each of those differences, sum the squares and divide the total by the sample size minus one (or, for a population, by the population size N). This can sound abstract, so we’ll take the example above and break down how to calculate the variance step by step.
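As a minimal sketch, the two formulas can be written directly in Python. The names `population_variance` and `sample_variance` are our own illustrative helpers, not part of any library; the only library functions used are `statistics.pvariance` and `statistics.variance` from Python’s standard library, which compute the population and sample variance and serve here as a cross-check.

```python
from statistics import pvariance, variance


def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(data)
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


bag_1 = [5, 7, 8, 6]  # marble counts from Bag 1

# Treating Bag 1 as a whole population divides by N = 4...
print(population_variance(bag_1), pvariance(bag_1))
# ...while treating it as a sample divides by n - 1 = 3
print(sample_variance(bag_1), variance(bag_1))
```

Dividing by n - 1 instead of N makes the sample variance slightly larger: about 1.67 rather than 1.25 for Bag 1.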

 

Calculation for Bag 1 and Bag 2

1. Calculate the mean

Bag 1:

    \[ \bar{x} = \dfrac{5+7+8+6}{4} = \dfrac{26}{4} = 6.5 \]

Bag 2:

    \[ \bar{x} = \dfrac{6+4+15+1}{4} = \dfrac{26}{4} = 6.5 \]

2. Subtract the mean from every observation

    \[ x_{i}-\bar{x} \]

Bag 1: -1.5, 0.5, 1.5, -0.5
Bag 2: -0.5, -2.5, 8.5, -5.5

3. Square each subtracted value

    \[ (x_{i}-\bar{x})^2 \]

Bag 1: 2.25, 0.25, 2.25, 0.25
Bag 2: 0.25, 6.25, 72.25, 30.25

4. Sum all squared values

Bag 1:

    \[ \Sigma(x_{i}-\bar{x})^2 = 2.25+0.25+2.25+0.25 = 5 \]

Bag 2:

    \[ \Sigma(x_{i}-\bar{x})^2 = 0.25+6.25+72.25+30.25 = 109 \]

5. Divide the sum by n minus 1

Bag 1:

    \[ s^2 = \dfrac{5}{4-1} \approx 1.7 \]

Bag 2:

    \[ s^2 = \dfrac{109}{4-1} \approx 36.3 \]
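The five steps can also be traced in code. This is a sketch assuming the data are plain Python lists of numbers; `variance_steps` is a hypothetical helper that returns every intermediate value, so you can compare each one with the hand calculation.

```python
def variance_steps(data):
    """Return each intermediate value of the hand calculation of s^2."""
    n = len(data)
    mean = sum(data) / n                       # Step 1: the mean
    deviations = [x - mean for x in data]      # Step 2: x_i - x_bar
    squares = [d ** 2 for d in deviations]     # Step 3: (x_i - x_bar)^2
    total = sum(squares)                       # Step 4: sum of the squares
    s2 = total / (n - 1)                       # Step 5: divide by n - 1
    return mean, deviations, squares, total, s2


for name, bag in [("Bag 1", [5, 7, 8, 6]), ("Bag 2", [6, 4, 15, 1])]:
    mean, deviations, squares, total, s2 = variance_steps(bag)
    print(name, "mean:", mean)
    print("  deviations:", deviations)
    print("  squares:", squares)
    print("  sum:", total, "variance:", round(s2, 1))
```

For Bag 1 this reports a sum of 5.0 and a variance of 1.7; for Bag 2, a sum of 109.0 and a variance of 36.3.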

 

As you can see, we’ve arrived at the same variances as in the earlier table. While it’s unlikely you’ll have to calculate variance by hand, given how much software can do the job for you, it’s helpful to understand the process behind the formula.

 

Effect of Changing Units

As with all measures of central tendency and variability, the question of what happens when you change the units of the data can come up. Changing units simply means transforming the data points in your data set with common operations such as addition, subtraction, multiplication and division. More advanced transformations involve taking the logarithm or a power of each data point.

There are many reasons why someone may want to transform their data. Some reasons include:

  • Transforming the data to fit a more convenient distribution
  • Wanting to display the data in more understandable units
  • Needing to change the data to convert it into a new variable

In the table below, you’ll find an example of each of the aforementioned scenarios.

 

| Reason | Example |
|---|---|
| Transforming the data to fit a more convenient distribution | Transforming skewed data so that it more closely follows a normal distribution |
| Wanting to display the data in more understandable units | Receiving or measuring data in imperial units and needing it in metric units |
| Needing to change the data to convert it into a new variable | Combining weight and height into a new variable, body mass index (BMI), by dividing weight by the square of height |
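Two of these scenarios, converting imperial units to metric and deriving BMI, can be sketched in a few lines of Python. The heights and weights below are made-up values for illustration only; the conversion factors are the exact international definitions (1 inch = 0.0254 m, 1 pound = 0.45359237 kg), and BMI is computed as weight in kilograms divided by the square of height in metres.

```python
# Hypothetical measurements in imperial units (made up for illustration)
heights_in = [62, 66, 70]     # inches
weights_lb = [120, 150, 180]  # pounds

INCH_TO_M = 0.0254            # exact definition of the inch
POUND_TO_KG = 0.45359237      # exact definition of the pound

# Display the data in metric units: every value is multiplied by a constant
heights_m = [h * INCH_TO_M for h in heights_in]
weights_kg = [w * POUND_TO_KG for w in weights_lb]

# Combine two variables into a new one: BMI = weight (kg) / height (m)^2
bmi = [kg / m ** 2 for kg, m in zip(weights_kg, heights_m)]

for h, w, b in zip(heights_m, weights_kg, bmi):
    print(f"height {h:.2f} m, weight {w:.1f} kg, BMI {b:.1f}")
```

Because the unit conversions simply multiply every value by a constant, the shortcut rules that follow tell you how the mean and variance of the converted data change without recalculating them.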

 

If you perform basic operations such as addition, subtraction, multiplication and division, there are a couple of shortcuts to keep in mind if you only want to know the new measures of central tendency and variability. For addition and subtraction, the rules are recorded in the table below.

 

| When adding or subtracting a constant | Effect on the measure |
|---|---|
| Mean, Median, Mode | Add or subtract that constant |
| Standard Deviation, Variance, Average Deviation, IQR | No effect; they stay the same |

 

For multiplication and division, these changes can be found in the table below.

 

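| When multiplying or dividing by a constant | Effect on the measure |
|---|---|
| Mean, Median, Mode | Multiply or divide by that constant |
| Standard Deviation, Average Deviation, IQR | Multiply or divide by that constant (by its absolute value if the constant is negative) |
| Variance | Multiply or divide by the square of that constant |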

 

Notice that when adding or subtracting a constant, measures of variability don’t change. On the other hand, when multiplying or dividing by a constant, they do change, and the variance changes by the square of the constant rather than by the constant itself. Let’s take Bag 1 from the previous section and add 3 to every data point.

 

| Marble Colour | Bag 1 |
|---|---|
| Blue | 5 + 3 = 8 |
| Red | 7 + 3 = 10 |
| Orange | 8 + 3 = 11 |
| Yellow | 6 + 3 = 9 |

 

To calculate the variance, we would normally find the mean, square each observation’s difference from the mean, sum those squared differences and divide by the sample size minus 1.

 

    \[ \bar{x} = \dfrac{8+10+11+9}{4} = \dfrac{38}{4} = 9.5 \]

    \[ \Sigma(x_{i}-\bar{x})^2 = (-1.5)^2 + 0.5^2 + 1.5^2 + (-0.5)^2 = 5 \]

    \[ s^2 = \dfrac{5}{4-1} \approx 1.7 \]

 

As you can see, the variance didn’t change from the previous example. Instead of adding 3 to each observation and recalculating, we could have used the rules above and simply stated that the variance is unchanged.
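If you’d like to confirm the rule without working it out by hand, here is a minimal sketch using Python’s standard `statistics` module (its `variance` and `stdev` use the sample formula with n - 1). It adds 3 to every observation in Bag 1 and compares the measures before and after.

```python
from statistics import mean, median, stdev, variance

bag_1 = [5, 7, 8, 6]
shifted = [x + 3 for x in bag_1]  # add the constant 3 to every observation

# Measures of central tendency move by the constant...
print("mean:", mean(bag_1), "->", mean(shifted))
print("median:", median(bag_1), "->", median(shifted))

# ...while measures of variability stay the same
print("variance:", variance(bag_1), "->", variance(shifted))
print("standard deviation:", stdev(bag_1), "->", stdev(shifted))
```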

Using the same rules, we can also find the variance of the data set after each data point is multiplied or divided by 3. Instead of performing these operations and calculating the variance again, we simply do the following.

 

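Here, the old variance is \( \dfrac{5}{3} \approx 1.7 \).

Multiply by 3: multiply the old variance by the square of the constant.

    \[ \text{new variance} = \text{old variance} \times 3^2 = \dfrac{5}{3} \times 9 = 15 \]

Divide by 3: divide the old variance by the square of the constant.

    \[ \text{new variance} = \dfrac{\text{old variance}}{3^2} = \dfrac{5/3}{9} \approx 0.19 \]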

 

As you can imagine, remembering these rules can save you a lot of time. If you’re sceptical, calculate the new variances by hand and compare your answer to the ones above.
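Alternatively, a short sketch with Python’s `statistics` module can do the comparison for you. It recalculates the variance of Bag 1 after multiplying and dividing by 3 and sets it beside the shortcut. We’ve also included a negative constant, -3, to show that the standard deviation scales by the constant’s absolute value and never becomes negative.

```python
from statistics import stdev, variance

bag_1 = [5, 7, 8, 6]
old_var = variance(bag_1)  # 5/3, about 1.7
c = 3

tripled = [x * c for x in bag_1]
divided = [x / c for x in bag_1]

# Recalculating directly versus applying the shortcut (scale by c squared)
print("multiply by 3:", variance(tripled), "vs", old_var * c ** 2)  # both about 15
print("divide by 3:", variance(divided), "vs", old_var / c ** 2)    # both about 0.19

# The standard deviation scales by |c|, so multiplying by -3 gives the same spread as 3
negated = [x * -c for x in bag_1]
print("stdev after multiplying by -3:", stdev(negated), "vs", abs(-c) * stdev(bag_1))
```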

 

How to Interpret Variance

Interpreting the variance is all about context. While it’s tempting to generalize and say that a big variance always means a big spread, whether a variance counts as big only makes sense once we look at the data set.

For example, the number 1 000 might seem like a high variance. However, if the mean is in the millions, it no longer seems so abnormal. In this case, the variance would be small relative to the data and would indicate that the values are spread closely around the mean.
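One way to see why context matters is to take the square root of the variance (the standard deviation, which is back in the data’s own units) and compare it with the mean. In this sketch, the variance of 1 000 comes from the example above, while the two means, 100 and 1 000 000, are hypothetical values chosen for illustration. The ratio of standard deviation to mean is sometimes called the coefficient of variation.

```python
import math

var = 1_000            # the variance from the example above
sd = math.sqrt(var)    # standard deviation, about 31.6 in the data's units

# Hypothetical means: the same spread is large next to 100 but tiny next to a million
for mu in (100, 1_000_000):
    print(f"mean {mu:,}: standard deviation is {sd / mu:.4%} of the mean")
```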

 

Covariance

Covariance, unlike variance, measures the joint variability of a pair of variables. In other words, it describes how the spread of one variable lines up with the spread of another. This gives us a hint as to how the variables are related and how they change together.

For example, if we were to take the covariance of height and weight, we would most likely find a positive value, meaning that the two variables tend to change in the same direction. In other words:

  • The greater the height, the greater the weight tends to be
  • Height and weight are related, although the size of the covariance depends on the units of both variables, so on its own it doesn’t show how strong that relationship is

You can find the formulas for covariance below.

Sample covariance

    \[ \mathrm{Cov}(X,Y) = \frac{\Sigma(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \]

Population covariance

    \[ \mathrm{Cov}(X,Y) = \frac{\Sigma(x_{i}-\mu_{x})(y_{i}-\mu_{y})}{N} \]
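The two covariance formulas can be written as a short Python sketch. The height and weight figures are made up for illustration, and `sample_covariance` and `population_covariance` are our own helper names. (Python 3.10 and later also provide `statistics.covariance`, which computes the sample version.)

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the sample means, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def population_covariance(xs, ys):
    """Sum of products of paired deviations from the population means, divided by N."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty lists")
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n


heights_cm = [160, 170, 175, 185]  # hypothetical data
weights_kg = [55, 65, 70, 80]      # hypothetical data

# A positive result means the two variables tend to move in the same direction
print(sample_covariance(heights_cm, weights_kg))      # about 108.3
print(population_covariance(heights_cm, weights_kg))  # 81.25
```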

 

Variance Versus Other Measures of Variability

Variance is most closely related to the standard deviation, which is simply its square root. Each measure tells us something about how the data are spread, but with slight differences, which are summarized below.

| | Variance | Standard Deviation |
|---|---|---|
| Goal | Describes the variability of observations within a data set | Describes the spread around the centre point of the data set |
| Units | Squared units of the data set | The same units as the data set |
| Interpretation | How far the data values are spread from the mean | Roughly how far a typical value lies from the mean |
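Because the standard deviation is the square root of the variance, converting between the two takes one line. This sketch uses Bag 2 and Python’s `statistics` module to show the difference in units: the variance is in “marbles squared”, while the standard deviation is back in marbles.

```python
import math
from statistics import stdev, variance

bag_2 = [6, 4, 15, 1]  # marbles per colour

var = variance(bag_2)  # squared units (marbles^2), about 36.3
sd = stdev(bag_2)      # original units (marbles), about 6.0

print(f"variance: {var:.1f} marbles^2")
print(f"standard deviation: {sd:.1f} marbles")
print("sqrt(variance) matches stdev:", math.isclose(math.sqrt(var), sd))
```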

 

Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.
