February 29, 2020
What is Variance?
Many of us have experienced the odd feeling of walking into a new store and trying to discreetly, or maybe not discreetly at all, check out the price tags. The reason we do this is simple: we are trying to get an idea of what a typical item costs. In other words, we want a rough estimate of the mean price for that particular store. While some stores have fairly similar prices for all their products, such as one-pound shops, others have a wide range of prices, such as a home goods store.
This is exactly what measures of variability tell us about a data set: how far apart or close together the data points are. In the case of the one-pound shop in our previous example, we would expect the variation in prices to be pretty low.
The variance is one measure of variability, along with others such as the standard deviation, the coefficient of variation and the interquartile range. It measures how far the data points are spread from the mean: it is the average of the squared deviations from the mean. The larger the variance, the farther the data points lie from the mean, and vice versa. The formula for the variance can be found below.
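For reference, the two standard forms of the variance are written out here. Which of the two the tables in this article use is not stated, so treat the choice as an assumption: the population form divides by the number of values N, while the sample form divides by n - 1.

```latex
% Population variance: average squared deviation from the population mean \mu
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2

% Sample variance: squared deviations from the sample mean \bar{x}, divided by n - 1
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```

In both forms the standard deviation is the square root of the variance.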
Problem 1: Calculating Variance
The following data gives prices for the same basket of goods at different shops. You are interested in understanding how prices vary for groceries bought from different stores - interpret the data using variance.
|Store 1||Store 2||Store 3|
Problem 2: Interpreting Variance
You want to illustrate the differences between standard deviation and variance to a classroom full of students. Given the following information, how do you interpret each data set to explain these differences?
|Data Set|| |
Problem 3: Changing Units
You collect data on the mean temperatures in different cities in Wales for the month of July, as well as the variances of those temperatures. You want to present your data in Fahrenheit instead of Celsius but don’t want to change your initial data. Use a shortcut to find the new mean and variance, given that the C to F conversion is F = (9/5)C + 32.
Solutions to Practice Problems
Solution to Problem 1
In this problem, we were asked to interpret the data using the variance. First, we must calculate the mean and variance for each store. You can follow the steps in the table below in order to arrive at the answers.
|Store 1||Store 2||Store 3|
Here, we can see that the prices in Store 3 are the most variable because it has the highest variance. Store 1, on the other hand, is not only the cheapest on average but its prices are also relatively stable around that low mean.
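The calculation steps can be sketched in a few lines of Python. The prices below are hypothetical values made up for illustration, not the figures from the table, and both the population and sample variance are shown because it is an assumption which one applies here.

```python
from statistics import mean, pvariance, variance

# Hypothetical basket prices in pounds; NOT the figures from the article's table.
prices = {
    "Store 1": [1.0, 1.2, 0.9, 1.1],
    "Store 2": [2.0, 2.5, 1.5, 3.0],
    "Store 3": [1.0, 4.0, 2.0, 7.0],
}


def summarise(values):
    """Return (mean, population variance, sample variance) for a list of prices."""
    return mean(values), pvariance(values), variance(values)


summary = {store: summarise(values) for store, values in prices.items()}

for store, (avg, pop_var, samp_var) in summary.items():
    print(
        f"{store}: mean={avg:.2f}, "
        f"population variance={pop_var:.3f}, sample variance={samp_var:.3f}"
    )

# The store whose prices are most spread out is the one with the largest variance.
most_variable = max(summary, key=lambda store: summary[store][1])
print("Most variable:", most_variable)
```

With these made-up numbers, Store 3 has the largest variance and Store 1 the lowest mean, which mirrors the pattern described in the solution.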
Solution to Problem 2
In this problem, you were asked to interpret the differences between the data sets in order to highlight the differences between the concepts of standard deviation and variance. You could also have chosen to find the coefficient of variation, in order to further illustrate the differences between the measures of variability.
|Data Set|| |
While the variance tells us about the variability of a data set and the standard deviation expresses that variability in the same units as the data, the coefficient of variation allows us to compare the variability between data sets. Here, the variance tells us that data set c has the most variability, meaning that its values are spread further around the mean of 50.
However, the standard deviation gives us a sense of how far a typical value lies from the mean. All data sets have a relatively low standard deviation, meaning most values in each data set lie close to the mean. We’re very unlikely, for example, to see a number such as 100 or 500 in any of the data sets.
The coefficient of variation, in contrast, tells us that the standard deviation is about 8% of the mean in data set c, meaning it is the most variable of all the data sets. We would choose the coefficient of variation over the variance when the data sets have different means or units of measurement, because it expresses the spread relative to the mean.
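To make the contrast between the three measures concrete, here is a minimal sketch. It assumes population formulas and uses two hypothetical data sets that are not taken from the table above. The second set is the first one multiplied by 10.

```python
from statistics import mean, pstdev, pvariance


def spread_measures(values):
    """Return (variance, standard deviation, coefficient of variation)."""
    avg = mean(values)
    var = pvariance(values)   # in squared units of the data
    sd = pstdev(values)       # in the same units as the data
    cv = sd / avg             # unitless; only meaningful for a non-zero mean
    return var, sd, cv


# Hypothetical data sets, made up for illustration only.
small_scale = [48, 50, 52]
large_scale = [480, 500, 520]

var_small, sd_small, cv_small = spread_measures(small_scale)
var_large, sd_large, cv_large = spread_measures(large_scale)

print(f"small: variance={var_small:.2f}, sd={sd_small:.2f}, cv={cv_small:.2%}")
print(f"large: variance={var_large:.2f}, sd={sd_large:.2f}, cv={cv_large:.2%}")
```

Scaling the data by 10 multiplies the standard deviation by 10 and the variance by 100, while the coefficient of variation stays the same. That is why the coefficient of variation is the convenient choice for comparing data sets measured on different scales.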
Solution to Problem 3
Using the rules for changing units, you should arrive at the following.
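The shortcut rests on a general rule: if every value is transformed as y = a*x + b, the mean becomes a*mean + b, while the variance becomes a^2 times the original variance, because adding a constant shifts the data without changing its spread. For Celsius to Fahrenheit, a = 9/5 and b = 32, so the variance is multiplied by (9/5)^2 = 3.24. The sketch below checks the rule numerically with hypothetical temperatures, not the data from the problem, and assumes the population variance.

```python
from statistics import mean, pvariance


def c_to_f(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return 9 / 5 * celsius + 32


# Hypothetical July temperatures in Celsius, made up for illustration.
temps_c = [15.0, 17.0, 19.0, 21.0]
mean_c = mean(temps_c)
var_c = pvariance(temps_c)

# Shortcut: apply the rules to the summary statistics only.
mean_f_shortcut = 9 / 5 * mean_c + 32
var_f_shortcut = (9 / 5) ** 2 * var_c

# Direct route: convert every value, then recompute, to confirm the shortcut.
temps_f = [c_to_f(t) for t in temps_c]
mean_f_direct = mean(temps_f)
var_f_direct = pvariance(temps_f)

print(f"mean: shortcut={mean_f_shortcut:.2f}, direct={mean_f_direct:.2f}")
print(f"variance: shortcut={var_f_shortcut:.2f}, direct={var_f_direct:.2f}")
```

Both routes give the same mean and variance, so the original Celsius data never needs to be converted.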