February 29, 2020
A Guide to the Coefficient of Variation
Coefficient of Variation
The coefficient of variation, also known simply as the CV, is a widely used measure of dispersion. In other words, it helps us understand how spread out the data are. The formula differs slightly between a population and a sample, as listed in the table below.
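| CV for Population | CV for Sample |
|---|---|
| $CV = \frac{\sigma}{\mu}$ | $CV = \frac{s}{\bar{x}}$ |

Here $\sigma$ and $\mu$ are the population standard deviation and mean, and $s$ and $\bar{x}$ are the sample standard deviation and mean. The ratio is often multiplied by 100% to express it as a percentage.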
As you can see from the formulas above, the CV is simply the ratio of the standard deviation to the mean. Because the standard deviation and the mean share the same units, the CV is unitless: it expresses spread relative to the size of the values. This makes it useful for comparing the spread of different data sets, where a lower CV indicates a more precise (less dispersed) data set. Keep in mind that a larger spread between the data points isn't always undesirable.
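A minimal sketch of both formulas in Python, using only the standard-library `statistics` module. The function names `cv_population` and `cv_sample` and the data values are hypothetical; `pstdev` divides by n (population) while `stdev` divides by n - 1 (sample), which is why the two results differ slightly.

```python
import statistics

def cv_population(values):
    """Population CV: sigma / mu, treating `values` as the entire population."""
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mu

def cv_sample(values):
    """Sample CV: s / x-bar, where s uses the n - 1 denominator."""
    x_bar = statistics.fmean(values)
    if x_bar == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / x_bar

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
print(f"population CV: {cv_population(data):.1%}")  # 40.0%
print(f"sample CV:     {cv_sample(data):.1%}")      # 42.8%
```

The guard against a zero mean is a design choice for the sketch: since the mean is the denominator, the CV cannot be computed when it is zero.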
When to Use CV
The coefficient of variation is a useful tool for comparing the dispersion of different data sets. Judging how widely spread several data sets are from their standard deviations alone can be tricky, because each standard deviation is expressed in the units and on the scale of its own data. The CV removes this problem by expressing the standard deviation as a proportion of the mean.
For example, let’s say we have the mean, standard deviation and sample size for the following data sets.
| Measure | Set A | Set B | Set C |
|---|---|---|---|
| Mean | 20 | 20 | 20 |
| Standard deviation | 1 | 4 | 15 |
Looking at the table above, notice that all three data sets have the same mean but very different standard deviations and sample sizes. Because the means are equal, the standard deviations already rank the spreads in this case; the CV states each spread as a share of the mean, which keeps the comparison valid even when the means or units differ.
Calculating the CV for each set gives the following results:
| | Set A | Set B | Set C |
|---|---|---|---|
| CV | 1/20 = 0.05 = 5% | 4/20 = 0.20 = 20% | 15/20 = 0.75 = 75% |
Now we can analyse the CVs and say that set A has the least spread and the highest precision: its standard deviation is only 5% of the mean, compared with 20% for set B and 75% for set C.
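The same calculation can be reproduced in a few lines of Python. This sketch starts from the summary statistics of the example (a mean of 20 for every set, and standard deviations of 1, 4 and 15); `cv_from_summary` is a hypothetical helper name, not a library function.

```python
# Summary statistics from the example: (mean, standard deviation) per set
summary = {"A": (20, 1), "B": (20, 4), "C": (20, 15)}

def cv_from_summary(mean, sd):
    """CV from already-computed summary statistics: sd / mean."""
    return sd / mean

cvs = {name: cv_from_summary(mean, sd) for name, (mean, sd) in summary.items()}
for name, value in cvs.items():
    print(f"Set {name}: CV = {value:.0%}")

# The set with the smallest CV has the least spread relative to its mean
print("Least relative spread:", min(cvs, key=cvs.get))
```

Working from summary statistics is enough here because the CV needs only the mean and the standard deviation; the sample size does not enter the formula.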
The CV is a great measure for comparison because it expresses variability relative to the mean.
When Not to Use CV
While the coefficient of variation is used in many disciplines, from economics to psychology, it has some drawbacks. Keep in mind that the CV is a relative measure, which means it should only be used with variables measured on a ratio scale, that is, a scale with a meaningful (true) zero.
In other words, the coefficient of variation should not be used with variables measured on an interval scale, because ratios of values on an interval scale are not meaningful. The most familiar example of an interval scale is temperature measured in Celsius or Fahrenheit.
Think about measuring temperature in Fahrenheit, where
10 °F + 10 °F = 20 °F
Here, the interval between 10 and 20 degrees is the same as the interval between 30 and 40 degrees: a difference of 10 degrees. However, 10 °F isn't twice as cold as 20 °F, nor is 40 °F twice as hot as 20 °F. This becomes clear when the values are converted to Celsius:
40 °F ≈ 4.4 °C
20 °F ≈ -6.7 °C
In other words, the zero points of the Fahrenheit and Celsius scales are arbitrary: 0 °F and 0 °C do not mean an absence of temperature. Therefore, using the CV to compare relative variability doesn't make sense for values on an interval scale.
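To make the interval-scale problem concrete, here is a small sketch (the three temperature readings are hypothetical). It shows that the ratio of two temperatures, and the CV of a set of temperatures, both change when the same readings are expressed on a different scale. Kelvin is included for contrast because its zero is absolute zero, a true zero, which makes it a ratio scale.

```python
import statistics

def fahrenheit_to_celsius(f):
    """Convert °F to °C; both scales have an arbitrary zero point."""
    return (f - 32) * 5 / 9

def celsius_to_kelvin(c):
    """Convert °C to K; the kelvin scale starts at absolute zero (a true zero)."""
    return c + 273.15

def cv(values):
    """Population CV: standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

# 1) The ratio of two temperatures depends on the scale used.
ratio_f = 40.0 / 20.0
ratio_c = fahrenheit_to_celsius(40.0) / fahrenheit_to_celsius(20.0)
print(f"40 °F / 20 °F = {ratio_f:.2f}; the same readings in °C give {ratio_c:.2f}")

# 2) The CV of the same three readings also changes with the scale.
temps_f = [20.0, 30.0, 40.0]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]
temps_k = [celsius_to_kelvin(t) for t in temps_c]
cv_by_scale = {"°F": cv(temps_f), "°C": cv(temps_c), "K": cv(temps_k)}
for scale, value in cv_by_scale.items():
    print(f"CV in {scale}: {value:.3f}")
```

The °C result is even negative, because the mean of these readings falls below 0 °C. Only the kelvin scale has a true zero, so only the kelvin CV can be read as a genuine ratio of spread to level.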
Coefficient of Variation Versus Other Measures of Variability
The CV is only one of the many measures of variability that you can use to describe a data set. Each measure captures a different characteristic of the data, so it can be confusing to know which one to use when. Here, we'll compare the CV to other measures of variation.
As you learned in other sections of this guide on descriptive statistics, the variance measures the variability within a data set. It averages the squared deviation of each value from the mean, giving an estimate of how widely the data are spread around the centre.
The variance, unlike the CV, is tied to the scale and units of its data set (it is measured in squared units). Comparing variances across data sets with different means or units is therefore misleading, which is why the variance is best used to describe spread within a single data set.
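A short sketch of why variances are hard to compare across data sets: the same hypothetical heights are recorded in centimetres and in metres. Changing the unit multiplies the variance by 100² = 10,000 but leaves the CV unchanged, because the standard deviation and the mean are rescaled by the same factor. The `describe` helper is hypothetical.

```python
import statistics

def describe(values):
    """Return (population variance, population CV) for a list of values."""
    variance = statistics.pvariance(values)
    cv = statistics.pstdev(values) / statistics.fmean(values)
    return variance, cv

heights_cm = [160, 170, 180]               # hypothetical heights in centimetres
heights_m = [h / 100 for h in heights_cm]  # the same heights in metres

var_cm, cv_cm = describe(heights_cm)
var_m, cv_m = describe(heights_m)
print(f"cm: variance = {var_cm:.2f}, CV = {cv_cm:.4f}")
print(f"m:  variance = {var_m:.6f}, CV = {cv_m:.4f}")
```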
The standard deviation, as you've learned in other sections, is the measure most often used to make statements about variability within a data set. Because it is the square root of the variance, it is expressed in the same units as the data, which makes it easier to interpret than the variance.
As with the variance, the standard deviation is most useful for describing variability within a data set, rather than across data sets as we do with the CV.
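The following sketch uses two hypothetical data sets on very different scales. It computes the standard deviation as the square root of the variance and then divides by the mean: the larger-scale set has by far the bigger standard deviation, yet the smaller CV, which is why the SD alone can mislead when comparing across data sets.

```python
import math
import statistics

# Two hypothetical data sets measured on very different scales
data_sets = {
    "small": [8, 10, 12],
    "large": [950, 1000, 1050],
}

results = {}
for name, values in data_sets.items():
    variance = statistics.pvariance(values)
    sd = math.sqrt(variance)            # SD = square root of the variance
    cv = sd / statistics.fmean(values)  # spread relative to the mean
    results[name] = (sd, cv)
    print(f"{name}: SD = {sd:.2f}, CV = {cv:.1%}")
```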
While the variance measures the variability of one variable, the covariance measures the joint variability of two variables. In other words, the covariance measures how much or how little two variables vary together.
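The covariance is a measure of variability used within a data set that captures how the deviations of one variable from its mean relate to the deviations of another: a positive value means the two variables tend to move in the same direction, and a negative value means they tend to move in opposite directions. Unlike the CV, the covariance tells us little about the relative precision of different data sets.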
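A minimal sketch of the sample covariance, written out by hand so the formula is visible: the products of paired deviations from each mean are summed and divided by n - 1. The data are hypothetical, and `sample_covariance` is a hypothetical name; on Python 3.10 and later, `statistics.covariance` computes the same sample covariance.

```python
import statistics

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical hours studied
scores = [52, 55, 61, 64, 70]  # hypothetical exam scores

print(sample_covariance(hours, scores))        # positive: the two rise together
print(sample_covariance(hours, scores[::-1]))  # negative: one rises as the other falls
```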
The standard error is closely related to the standard deviation. The difference is that, while the standard deviation measures the variability of the individual values in a data set, the standard error is the term used for the “standard deviation” of a statistic calculated from a sample, that is, how much that statistic would vary from one sample to the next.
The standard error, then, is an estimate of the standard deviation of a statistic. For example, the standard error of the sample mean tells us how variable our estimate of the mean is. The SE answers a different question from the CV: it describes the precision of an estimate, not the relative spread of the data.
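To illustrate the difference, this sketch compares the standard error of the mean, computed with the usual formula s / √n, against the sample CV for two hypothetical samples: a small one and the same pattern of values repeated to reach n = 100. The SE shrinks substantially as the sample grows, while the CV, which describes the spread of the data themselves, stays roughly the same.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def cv_sample(values):
    """Sample CV: s / x-bar."""
    return statistics.stdev(values) / statistics.fmean(values)

small_sample = [18, 22, 19, 21, 20]  # hypothetical values, n = 5
large_sample = small_sample * 20     # the same pattern repeated, n = 100

for label, values in [("n = 5", small_sample), ("n = 100", large_sample)]:
    se = standard_error_of_mean(values)
    print(f"{label}: SE = {se:.3f}, CV = {cv_sample(values):.3f}")
```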
| When to Use CV | When Not to Use CV |
|---|---|
| Comparing variability across samples | Describing variability within a single sample |
| With variables on a ratio scale (meaningful zero) | With variables on an interval scale (non-meaningful zero) |