February 29, 2020
In previous sections of this guide, we walked you through the foundations of statistics and provided you with practice problems to put your newfound knowledge to the test. Use this as your go-to formula sheet for all things descriptive statistics.
Measures of Central Tendency
One of the three most basic measures of central tendency, the mode is defined as the most frequently occurring value. The mode is easy to remember because of its similarity to the word “most.”
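While there is no formula for the mode, it is helpful to understand how the absolute, relative and cumulative frequencies are calculated. The absolute frequency, or the “count,” is the number of times a value of a variable occurs in a sample or population. The relative frequency is that count divided by the total number of observations, and the cumulative frequency is the running total of the counts.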
The mode should be used as a representative for the central value when you want to know what the most frequently occurring value of a variable is. As an example, take the frequency for each colour.
|Colour||Absolute Frequency||Relative Frequency|
|Blue||405||405/850 ≈ 48%|
|Yellow||299||299/850 ≈ 35%|
|Green||146||146/850 ≈ 17%|
Both frequencies tell us that blue is the mode.
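As a minimal sketch, the frequencies in the table above can be reproduced in Python with the standard library. The list `observations` is rebuilt from the counts in the table, and the variable names are only illustrative.

```python
from collections import Counter

# Observations rebuilt from the absolute frequencies in the table above.
observations = ["Blue"] * 405 + ["Yellow"] * 299 + ["Green"] * 146

# Absolute frequency: how many times each value occurs.
absolute = Counter(observations)
total = sum(absolute.values())

# Relative frequency: each count divided by the number of observations.
relative = {colour: count / total for colour, count in absolute.items()}

# Cumulative frequency: running total of the counts, taken here in the
# order of the table (colours have no natural ordering).
cumulative = {}
running = 0
for colour in ["Blue", "Yellow", "Green"]:
    running += absolute[colour]
    cumulative[colour] = running

# The mode is the value with the highest frequency.
mode = max(absolute, key=absolute.get)

for colour in ["Blue", "Yellow", "Green"]:
    print(colour, absolute[colour], f"{relative[colour]:.0%}", cumulative[colour])
print("Mode:", mode)
```

Whether the absolute or the relative frequency is used, the same value has the largest frequency, so either one identifies the mode.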
The median is another basic measure of central tendency. The definition of the median is the middle point of an ordered data set, ordered meaning sorted from the least to the greatest value. Like the mode, there is no general formula for the median.
To find the median, follow the steps outlined in the table below.
|1.||Take a variable and order its values from least to greatest|
|2.||If the number of values is odd, take the one middle value. This is the median|
|3.||If the number of values is even, find the two middle values|
|3.a.||Find the average of these two middle values. This is the median|
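For an odd number of values, for example 3, 5, 8, the median is the single middle value, 5.
And for an even number of values, for example 3, 5, 8, 10, the median is the average of the two middle values, (5 + 8) / 2 = 6.5.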
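The steps in the table above can be sketched in Python as follows. The function name `median` and the two small data sets are illustrative assumptions, not part of the guide.

```python
def median(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)          # step 1: order from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd count, one middle value
        return ordered[mid]
    # step 3 and 3.a: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = median([8, 3, 5])        # sorted: 3, 5, 8
even_example = median([8, 3, 10, 5])   # sorted: 3, 5, 8, 10

print(odd_example)
print(even_example)
```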
Types of Means
There are three basic means, known as Pythagorean Means: arithmetic, geometric and harmonic. The first two are used most commonly in statistics.
The arithmetic mean is a simple average. The arithmetic mean, or AM, should be used with numbers that have an additive relationship.
The geometric mean is a multiplicative average. Also called the GM, it should be used with numbers that have a multiplicative or exponential relationship.
The harmonic mean is an average of reciprocals. The HM is most appropriate when you’d like to find an average of rates.
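In the table below you’ll find the formulas for these basic means, for a data set of n values x_1, ..., x_n.
|Arithmetic Mean||$AM = \frac{1}{n}\sum_{i=1}^{n} x_i$|
|Geometric Mean||$GM = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$|
|Harmonic Mean||$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$|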
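As a rough illustration, the three Pythagorean means can be computed with Python's built-in `statistics` module (assuming Python 3.8 or later, where `geometric_mean` is available). The data set `values` is made up for the example.

```python
import statistics

# Illustrative, made-up data; all values must be positive for GM and HM.
values = [2, 4, 8]

am = statistics.mean(values)            # (2 + 4 + 8) / 3
gm = statistics.geometric_mean(values)  # (2 * 4 * 8) ** (1 / 3)
hm = statistics.harmonic_mean(values)   # 3 / (1/2 + 1/4 + 1/8)

print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
```

For positive values that are not all equal, the three means always fall in the order AM > GM > HM, which is a handy sanity check.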
Weighted and Grouped Means
There are two types of advanced means that can be utilized for data analysis: the weighted arithmetic mean and the grouped mean. Weighted arithmetic means are means for which each data point contributes unequally to the final average.
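The grouped mean, on the other hand, estimates the mean of a variable whose values have been divided into groups, using the midpoint and frequency of each group. Below, you'll find a formula for each advanced mean, where w_i is the weight of value x_i, and m_j and f_j are the midpoint and frequency of group j.
|Weighted Arithmetic Mean||$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$|
|Grouped Mean||$\bar{x}_g = \frac{\sum_{j=1}^{k} f_j m_j}{\sum_{j=1}^{k} f_j}$|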
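A minimal sketch of both advanced means follows. The function names, the weights and the group boundaries are hypothetical, and the grouped mean assumes each group is represented by its midpoint.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def grouped_mean(groups):
    """Mean of grouped data given as (lower, upper, frequency) tuples.

    Each group is represented by its midpoint, so the result is an
    estimate of the mean of the underlying ungrouped values.
    """
    total_frequency = sum(freq for _, _, freq in groups)
    weighted_midpoints = sum(((lower + upper) / 2) * freq
                             for lower, upper, freq in groups)
    return weighted_midpoints / total_frequency


# Hypothetical example: two scores, the second counting 7 parts out of 10.
score = weighted_mean([80, 90], [3, 7])

# Hypothetical groups: 0-10 (2 values), 10-20 (3 values), 20-30 (5 values).
grouped = grouped_mean([(0, 10, 2), (10, 20, 3), (20, 30, 5)])

print(score)
print(grouped)
```

The grouped mean is simply a weighted mean in which the group midpoints are the values and the group frequencies are the weights.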
Measures of Variability
The variance is one of the many measures of variability. Variability is exactly what it sounds like, which is the amount of variation there is in a particular variable.
The variance of a variable describes how spread out it is around the centre. That is, the variance tells us how far the data points are spread around the mean. Because the variance is expressed in squared units of the variable, it is usually less preferred than the standard deviation when interpreting results.
The covariance is another measure of variability. However, whereas the variance tells us information about one variable, the covariance tells us the joint variability of two variables.
This simply means that the covariance strives to show the relationship between the spread of two variables. The covariance is similar to the correlation coefficient, where the correlation coefficient is frequently referred to as a “scaled” covariance.
The standard deviation is another measure of variability. It is often confused with variance because of their similarities, both in terms of calculation and interpretation. The standard deviation also gives us information about how far spread data points are around a mean.
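However, the major difference between the SD and the variance is that the SD is expressed in the same units as the data, so it tells us how far a typical value lies from the mean. Below, you’ll find the formulas for the variance, covariance and standard deviation for both samples and populations.
|Measure||Population||Sample|
|Variance||$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$||$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$|
|Covariance||$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$||$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$|
|Standard Deviation||$\sigma = \sqrt{\sigma^2}$||$s = \sqrt{s^2}$|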
Notice that the standard deviation is the square root of the variance. Using the table below, we can find each measure of variability.
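As a sketch of how each measure is computed, the following Python uses two small made-up variables, `x` and `y`. These values are illustrative only and are not the guide's data. The `sample` flag switches between dividing by n - 1 (sample) and by n (population).

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y, sample=True):
    """Joint variability of x and y around their means."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x, mean_y = mean(x), mean(y)
    cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    divisor = len(x) - 1 if sample else len(x)
    return cross_products / divisor


def variance(x, sample=True):
    """The variance is the covariance of a variable with itself."""
    return covariance(x, x, sample)


def std_dev(x, sample=True):
    """The standard deviation is the square root of the variance."""
    return math.sqrt(variance(x, sample))


# Illustrative, made-up data.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

print("population variance of x:", variance(x, sample=False))
print("sample variance of x:    ", variance(x))
print("sample covariance of x,y:", covariance(x, y))
print("sample SD of x:          ", std_dev(x))
```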
The average deviation, also known as the mean absolute deviation, is another measure of variability. While it has been argued that the mean absolute deviation, or MAD, is a better reflection of the variability in a data set than the standard deviation, it is less popular.
The formula for the MAD is written as the following,
$$MAD = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$
This formula can be generalized for both populations and samples. It is, essentially, the sum of the absolute value of the distances between each data point and the mean, divided by the sample or population size. Using the formula above, we can find the MAD for the following data points.
|1. Find the mean|| |
|2. Subtract the mean from each data point and find its absolute value|| |
|3. Add all these values|| |
|4. Divide sum by sample size|| |
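The four steps above can be followed one by one in Python. The data points below are made up for illustration and are not the guide's data.

```python
# Illustrative, made-up data points.
data = [2, 4, 6, 8]

# 1. Find the mean.
data_mean = sum(data) / len(data)

# 2. Subtract the mean from each data point and take the absolute value.
absolute_deviations = [abs(value - data_mean) for value in data]

# 3. Add all these values.
total_deviation = sum(absolute_deviations)

# 4. Divide the sum by the sample size.
mad = total_deviation / len(data)

print("mean:", data_mean)
print("absolute deviations:", absolute_deviations)
print("MAD:", mad)
```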
Coefficient of Variation
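The coefficient of variation, or CV, is a relative measure of variability. It expresses the standard deviation as a proportion of the mean, which lets us compare variability between data sets instead of only within them. The formula for CV can be found in the table below.
|CV for the Population||CV for the Sample|
|$CV = \frac{\sigma}{\mu}$||$CV = \frac{s}{\bar{x}}$|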
Using the data set below as an example, we can find and interpret the CV.
From the results above, we can see that data set C has the highest variability among the four data sets because its standard deviation makes up a higher proportion of its mean. Data set B has the lowest variability.
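To show the kind of comparison the CV allows, here is a minimal sketch using the sample formula (standard deviation divided by mean). The two data sets are hypothetical and are not the guide's data sets A to D. They are chosen to have different units, which is where the CV is most useful.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV: sample standard deviation as a proportion of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical data sets measured in different units.
heights_cm = [160, 170, 180]
weights_kg = [50, 70, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.1%}")
print(f"CV of weights: {cv_weights:.1%}")
```

Here the weights have the larger CV, so they vary more relative to their mean than the heights do, even though the two variables cannot be compared directly through their standard deviations.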
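The correlation coefficient is similar to the covariance in that it measures the joint variability of two variables, meaning the relationship between the spreads of two variables. Unlike the covariance, it is scaled so that it always lies between -1 and 1. Take a look at the formulas below.
|Covariance||Correlation Coefficient Notation||Correlation Coefficient|
|$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$||$r_{xy}$ for a sample, $\rho_{xy}$ for a population||$r_{xy} = \frac{s_{xy}}{s_x s_y}$|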
Notice that the correlation coefficient is simply the covariance divided by the product of the standard deviations of both variables.
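This relationship can be sketched in a few lines of Python using the sample formulas. The helper names and the data in `x` and `y` are illustrative assumptions.

```python
import math


def sample_covariance(x, y):
    """Sample covariance: cross-products of deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def sample_std_dev(x):
    """Sample SD: square root of the covariance of x with itself."""
    return math.sqrt(sample_covariance(x, x))


def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(x, y) / (sample_std_dev(x) * sample_std_dev(y))


# Illustrative, made-up data.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

r = correlation(x, y)
print(f"covariance  = {sample_covariance(x, y):.4f}")
print(f"correlation = {r:.4f}")
```

Because of the division by both standard deviations, the result does not depend on the units of `x` or `y` and always lies between -1 and 1.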