How to Calculate the Standard Deviation for Grouped Data
People often group their data in order to display or analyse a data set more efficiently. For example, a quantitative variable like age can be transformed into categorical age groups, or a quantitative variable like points on a test can be transformed into categorical grade groups.
This might be easier to visualize, so let’s take the second example. In the table below are test scores from a classroom.
|Test Score Categories||Frequency |
|10 - 20||1|
|20 - 30||8|
|30 - 40||5|
|40 - 50||9|
|50 - 60||8|
|60 - 70||10|
|70 - 80||75|
We want to know the standard deviation, but we don’t have the individual test scores of each of the 116 students. In order to calculate the grouped standard deviation, we have to modify the standard deviation formula just a bit. While the formula for grouped standard deviation differs between the population and the sample, we’ll focus only on the sample grouped SD here. The formula looks like:
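Reading the steps listed below back into symbols, the sample grouped standard deviation can be written as follows. The symbols are notation assumed here, not taken from the original: $f_i$ is the frequency of group $i$, $m_i$ is its midpoint, $n = \sum_i f_i$ is the sample size, and $\bar{x}$ is the grouped mean.

$$
\bar{x} = \frac{\sum_i f_i \, m_i}{n},
\qquad
s = \sqrt{\frac{\sum_i f_i \,(m_i - \bar{x})^2}{n - 1}}
$$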
While this might look complicated, it is quite simple. Below, you’ll find the interpretation of each step in the formula.
|1. Find the midpoint values of the groups|
|2. Multiply these midpoint values by the frequency|
|3. Add these multiplied values and divide by the sample size to get the grouped mean|
|4. Subtract the grouped mean from the midpoint values|
|5. Square these subtracted values|
|6. Multiply the frequency by these squared values|
|7. Add all the multiplied values|
|8. Divide these summed values by the sample size minus 1|
|9. Take the square root of the value from step 8|
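The nine steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `grouped_sample_sd` and the `(lower, upper, frequency)` tuple layout are assumptions made for this example, and the data is the test score table from earlier.

```python
from math import sqrt


def grouped_sample_sd(groups):
    """Sample standard deviation for grouped data.

    `groups` is a list of (lower, upper, frequency) tuples.
    The name and input layout are assumptions for this sketch.
    """
    # Step 1: midpoint of each group
    midpoints = [(lower + upper) / 2 for lower, upper, _ in groups]
    freqs = [freq for _, _, freq in groups]
    n = sum(freqs)  # sample size = total frequency

    # Steps 2-3: frequency-weighted sum of midpoints, divided by n
    grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

    # Steps 4-7: subtract the mean, square, weight by frequency, add up
    weighted_squares = sum(
        f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)
    )

    # Steps 8-9: divide by n - 1, then take the square root
    return sqrt(weighted_squares / (n - 1))


# Test score categories and frequencies from the table above
test_scores = [
    (10, 20, 1), (20, 30, 8), (30, 40, 5), (40, 50, 9),
    (50, 60, 8), (60, 70, 10), (70, 80, 75),
]

sd = grouped_sample_sd(test_scores)
print(round(sd, 1))  # 16.6
```

Because only the midpoints are used, the result is an approximation of the standard deviation of the underlying individual scores.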
The process to find the grouped SD is similar to that of a simple standard deviation. Following these steps, we’re able to find the grouped standard deviation from the earlier example.
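|Test Score Categories||Frequency||Midpoint||Midpoint * Frequency||Midpoint - Mean||(Midpoint - Mean)^2||Frequency * (Midpoint - Mean)^2|
|10 - 20||1||(10+20)/2 = 15||15*1 = 15||15-64.7 = -49.7||(-49.7)^2 = 2474.2||1*2474.2 = 2474.2|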
|20 - 30||8||25||200||-39.7||1579.4||12635.0|
|30 - 40||5||35||175||-29.7||884.5||4422.7|
|40 - 50||9||45||405||-19.7||389.7||3507.5|
|50 - 60||8||55||440||-9.7||94.9||759.2|
|60 - 70||10||65||650||0.3||0.1||0.7|
|70 - 80||75||75||5625||10.3||105.2||7892.9|
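To finish the calculation, steps 7 to 9 can be applied to the last column of the table. The notation ($f_i$ for frequency, $m_i$ for midpoint, $\bar{x}$ for the grouped mean, $s$ for the sample grouped SD) is assumed here for illustration. The sample size is the sum of the frequency column, $n = 116$, and the squared values in the table are based on the unrounded mean $7510 / 116 \approx 64.74$.

$$
\sum_i f_i \,(m_i - \bar{x})^2 \approx 2474.2 + 12635.0 + 4422.7 + 3507.5 + 759.2 + 0.7 + 7892.9 = 31692.2
$$

$$
s = \sqrt{\frac{31692.2}{116 - 1}} \approx \sqrt{275.6} \approx 16.6
$$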
Characteristics of Standard Deviation
The standard deviation will, naturally, be different for different data sets. However, there are a few characteristics of the SD that will always hold true.
- The standard deviation is always either positive or zero. This is because all values that we sum are squared values.
- The standard deviation is zero if all values in the data set are equal to the mean.
- The standard deviation is sensitive to outliers or extreme values.
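The characteristics above can be checked quickly with Python's standard `statistics` module. The values below are made up purely for illustration and do not come from the example data.

```python
from statistics import stdev

# Hypothetical values, invented for this illustration only.
all_equal = [50, 50, 50, 50]           # every value equals the mean
typical = [48, 50, 52, 50, 49, 51]     # values close to the mean
with_outlier = typical + [95]          # same values plus one extreme score

sd_equal = stdev(all_equal)
sd_typical = stdev(typical)
sd_outlier = stdev(with_outlier)

print(sd_equal)                  # zero: no value deviates from the mean
print(sd_typical, sd_outlier)    # one extreme value inflates the SD
```

A single extreme value is enough to make the second standard deviation several times larger than the first, which is what "sensitive to outliers" means in practice.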
Interpreting the Standard Deviation
Recall the rule of thumb from earlier: the higher the standard deviation, the higher the variability. While “higher” variability might sound like something we don’t desire, a higher variability in the data set isn’t always a “bad” thing.
Interpreting the standard deviation can be difficult because it depends on the context of the question someone is trying to answer. It’s helpful to avoid thinking of a high or low standard deviation as “bad” or “good”.
Instead, remember the definition and calculation of the SD, which is a measure of how widely spread the data is around the mean. Let’s say, for example, two restaurants want to measure which dish is each customer's favourite through a 100 point scale where 0 is the lowest and 100 is the highest.
|Restaurant A||Restaurant B|
Restaurant A has a lower standard deviation than Restaurant B, which means that people’s ratings of its dishes were clustered more closely around the mean. However, people rated Restaurant A’s dishes quite low, with an average of 21 points. While Restaurant B’s ratings have a higher SD, their mean rating is a lot higher than Restaurant A’s.
It is important to always take the standard deviation in the context of things like the average, the question you want to ask, or the people in your sample.
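As a rough sketch of reading the SD together with the mean, the snippet below compares two sets of ratings. All ratings are hypothetical and invented for this illustration; only Restaurant A's average of 21 points comes from the text above.

```python
from statistics import mean, stdev

# Hypothetical ratings on the 0-100 scale, invented for illustration.
restaurant_a = [19, 20, 21, 22, 23]     # consistent but low ratings
restaurant_b = [60, 75, 85, 95, 100]    # more spread out but much higher

for name, ratings in [("A", restaurant_a), ("B", restaurant_b)]:
    # Report the centre and the spread side by side
    print(f"Restaurant {name}: mean={mean(ratings):.1f}, sd={stdev(ratings):.1f}")
```

Looking at the SD alone would favour Restaurant A for consistency, while the mean shows that its consistent ratings are consistently low.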
Problem 1: Grouped Standard Deviation
You’re studying the amount of time students spend using social media. You find a study that was performed in a classroom of students that displays data on how many hours per week each student spent on one or more social media platforms. Given the data below, find and interpret the grouped standard deviation, rounding to the nearest tenth.
|Number of Hours||Number of Students|
|0 - 2||5|
Solution to Problem 1
In this problem you were asked to
- Calculate the grouped standard deviation
- Interpret the standard deviation
While we’re not directly given the frequency, we know that the number of students in each category is the frequency. From previous examples, we also know that the total number of students is our sample size. Using this information, we calculate the following.
Where we calculate the following: