March 26, 2020
In the previous section on histograms and cumulative frequency polygons, we walked you through row and column frequency and how to plot these frequencies by building dot plots, histograms and frequency polygons. We also introduced the basics of interpreting these plots. In this section, we’ll dive deeper into how to interpret histograms and provide you with some practice problems.
Recall that there are three main types of frequencies: absolute, row and column. In the table below, you’ll find a brief summary of each as well as a description of when they should be used.
| Type of frequency | Definition | When to use it |
|---|---|---|
| Absolute | The number of times a value occurs out of the total sample size | When you want to compare values with each other relative to the overall total |
| Row | The frequency of a value out of its row total | To compare one factor across the row total |
| Column | The frequency of a value out of its column total | To compare one factor across the column total |
As the table above shows, each type of frequency has its own use. Frequencies also underlie data visualizations such as dot plots, histograms and frequency polygons. The three examples below illustrate the differences between absolute, row and column frequency.
[Example two-way table with columns Group A, Group B and Row Total]
As you can see in the image above, absolute frequency is found by dividing every individual value by the total sample size.
From the image above, observe that the row frequency is simply the individual value divided by the row total.
From the image above, you can see the column frequency is simply the individual value over the column total.
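To make the three definitions concrete, here is a minimal Python sketch that computes absolute, row and column frequency for every cell of a two-way table. It follows this article’s definitions (absolute frequency = cell value divided by the total sample size). The counts and the row labels `Category 1` and `Category 2` are hypothetical, made up for illustration; they are not the values from the images above.

```python
# Hypothetical two-way table of counts (rows = categories, columns = groups).
# These numbers are invented for illustration only.
counts = {
    "Category 1": {"Group A": 10, "Group B": 30},
    "Category 2": {"Group A": 20, "Group B": 40},
}
groups = ["Group A", "Group B"]

# Row totals, column totals and the grand total (total sample size).
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {g: sum(counts[row][g] for row in counts) for g in groups}
grand_total = sum(row_totals.values())


def absolute_frequency(row, group):
    """Cell value divided by the total sample size."""
    return counts[row][group] / grand_total


def row_frequency(row, group):
    """Cell value divided by its row total."""
    return counts[row][group] / row_totals[row]


def column_frequency(row, group):
    """Cell value divided by its column total."""
    return counts[row][group] / col_totals[group]


for row in counts:
    for g in groups:
        print(row, g,
              round(absolute_frequency(row, g), 3),
              round(row_frequency(row, g), 3),
              round(column_frequency(row, g), 3))
```

A quick sanity check on any such table: the absolute frequencies of all cells sum to 1, and so do the row frequencies within each row and the column frequencies within each column.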
Interpreting Histograms and Frequency Polygons
If you recall, histograms and frequency polygons give us information about the distribution of the data set, that is, how its values are spread across their range. In descriptive statistics, the distribution of a data set has three main characteristics: centre, spread and skew.
The first two characteristics correspond to the main tools of descriptive statistics: measures of central tendency (such as the mean, median and mode) and measures of spread (such as the range and standard deviation). Skew, as we’ve discussed in previous sections, describes whether the data pile up on one side while a few extreme values stretch out to the other. We covered these characteristics in depth in our section on absolute cumulative frequency distribution; their properties are summarized in the table below.
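| Characteristic | Description |
|---|---|
| Centre | The centre is the middle point of the data. If there is one, describe where it lies, for example with the mean, median or mode. |
| Spread | The spread is how the data are distributed around the centre. You can describe it with measures such as the range or the standard deviation. |
| Skew | The data are skewed when a large portion of the values lies to one side while a few extreme values stretch out to the other side. You can describe skew as right (positive) when the tail extends to the right, left (negative) when it extends to the left, or absent when the distribution is roughly symmetric. |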
Both histograms and frequency polygons can be interpreted through these three characteristics.
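The three characteristics can also be put into numbers. Below is a minimal sketch using Python’s standard `statistics` module on a small hypothetical sample (the values are invented for illustration). For skew it assumes one particular measure, the moment coefficient of skewness (average cubed deviation divided by the cube of the population standard deviation); other definitions exist. The helper `describe_skew` and its 0.5 cutoff are hypothetical, an arbitrary rule-of-thumb threshold chosen only to turn the number into a label.

```python
import statistics

# Hypothetical sample values, invented for illustration only.
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 10]

# Centre: measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread: range and sample standard deviation (n - 1 denominator).
data_range = max(data) - min(data)
std_dev = statistics.stdev(data)


def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (population moments).

    Assumes the values are not all identical, otherwise m2 is zero.
    """
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - m) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5


def describe_skew(values, cutoff=0.5):
    """Hypothetical helper: label the skew using an arbitrary cutoff."""
    g = skewness(values)
    if g > cutoff:
        return "right (positive) skew"
    if g < -cutoff:
        return "left (negative) skew"
    return "roughly symmetric"


print(f"centre: mean={mean}, median={median}, mode={mode}")
print(f"spread: range={data_range}, sample std dev={std_dev:.3f}")
print(f"skew: {skewness(data):.3f} -> {describe_skew(data)}")
```

A positive coefficient means the longer tail is on the right, a negative one means it is on the left. The single long value (10) in this made-up sample is what pulls the coefficient above zero. Use `statistics.pstdev` instead of `statistics.stdev` if you want the population standard deviation.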
Practice Problems
Problem 1
You want to display data on the number of sodas bought on each day of the week. Using the data table below, build a frequency polygon.
Problem 2
You’re interested in making your data as clear and understandable as possible. Based on the data below, what do you think is the appropriate number of intervals, or “bins”, to group the data into?
Problem 3
Interpret the characteristics of the distribution shown in the following histogram. State at least one aspect of its centre, spread and skew. A data table has been provided to make the interpretation easier.
Solution Problem 1
In this problem, you were asked to construct a frequency polygon from the data table provided. You should have come up with a chart similar to the one in the image below.
The chart shows that the number of sodas bought increases over the week, with a slight dip in sales on Wednesday and the highest sales on Friday.
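A frequency polygon places one point per category (or, for grouped numeric data, per class midpoint) at a height equal to its frequency and joins consecutive points with straight lines. The sketch below builds those points for a hypothetical Monday-to-Friday sales table (the numbers are made up and are not the problem’s data) and picks out the two features described above: the days on which sales dipped and the day with the peak.

```python
# Hypothetical daily soda sales, invented for illustration only.
sales = {"Mon": 12, "Tue": 15, "Wed": 14, "Thu": 18, "Fri": 22}
days = list(sales)

# Polygon vertices: x = position of the day, y = its frequency.
# Plotting these points and joining neighbours gives the frequency polygon.
vertices = [(i, sales[day]) for i, day in enumerate(days)]

# Days on which sales fell compared with the previous day.
dips = [days[i] for i in range(1, len(days))
        if sales[days[i]] < sales[days[i - 1]]]

# Day with the highest sales.
peak = max(days, key=sales.get)

print("vertices:", vertices)
print("dips:", dips, "peak:", peak)
```

Reading the polygon this way (overall direction, local dips, highest point) is the same reasoning used in the solution above, just made explicit.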
Solution Problem 2
In this problem, you were asked to decide the appropriate number of bins to display the data. This is an important part of displaying data: too few bins can hide the shape of the distribution, while too many can scatter the data into a noisy, hard-to-read pattern. Either way, important information about your data can get lost.
Here, a good interval width is 8. This gives us 9 bins, as illustrated by the histogram below.
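The link between interval width and number of bins is simple arithmetic. Assuming half-open bins that start at the smallest value, [min, min + w), [min + w, min + 2w) and so on, the number of bins needed is floor((max - min) / w) + 1. The sketch below uses a hypothetical data range of 0 to 71 (the problem’s actual values are not reproduced here), for which a width of 8 gives 9 bins. The function names `bin_count`, `bin_edges` and `histogram_counts` are hypothetical helpers, not part of any library.

```python
def bin_count(lo, hi, width):
    """Number of half-open bins of the given width needed to cover [lo, hi]."""
    return int((hi - lo) // width) + 1


def bin_edges(lo, hi, width):
    """Bin edges lo, lo + width, ..., up to the upper edge of the last bin."""
    return [lo + k * width for k in range(bin_count(lo, hi, width) + 1)]


def histogram_counts(values, lo, width, n_bins):
    """Count how many values fall into each bin (assumes lo <= v <= hi)."""
    counts = [0] * n_bins
    for v in values:
        counts[int((v - lo) // width)] += 1
    return counts


# Hypothetical data range, for illustration only.
lo, hi = 0, 71
for width in (24, 8, 4):
    print(f"width {width}: {bin_count(lo, hi, width)} bins")

print(bin_edges(lo, hi, 8))
print(histogram_counts([0, 7, 8, 71], lo, 8, bin_count(lo, hi, 8)))
```

Trying a few widths side by side, as in the loop above, is a quick way to see the trade-off: a width of 24 collapses the same range into 3 bins, while a width of 4 spreads it over 18.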
Solution Problem 3
For this problem, you were asked to interpret the characteristics of the distribution below.
As we can see from the table, this distribution charts the frequency of shoe sizes. First, we’ll tackle the centre of the distribution. There is a single centre, located somewhere between 28 and 35. Because the data are grouped, we cannot state the exact mean, median and mode (only estimates are possible), but we can say that the modal group is 28-31.
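Because grouped data only tell us how many values fall in each class, the mean, median and mode can only be estimated. A common approach, sketched below, estimates the mean by weighting each class midpoint by its frequency, takes the class with the highest frequency as the modal group, and finds the median class as the first class whose cumulative frequency reaches half the total. The shoe-size classes and counts are hypothetical, invented to resemble the description above; they are not the article’s table.

```python
# Hypothetical grouped shoe-size data: ((lower, upper), frequency).
groups = [((24, 27), 3), ((28, 31), 9), ((32, 35), 7), ((36, 39), 4), ((40, 43), 2)]

total = sum(freq for _, freq in groups)

# Estimated mean: each class is represented by its midpoint.
est_mean = sum((lo + hi) / 2 * freq for (lo, hi), freq in groups) / total

# Modal group: the class with the highest frequency.
modal_group = max(groups, key=lambda g: g[1])[0]

# Median class: first class whose cumulative frequency reaches total / 2.
cumulative = 0
median_group = None
for limits, freq in groups:
    cumulative += freq
    if cumulative >= total / 2:
        median_group = limits
        break

print(f"estimated mean = {est_mean:.2f}")
print("modal group:", modal_group)
print("median group:", median_group)
```

With these made-up numbers, the estimated mean (about 32.4) sits above the midpoint of the modal group (29.5), which is the kind of pattern a slight right skew tends to produce.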
The data are spread unevenly around the centre, with more of the values lying to the right of the centre than to the left. While the histogram suggests that the data are approximately normal, there also appears to be a slight right skew.