The other sections of this guide on descriptive statistics covered the fundamental concepts of the field. Here, we review the statistical ideas involved in constructing and interpreting histograms, from the notion of frequencies to best practices in visualizing data.

 

What is a Histogram?

Unless you’ve been living under a rock, you have probably encountered a histogram at some point in your life. In fact, histograms are included in a special offshoot of statistics that involves displaying data in a visual manner instead of simply through numbers and tables.

While tabular and numerical data can be extremely helpful, especially in the other branch of statistics - inferential statistics - data visualization is an integral part of descriptive statistics. This is because seeing a picture of the data can often allow us to recognize patterns we may not have realized were present otherwise.

Histograms make up only a tiny portion of all the types of data visualizations available for people to build. Often, these different visualizations offer a range of advantages and disadvantages given the type of data being presented and the reason for presenting the data. Below, you’ll find a summary of the most common data visualizations.

Type | Description | Example
Pie Chart | For displaying the differing amounts for segments of a whole | A pie chart of the amount of different toppings sold by a pizza restaurant
Bar Chart | For displaying different quantitative values for the categories of one or more categorical variables | A bar chart showing the amount of snow on the ground for different days of the week
Histogram | For displaying the distribution of one or more quantitative variables, with zero or more categories | A histogram displaying the distribution of weight across different age groups for males and females
Line Graph | For displaying how a quantitative value changes across another quantitative value, with zero or more categories | A line graph showing how weight changes across time for females and males

How to Build a Histogram

Building a histogram is no longer a question of busting out a ruler and a pencil. In the present day, there are hundreds of programs online as well as computer software dedicated to creating data visualizations. For most of these programs, you simply need to input whatever data you want to display and, in a matter of seconds, a histogram will be built for you.

It can be helpful for the sake of interpretation, however, to understand how a histogram is built. Take the following data as an example, where the data is already grouped into intervals known as “bins” on a histogram.

Time | Number of Passengers
6:00 - 8:00 | 156
9:00 - 11:00 | 607
12:00 - 14:00 | 304
15:00 - 17:00 | 216
18:00 - 20:00 | 789
21:00 - 23:00 | 142
24:00 - 2:00 | 34

 

It’s helpful to think of the bins of a histogram as bins because they are not static. In computer programs, you can often adjust the width of the bins to include as many or as few data points as you desire.

As you can see by comparing the table above with the histogram below, the frequency of each group corresponds to the height of each bar. It is important to be mindful of the number of bins you choose for your histogram, as choosing too few or too many can result in misleading charts.

[Figure: basic histogram]

Histograms tell us about the distribution of a variable. That is, they summarize where the data points of a variable or data set are located. This can be a helpful tool when trying to analyse the spread and centre of a variable.
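To make the link between the frequency table and the bar heights concrete, here is a minimal Python sketch that draws a text version of the histogram from the passenger table above. The function name `text_histogram` and the `scale` parameter are illustrative choices, not part of any charting library, and a real chart would normally be drawn with plotting software instead.

```python
# Frequency table from this section: (time interval, number of passengers).
passengers = [
    ("6:00 - 8:00", 156),
    ("9:00 - 11:00", 607),
    ("12:00 - 14:00", 304),
    ("15:00 - 17:00", 216),
    ("18:00 - 20:00", 789),
    ("21:00 - 23:00", 142),
    ("24:00 - 2:00", 34),
]


def text_histogram(table, scale=25):
    """Return one text bar per bin; each '#' stands for `scale` passengers."""
    lines = []
    for label, freq in table:
        bar = "#" * (freq // scale)  # bar length grows with the frequency
        lines.append(f"{label:>13} | {bar} {freq}")
    return lines


for line in text_histogram(passengers):
    print(line)
```

The bins are printed in their original order, since the order of intervals is part of what a histogram communicates.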

Histogram versus Bar Chart

Many times, people confuse histograms with bar charts - and it’s not for nothing. Bar charts and histograms have a strikingly similar appearance. Take a look at the image below and try to distinguish which chart is a histogram.

[Figure: a basic histogram and a basic bar chart, side by side]

So, what is the difference between a histogram and a bar chart? Taking a look at the image above, you’ll see that the main giveaway is that the bars of a histogram are positioned without any space in between them. This is because, typically, the width of each bar on a histogram represents an interval.

On the other hand, the bars in a bar chart are separate from each other. In addition, the order of the bars on a bar chart doesn’t matter. You can typically rearrange the bars on the horizontal axis of a bar chart with no problems because they usually don’t have a meaningful order. Take a look at the table below, which outlines the major differences between bar charts and histograms.

Feature | Histogram | Bar Chart
Type of Data | Quantitative | Quantitative and qualitative
Variables | At least 2 quantitative variables (the measured variable and its frequency) | At least 1 quantitative and 1 qualitative
Bars | Bars have no space between them | Bars are separated
Interval | Bar width represents intervals | Bar width has no meaning, simply aesthetic
Order | Order of bars matters; they have to be arranged in order of intervals | Order of bars doesn’t matter; they can be arranged in any way

 

Histogram by Category

While many are probably used to seeing a standard histogram in math or on the news, the structure of histograms is quite flexible. That is, histograms don’t strictly have to display information on only one variable. In fact, histograms that display two different categories of a variable can often be used to highlight the differences between their distributions.

[Figure: two-category histogram]

Looking at the image above, you can see how meaningful it can be to display two different categories of the same variable on the same chart. This is an example of how histograms can be altered to share more in-depth information about a variable’s distribution. To do this, you will typically need:

  • One quantitative variable (the frequency) on the vertical axis
  • One quantitative variable split by one qualitative variable (with at least two categories) on the horizontal axis
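The two ingredients listed above can be sketched in Python as a count of how many observations fall into each bin, kept separately for each category. The function name `binned_counts_by_category` is hypothetical, and the sample records are made-up values used only to show the mechanics; they are not data from this guide.

```python
from collections import Counter


def binned_counts_by_category(records, bin_width):
    """Count observations per (category, bin start).

    `records` is a list of (category, value) pairs; each value is assigned
    to the bin that starts at the nearest lower multiple of `bin_width`.
    """
    counts = Counter()
    for category, value in records:
        bin_start = (value // bin_width) * bin_width
        counts[(category, bin_start)] += 1
    return counts


# Hypothetical (category, weight) records, for illustration only.
sample = [
    ("female", 58), ("female", 63), ("female", 67),
    ("male", 64), ("male", 71), ("male", 78), ("male", 83),
]

counts = binned_counts_by_category(sample, bin_width=10)
for (category, bin_start), n in sorted(counts.items()):
    print(f"{category:>6} {bin_start}-{bin_start + 9}: {n}")
```

Each category gets its own set of bar heights over the same bins, which is what allows the two distributions to be compared on one chart.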

The flexibility of histograms isn’t limited to simply displaying a quantitative variable by its categories. Histograms can be combined with other charts, such as line graphs or area charts, into what is sometimes called a “combination chart” or a “combo.” Typically, this involves a secondary vertical axis, which renders information about the histogram on one vertical axis and about the other chart or graph on the other.

Measures of Central Tendency on a Histogram

As we mentioned, histograms are typically used to convey information about a variable’s distribution. This means that the characteristics of a distribution, such as measures of central tendency and spread, can be viewed on a histogram. Take the picture below as an example.

[Figure: histogram with central measures marked]

Here, we know the mean and the median because they are marked on the chart. Typically, you won’t have this information readily available on a histogram and will have to calculate it separately. However, histograms are a great tool to use if you want to get an idea of the centre of the data quickly.

The mode, on the other hand, can almost always be estimated from the histogram. Technically, you would have to either calculate the mode of the grouped data or look at the original, ungrouped data set to extract the exact mode, but looking at a histogram gives you a quick estimate. Here, we can see that the mode appears to be in the interval 97-99.

When interpreting histograms, measures of central tendency and variability can help you explain your data to others. In statistics, data visualizations are often accompanied by measures such as the mean or standard deviation. This is an integral part of descriptive statistics because it shows how statistical notions are connected.
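Reading a quick estimate of the centre off grouped data can also be sketched in code. The data behind the figure above is not given, so this minimal Python example reuses the passenger table from the section on building a histogram. The helper names `modal_bin` and `median_bin` are illustrative, and the median bin assumes the bins are taken in the order listed in the table.

```python
# Frequency table from "How to Build a Histogram".
passengers = [
    ("6:00 - 8:00", 156),
    ("9:00 - 11:00", 607),
    ("12:00 - 14:00", 304),
    ("15:00 - 17:00", 216),
    ("18:00 - 20:00", 789),
    ("21:00 - 23:00", 142),
    ("24:00 - 2:00", 34),
]


def modal_bin(table):
    """Return the label of the bin with the highest frequency (tallest bar)."""
    return max(table, key=lambda row: row[1])[0]


def median_bin(table):
    """Return the label of the bin where the running total reaches half."""
    total = sum(freq for _, freq in table)
    running = 0
    for label, freq in table:
        running += freq
        if running >= total / 2:
            return label


print("Modal bin: ", modal_bin(passengers))
print("Median bin:", median_bin(passengers))
```

Both functions return a bin rather than a single value, which mirrors what a histogram can show: an estimate of where the mode or median lies, not its exact value.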

Example

You want to display the distribution of the variable you have studied in an easy-to-comprehend manner. Because you want to express the natural patterns in the distribution of your data set, you don’t want to obscure too much of the data. Given the data table below, how many bins should you use for your histogram?

Age | Frequency
19 | 240
20 | 140
21 | 130
22 | 110
23 | 99
24 | 108
25 | 106
26 | 70
27 | 72
28 | 99
29 | 65
30 | 141
31 | 43
32 | 21
33 | 125
34 | 68
35 | 75
36 | 74
37 | 87
38 | 53
39 | 91
40 | 45

 

From the picture below, you can see that 6 bins work well here. While there is, of course, no single right or wrong answer, you should understand that some displays are more informative than others. Imagine a histogram with only 2 bins - this would clearly mean grouping all the data into large intervals that would be difficult to interpret. Choosing too many, however, can get messy and hide important patterns in your data - especially when you deal with larger and larger datasets.

[Figure: histogram of the age data with 6 bins]
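The effect of the bin count can be checked directly on the age table from this example. The Python sketch below regroups the frequencies into equal-width bins; the function name `rebin` and the choice of bin edges (starting at the youngest age) are assumptions, so the bins in the figure may be cut at different ages.

```python
import math

# Age table from the example: age -> frequency.
age_freq = {
    19: 240, 20: 140, 21: 130, 22: 110, 23: 99, 24: 108, 25: 106, 26: 70,
    27: 72, 28: 99, 29: 65, 30: 141, 31: 43, 32: 21, 33: 125, 34: 68,
    35: 75, 36: 74, 37: 87, 38: 53, 39: 91, 40: 45,
}


def rebin(freq, n_bins):
    """Group integer-valued frequencies into equal-width bins.

    The width is rounded up, so the last bin may extend past the largest
    age, and some requests can yield fewer bins than `n_bins`.
    """
    lo, hi = min(freq), max(freq)
    width = math.ceil((hi - lo + 1) / n_bins)
    bins = {}
    for start in range(lo, hi + 1, width):
        end = start + width - 1
        bins[(start, end)] = sum(
            count for age, count in freq.items() if start <= age <= end
        )
    return bins


for n in (2, 6):
    print(f"{n} bins:")
    for (start, end), count in rebin(age_freq, n).items():
        print(f"  {start}-{end}: {count}")
```

With 2 bins, only the broad difference between the younger and older halves remains visible, while 6 bins still show the gradual decline in frequency from the youngest group onwards.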


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.
