Measures of Central Tendency

In probability and statistics, it helps to keep in mind that the distribution of a dataset is described in two ways: by where the centre of the data lies, and by how close to or far from that centre each observation falls. Accordingly, the two kinds of measures you can obtain for a variable are those of central tendency and those of spread.

Aspect | Central Tendency | Spread
Definition | Captures the centre | Captures the location of the observations around the centre
Metrics | Mean, Median, Mode | Standard Deviation, Variance, Range
Uses | Used mainly in descriptive statistics | Can be used for both descriptive and inferential statistics

The table above summarises the main differences between the two types of measures. Keep in mind, however, that they are usually reported together, since each gives context to the other.
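As a minimal sketch of reporting both kinds of measure together, the Python snippet below uses the standard `statistics` module. The values in `sample` are made up purely for illustration and do not come from this article.

```python
import statistics

# Hypothetical sample, invented purely for illustration.
sample = [2, 4, 4, 5, 7, 8]

# Measures of central tendency: where the centre of the data lies.
central_tendency = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
}

# Measures of spread: how far the observations fall from the centre.
spread = {
    "standard deviation": statistics.stdev(sample),  # sample standard deviation
    "variance": statistics.variance(sample),         # sample variance
    "range": max(sample) - min(sample),
}

for name, value in {**central_tendency, **spread}.items():
    print(f"{name}: {value:.2f}")
```

Note that `stdev` and `variance` compute the sample versions (dividing by n - 1); `pstdev` and `pvariance` would be the population equivalents.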

[Image: descriptive measures]

Mode Definition

The mode is defined as the value of a given variable that occurs most often. The number of times a value occurs is called its frequency. Let’s take a look at an example, where we record the groups that visited a national park during one week: each row is one group, with the day it visited and the number of people in it.

Day People per Group
Monday 1
Tuesday 3
Wednesday 5
Thursday 2
Friday 1
Saturday 3
Saturday 3
Saturday 4
Sunday 3

Here, we can find two different modes, since we have two different variables. We can use them to answer questions like:

  • Which day do people usually visit the park?
  • How big of a group usually visits the park?

[Image: mode example]

How to Find the Mode

While there is a formula for the mode, the easiest way to find it is to count the frequency of each value of the variable of interest and then see which value has the highest frequency. Most statistical programs will do this for you with a simple command or formula. Let’s take both variables above as examples.

Day of the Week Frequency
Monday 1
Tuesday 1
Wednesday 1
Thursday 1
Friday 1
Saturday 3
Sunday 1

It is clear on which day of the week most groups visit the national park: Saturday. Next, you can calculate the frequency of each group size.

Group Size Frequency
1 2
3 4
5 1
2 1
4 1

The most common size for groups is 3 people.
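The counting procedure above can be sketched in Python with `collections.Counter` from the standard library. The data is the example table from this article; the names `visits` and `mode_with_frequency` are hypothetical, chosen only for this illustration.

```python
from collections import Counter

# Each tuple is one group from the example table: (day, people in the group).
visits = [
    ("Monday", 1), ("Tuesday", 3), ("Wednesday", 5), ("Thursday", 2),
    ("Friday", 1), ("Saturday", 3), ("Saturday", 3), ("Saturday", 4),
    ("Sunday", 3),
]

def mode_with_frequency(values):
    """Return (most frequent value, its frequency) for a list of values."""
    counts = Counter(values)  # maps each distinct value to its frequency
    value, frequency = counts.most_common(1)[0]
    return value, frequency

day_mode = mode_with_frequency([day for day, _ in visits])
size_mode = mode_with_frequency([size for _, size in visits])

print(day_mode)   # ('Saturday', 3)
print(size_mode)  # (3, 4)
```

If several values tie for the highest frequency, this sketch reports only one of them; `statistics.multimode` returns all tied values instead.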

When to Use the Mode

The mode is a useful tool for identifying the most frequent values in your data. The table below shows some situations in which the mode can capture the centre better than other measures of central tendency.

Situation | Example
You need the most frequent values | Finding the most frequent users of a website
The distribution is very highly skewed | Taking the most frequent income in a highly skewed income distribution
The data is qualitative | Calculating frequencies for a categorical variable

Problem 1

Let’s revisit the example from earlier, where we looked at the number of visitors at the national park during the week. That was a small dataset, which made it easy to calculate the mode by hand. In the image below, you can see that the number of visitors is significantly larger.

[Image: mode example 2]

Identify the mode and state an advantage of the mode over other measures of central tendency.

Solution 1

In this question, we were asked to:

  • Identify the mode
  • State one advantage of the mode

From the image, we can see that the day with the most visitors is Saturday. One of the main advantages of the mode is that it can often be identified visually, as in the chart above. The mean requires a formula, and the median requires ordering the observations, which usually means using a program for larger datasets; the mode, by contrast, can often be read straight off a chart.

Problem 2

Take the following two datasets, which contain information on the income of 10 people. Determine which measure of central tendency would be best for each dataset. Make sure to justify your answer.

Person Dataset A Dataset B
1 3000 200
2 1000 100
3 4000 100
4 2500 250
5 8000 100
6 6500 150
7 5000 100
8 5500 300
9 7500 8000
10 7000 7500

Solution 2

In this question, we were asked to state which measure of central tendency would better capture the centre of the data. In order to answer this, we should first take a look at the distribution visually.

[Images: even distribution, uneven distribution]

As we can see, these distributions are shaped completely differently. While the first income distribution rises fairly gradually, the second has many identical values, with two extreme values at the higher end.

Using this information, we can ascertain that the mean or median is appropriate for dataset A, while the mode is best for dataset B.

Measure Dataset A Dataset B
Mean 5000 1680
Mode N/A 100

As you can see, the mode is only useful if there actually is a most frequently occurring value. Since no two incomes are the same in dataset A, we can’t use the mode. Looking at the mean for dataset B, we can see that it does not give an accurate picture of the centre.

This is because the dataset is highly skewed: the two extreme values pull the mean far above the typical income. The mode, on the other hand, better reflects where the centre is located.
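The comparison in the solution can be reproduced with a short Python sketch using only the standard library. The helper `mode_or_none` is a hypothetical name introduced here; returning `None` when no value repeats is an assumption made to mirror the "N/A" entry in the table.

```python
from collections import Counter
from statistics import mean

# Incomes from the two datasets in Problem 2.
dataset_a = [3000, 1000, 4000, 2500, 8000, 6500, 5000, 5500, 7500, 7000]
dataset_b = [200, 100, 100, 250, 100, 150, 100, 300, 8000, 7500]

def mode_or_none(values):
    """Return the most frequent value, or None if no value repeats."""
    value, frequency = Counter(values).most_common(1)[0]
    return value if frequency > 1 else None

for name, data in [("A", dataset_a), ("B", dataset_b)]:
    print(f"Dataset {name}: mean = {mean(data)}, mode = {mode_or_none(data)}")
```

For dataset B, the mean of 1680 is larger than eight of the ten incomes, while the mode of 100 is the value that actually occurs most often.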

 

Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.