In previous sections of this guide, we walked you through the foundations of statistics and provided you with practice problems to put your newfound knowledge to the test. Use this as your go-to formula sheet for all things descriptive statistics.

Measures of Central Tendency

 


Mode

One of the three most basic measures of central tendency, the mode is defined as the most frequently occurring value. The mode is easy to remember because of its similarity to the word “most.”

While there is no formula for mode, it is helpful to understand how the absolute, relative and cumulative frequency are calculated. This is because frequency, or the “count,” is the number of times a variable occurs in a sample or population.

The mode should be used as a representative for the central value when you want to know what the most frequently occurring value of a variable is. As an example, take the frequency for each colour.

Colour | Absolute Frequency | Relative Frequency
Blue | 405 | 405/850 = 48%
Yellow | 299 | 299/850 = 35%
Green | 146 | 146/850 = 17%
Total | 850 | 100%

 

Both frequencies tell us that blue is the mode.
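The frequency table above can be reproduced with a few lines of Python. This is only a sketch: the list `colours` is a hypothetical raw sample rebuilt from the counts in the table, and the variable names are our own.

```python
from collections import Counter

# Hypothetical raw sample, rebuilt from the counts in the table above.
colours = ["Blue"] * 405 + ["Yellow"] * 299 + ["Green"] * 146

absolute = Counter(colours)        # absolute frequency of each colour
n = sum(absolute.values())         # total number of observations
relative = {colour: count / n for colour, count in absolute.items()}

# The mode is the value with the highest frequency.
mode, mode_count = absolute.most_common(1)[0]

for colour, count in absolute.most_common():
    print(f"{colour}: {count} ({relative[colour]:.0%})")
print("Mode:", mode)
```

Because the relative frequency is just the absolute frequency divided by the total, both rankings always agree on the mode.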

 

Median

The median is another basic measure of central tendency. The definition of the median is the middle point of an ordered data set, ordered meaning sorted from the least to the greatest value. Like the mode, there is no general formula for the median.

In order to find the median, you must follow the steps outlined in the table below.

Step | Description
1 | Take a variable and order its values from least to greatest
2 | If the number of values is odd, take the single middle value. This is the median
3 | If the number of values is even, find the two middle values
3a | Find the average of these two middle values. This is the median

 

For an odd number of values, this looks like:

 

    \[ 1, 2, \bold{3}, 4, 5 \]

 

    \[ Median = 3 \]

 

And for an even number of values:

 

    \[ 1, 2, \bold{3}, \bold{4}, 5, 6 \]

 

    \[ \dfrac{3+4}{2} = 3.5 \]
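The steps for finding the median can be sketched in Python as follows. The function name `median` is our own choice for illustration; Python's standard library also offers `statistics.median`, which gives the same results for these inputs.

```python
def median(values):
    """Return the median by sorting and taking the middle value(s)."""
    ordered = sorted(values)   # step 1: order from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # step 2: odd number of values, take the single middle value
        return ordered[mid]
    # step 3: even number of values, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))       # odd number of values
print(median([1, 2, 3, 4, 5, 6]))    # even number of values
```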

 

Types of Means

There are three basic means, known as Pythagorean Means: arithmetic, geometric and harmonic. The first two are used most commonly in statistics.

Arithmetic

The arithmetic mean is a simple average. The arithmetic mean, or AM, should be used with numbers that have an additive relationship.

Geometric

The geometric mean is a multiplicative average. Also called the GM, it should be used with numbers that have a multiplicative or exponential relationship.

Harmonic

The harmonic mean is an average of reciprocals. The HM is most appropriate when you’d like to find an average of rates.

In the table below you’ll find the formulas for these basic means.

Mean

Formula

Arithmetic

    \[ \bar{x} = \frac{\Sigma x_{i}}{n} \]

Geometric

    \[ x_{GM} = \sqrt[n]{x_{1} \cdot x_{2} \dotsm x_{n}} \]

Harmonic

    \[ x_{HM} = \frac{n}{\Sigma_{i=1}^{n}{\dfrac{1}{x_{i}}}} \]
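As a rough illustration, the three formulas translate almost directly into Python. The function names and the sample data `[2, 4, 8]` are assumptions made for this sketch, and the geometric and harmonic means are assumed to be applied to positive values only.

```python
import math


def arithmetic_mean(xs):
    # Sum of the values divided by how many there are.
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # n-th root of the product of the values (positive values assumed).
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    # n divided by the sum of the reciprocals (non-zero values assumed).
    return len(xs) / sum(1 / x for x in xs)


data = [2, 4, 8]  # hypothetical data with a multiplicative pattern
print(arithmetic_mean(data))
print(geometric_mean(data))
print(harmonic_mean(data))
```

For positive data, the arithmetic mean is never smaller than the geometric mean, which in turn is never smaller than the harmonic mean.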

 

Weighted and Grouped Means

There are two types of advanced means that can be utilized for data analysis: the weighted arithmetic mean and the grouped mean. Weighted arithmetic means are means for which each data point contributes unequally to the final average.

The grouped mean, on the other hand, calculates the mean for a variable divided into groups. Below, you'll find a formula for each advanced mean.

Weighted Mean

    \[ \bar{x}_{weighted} = \frac{\Sigma w_{i} x_{i}}{\Sigma w_{i}} \]

    \[ x_{i} = i\text{th observation} \]

    \[ w_{i} = i\text{th weight} \]

Grouped Mean

    \[ \bar{x}_{grouped} = \frac{\Sigma f_{m} x_{m}}{n} \]

    \[ x_{m} = \text{midpoint of the group} \]

    \[ f_{m} = \text{frequency (number of observations) of the group} \]

    \[ n = \Sigma f_{m} = \text{sample size} \]
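Both advanced means can be sketched in Python. This assumes that the grouped mean weights each group's midpoint by the number of observations in that group, so that the sample size is the sum of the group frequencies. The function names and the example numbers are hypothetical.

```python
def weighted_mean(values, weights):
    # Each value contributes in proportion to its weight.
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def grouped_mean(midpoints, frequencies):
    # Each group is represented by its midpoint, counted once per observation.
    n = sum(frequencies)  # sample size
    return sum(f * m for m, f in zip(midpoints, frequencies)) / n


# Hypothetical exam scores with weights 2, 3 and 5.
print(weighted_mean([80, 90, 70], [2, 3, 5]))

# Hypothetical groups 0-10, 10-20, 20-30 with 2, 5 and 3 observations.
print(grouped_mean([5, 15, 25], [2, 5, 3]))
```

Seen this way, the grouped mean is a weighted mean in which the group frequencies play the role of the weights.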

 

Measures of Variability

 

Variance

The variance is one of the many measures of variability. Variability is exactly what it sounds like, which is the amount of variation there is in a particular variable.

The variance of a variable describes how much it varies around its centre. In other words, the variance tells us how far the data points are spread around the mean. Because the variance is expressed in squared units of the variable, it is usually less preferred than the standard deviation.

Covariance

The covariance is another measure of variability. However, whereas the variance tells us information about one variable, the covariance tells us the joint variability of two variables.

This simply means that the covariance strives to show the relationship between the spread of two variables. The covariance is similar to the correlation coefficient, where the correlation coefficient is frequently referred to as a “scaled” covariance.

Standard Deviation

The standard deviation is another measure of variability. It is often confused with variance because of their similarities, both in terms of calculation and interpretation. The standard deviation also gives us information about how far spread data points are around a mean.

However, the major difference between the SD and the variance is that the SD is expressed in the same units as the data, so it can be read as the typical distance of a value from the mean. Below, you’ll find the formulas for the variance, covariance and standard deviation for both samples and populations.

Variance

Sample

    \[ s^2 = \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} \]

Population

    \[ \sigma^2 = \frac{\Sigma(x_{i}-\mu)^2}{N} \]

Covariance

Sample

    \[ Cov(X,Y) = \frac{\Sigma(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \]

Population

    \[ Cov(X,Y) = \frac{\Sigma(x_{i}-\mu_{x})(y_{i}-\mu_{y})}{N} \]

Standard Deviation

Sample

    \[ s = \sqrt{ \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} } \]

Population

    \[ \sigma = \sqrt{ \frac{\Sigma(x_{i}-\mu)^2}{N} } \]

 

Notice that the standard deviation is the square root of the variance. Using the table below, we can find each measure of variability.

Observation | Value
1 | 45
2 | 67
3 | 38
4 | 54
5 | 52

 

Variance

    \[ \bar{x} = 51.2 \]

    \[ s^2 = \frac{\Sigma(x_{i}-\bar{x})^2}{n-1} = \dfrac{470.8}{5-1} = 117.7 \]

Standard Deviation

    \[ s = \sqrt{s^2} = \sqrt{117.7} \approx 10.8 \]
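The worked example can be checked with a short Python sketch. It treats the five observations as a sample (dividing by n - 1) and, for comparison, also shows what the population formulas (dividing by N) would give. The variable names are our own.

```python
import math

values = [45, 67, 38, 54, 52]
n = len(values)

mean = sum(values) / n
# Sum of squared distances from the mean.
sum_sq = sum((x - mean) ** 2 for x in values)

sample_variance = sum_sq / (n - 1)   # sample formula, divides by n - 1
sample_sd = math.sqrt(sample_variance)

population_variance = sum_sq / n     # population formula, divides by N
population_sd = math.sqrt(population_variance)

print(f"mean = {mean:.1f}")
print(f"sample variance = {sample_variance:.1f}, sample SD = {sample_sd:.1f}")
print(f"population variance = {population_variance:.2f}, "
      f"population SD = {population_sd:.2f}")
```

Dividing by n - 1 rather than N always gives the larger of the two values, which is why it matters to know whether your data is a sample or the whole population.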

 

Average Deviation

The average deviation, also known as the mean absolute deviation, is another measure of variability. While it has been argued that the mean absolute deviation, or MAD, is a better reflection of the variability in a data set than the standard deviation, it is less popular.

The formula for the MAD is written as the following,

 

    \[ Mean \; Deviation = \frac{\Sigma | x - \mu |}{N} \]

 

This formula can be generalized for both populations and samples. It is, essentially, the sum of the absolute value of the distances between each data point and the mean, divided by the sample or population size. Using the formula above, we can find the MAD for the following data points.

Observation | Value
1 | 78
2 | 56
3 | 63
4 | 68
5 | 72

 

Steps and calculations

1. Find the mean

    \[ \bar{x} = \dfrac{78+56+63+68+72}{5} = 67.4 \]

2. Subtract the mean from each data point and find its absolute value

    \[ | x_{i} - \bar{x} | \]

3. Add all these values

    \[ 10.6+11.4+4.4+0.6+4.6 = 31.6 \]

4. Divide the sum by the sample size

    \[ \dfrac{31.6}{5} = 6.32 \]
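The four steps above can be followed one by one in Python. This is a minimal sketch using the same five observations, with variable names chosen only for illustration.

```python
values = [78, 56, 63, 68, 72]

# Step 1: find the mean.
mean = sum(values) / len(values)

# Step 2: absolute distance of each data point from the mean.
abs_deviations = [abs(x - mean) for x in values]

# Step 3: add all these distances.
total = sum(abs_deviations)

# Step 4: divide the sum by the sample size.
mad = total / len(values)

print(f"mean = {mean:.1f}")
print(f"sum of absolute deviations = {total:.1f}")
print(f"MAD = {mad:.2f}")
```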

 

Coefficient of Variation

The coefficient of variation, or CV, is a measure of relative variability. It expresses the standard deviation as a percentage of the mean, which makes it possible to compare variability between data sets instead of only within them. The formula for CV can be found in the table below.

CV for the Population

    \[ CV = \frac{\sigma}{\mu} *100\% \]

CV for the Sample

    \[ CV = \frac{s}{\bar{x}} *100\% \]

 

Using the data set below as an example, we can find and interpret the CV.

Data Set | Mean | Standard Deviation (s) | CV
A | 5 | 0.2 | 0.2/5 * 100% = 4%
B | 9 | 0.1 | 0.1/9 * 100% ≈ 1%
C | 4 | 0.6 | 0.6/4 * 100% = 15%
D | 7 | 0.8 | 0.8/7 * 100% ≈ 11%

 

From the results above, we can see that data set C has the highest relative variability among the four data sets because its standard deviation makes up the highest proportion of its mean. Data set B has the lowest relative variability.
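The same comparison can be sketched in Python. The dictionaries `means` and `sds` simply hold the summary values from the example, and the variable names are our own.

```python
# Summary statistics for data sets A to D from the example.
means = {"A": 5, "B": 9, "C": 4, "D": 7}
sds = {"A": 0.2, "B": 0.1, "C": 0.6, "D": 0.8}

# CV = standard deviation divided by the mean, as a percentage.
cv = {name: sds[name] / means[name] * 100 for name in means}

highest = max(cv, key=cv.get)
lowest = min(cv, key=cv.get)

for name, value in cv.items():
    print(f"{name}: {value:.1f}%")
print("Highest relative variability:", highest)
print("Lowest relative variability:", lowest)
```

Note that data set D has the largest standard deviation, yet C has the largest CV, because the CV judges each standard deviation against its own mean.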

 

Correlation Coefficient

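The correlation coefficient is similar to the covariance in that both measure the joint variability of two variables, meaning the relationship between the spreads of two variables. The difference is that the correlation coefficient is scaled by the standard deviations, so it always falls between -1 and 1. Take a look at the formulas below.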

 

Population

Covariance

    \[ \sigma_{xy} = \frac{\Sigma(x_{i}-\mu_{x})(y_{i}-\mu_{y})}{N} \]

Correlation Coefficient

    \[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_{x}\sigma_{y}} \]

Sample

Covariance

    \[ S_{xy} = \frac{\Sigma(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \]

Correlation Coefficient

    \[ r_{xy} = \frac{S_{xy}}{S_{x}S_{y}} \]

 

Notice that the correlation coefficient is simply the covariance divided by the standard deviations of both variables.
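To see how the two quantities relate in practice, here is a minimal Python sketch of the sample formulas. The function names and the paired data `x` and `y` are hypothetical and chosen only for illustration.

```python
import math


def sample_covariance(xs, ys):
    # Average product of paired distances from the means, dividing by n - 1.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(xs):
    # The variance is the covariance of a variable with itself.
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    # Covariance divided by the product of the two standard deviations.
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]

print(f"covariance  = {sample_covariance(x, y):.2f}")
print(f"correlation = {correlation(x, y):.3f}")
```

The covariance depends on the units of both variables, while the correlation coefficient stays between -1 and 1 whatever the units are.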


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.
