In other sections of this guide on descriptive statistics, we taught you the basics of finding mean, median, standard deviation and percentiles. We provided you the formulas to each, as well as introduced some intermediate topics involving these measures of central tendency and variability. Here, we’ll present some advanced topics relating to these measures, namely the standard error and percentile rank.

Measures of Central Tendency and Variability

While measures like mean, median, standard deviation and percentiles can seem somewhat basic, the truth is that they are some of the most powerful tools of analysis within descriptive statistics and which form many of the fundamental concepts in more advanced, inferential statistics.

Percentiles, for example, is a concept that is used in quantile regression - which strives to identify patterns within different quantiles of a data set. These advanced applications rely on a strong, fundamental understanding of these more basic concepts. Below, you’ll find a brief recap of the formulas for sample mean, median and standard deviation as well as when they’re used and how to interpret them.

FormulaUsesInterpretation
Mean

    \[ \bar{x} = \frac{\Sigma x}{n} \]

  • Weighted mean
  • Good representation of most typical value in data sets without extreme outliers
The average of the data set, capturing where most values are cantered
MedianNo standard formula, is found by identifying the middle value of ordered data
  • Interquartile Range
  • Good representation of the middle value when there’s outliers
The midpoint of the data, representing the point where half the data falls above and below
SD

    \[ \sqrt{ \frac{\Sigma (x_{i} - \bar{x})^2 }{n-1} } \]

  • Z-scores
  • Variety of statistical tests on data
Is used to determine how typical a value is for a data set with a given mean and standard deviation

 

Superprof

Standard Error

The concept of the standard deviation is a great starting point for understanding the standard error. It is defined as the estimate of the standard deviation of an estimate. Thus far, you’ve been calculating the standard deviation of an entire sample. Meaning, you’ve calculated a measure by which you can identify how likely any value would appear given the mean and sample size of your data set.

The standard deviation typically goes hand in hand with the distribution of a variable. Every distribution you will encounter reflects a probability density function, where each point on the horizontal axis corresponds with the probability of it occurring in a given data set on the vertical axis.

The question the standard deviation of a variable or data set attempts to answer is whether a given value is likely to be found for a given distribution. The question the standard error tries to answer is whether how likely a given statistic, such as the mean, is to be found for a given data set.

Instead of trying to find the distribution of a data set, the standard error tries to find the distribution of a given statistic. Recall the difference between a sample and a population. Measures of a population are called parameters, where the standard deviation of a parameter gives us information about the distribution of that parameter.

Measures of a sample, however, are estimates of parameters and are called statistics. The standard error, then, is an estimate of the standard deviation of a statistic.

If this is confusing, take the data sets below as an example. Each measure the test scores of students at different schools for the same test.

ObservationSchool ASchool BSchool C
1452966
2645457
3385958
4706752
5626832
Mean565553

Taking these three sample means, we can make an informed guess that the mean test score for the region is around 55 points. If we had information for ten more schools, then 100 more schools, and finally for all the schools in the region - we could start to get closer and closer to some middle value that may be higher or lower than 55 points.

Because we don’t always have the chance to take infinitely many samples, a probability distribution of means is a great way of estimating how likely a given mean is for a particular data set. In the table below, you’ll find the formulas for the standard deviation and the standard error of the mean for a population and sample, respectively.

PopulationSample

    \[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

    \[ SE = \frac{s}{\sqrt{n}} \]

Problem 1

Given the following dataset, what can you say about the accuracy of the average of the data?

ObservationValue
145
267
328
432
529
646
761
858
949
1036
1134

Problem 2

Given the following information, determine which data set has a lower standard error of mean.

ObservationSchool ASchool BSchool C
Mean565553
SD4108
Sample Size91 000250

Problem 3

Given the following chart, determine one of  the measures of central tendency.

Bar char categories

Problem 4

Given the following chart, choose which measure of central tendency would be most appropriate.

Histogram extreme obs

Solution Problem 1

In this problem, you were asked to:

  • Say something about the accuracy of the mean

In this case, we first find the mean by following the steps below.

ObservationValue
145
267
328
432
529
646
761
858
949
1036
1134
Total485

The mean is calculated as

    \[ \bar{x} = \dfrac{485}{11} = 44.1 \]

To find the accuracy of the mean, we need to calculate the standard error of the mean by first finding the standard deviation.

    \[ s = \sqrt{ \dfrac{1832.91}{(11-1)} = 13.5 } \]

Then, we plug the SD into the formula for standard error.

    \[ SE = \frac{s}{\sqrt{n}} = \dfrac{13.5}{\sqrt{11}} = 4.1 \]

Because the standard error is relatively small when compared to the dataset, this suggests the mean is pretty accurate.

Solution Problem 2

You were asked to determine which data set has a lower standard error of mean. Using the information given, you can calculate the standard error for each sample.

ObservationSchool ASchool BSchool C
SE

    \[ = \dfrac{4}{\sqrt{9}} \]

    \[ = 1.33 \]

    \[ =\dfrac{10}{\sqrt{1000}} \]

    \[ = 0.32 \]

    \[ = \dfrac{8}{\sqrt{250}} \]

    \[ = 0.51 \]

Based on the calculations performed in the table above, the sample with the lowest standard error of the mean is School B.

Solution Problem 3

The only measure of central tendency we can get from this chart is the mode, which is Chemistry.

Bar char categories

Solution Problem 4

Because there are extreme values between 107 and 119, either the mode or the median would be most appropriate depending on what we’d like to investigate.

Histogram extreme obs

Did you like the article?

1 Star2 Stars3 Stars4 Stars5 Stars (1 votes, average: 5.00 out of 5)
Loading...

Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.

Did you like
this resource?

Bravo!

Download it in pdf format by simply entering your e-mail!

{{ downloadEmailSaved }}

Your email is not valid

Leave a Reply

avatar
  Subscribe  
Notify of