Population Definition

In probability theory and statistics, two terms are fundamental to understanding why many of the techniques are used: population and sample. A population is the entire group of people, places, or things you’re interested in studying. You can find some examples in the image below.
[Image: population]

Image | Study Interest | Population
A | Voter preference of people in the UK | All people of voting age in the UK
B | Trees affected by an infectious disease | All trees in the UK
C | Daily number of cups of tea drunk per person | All cups of tea drunk in the UK

 

As you can see, populations tend to be enormous. Take the first example, described in image A. According to the ONS, 52.7 million people were aged 18 and over in 2019. Imagine measuring the voting preference of all those people!

 


Definition of a Sample

Because populations tend to be enormous, we need a way to estimate the metrics we want to study without measuring every unit or individual in the population. This is where samples come in. A sample is a subset of a population that is used to estimate the true population parameters. Take a look at the image below to see how we solve the examples given above.

[Image: sample]

 

Image | Population | Sample
A | All people of voting age in the UK | 500 people of voting age in each region of the UK
B | All trees in the UK | 50 trees in each national park
C | All cups of tea drunk in the UK | Cups of tea drunk by 1,000 people in the UK

 

Types of Samples

There are many different types of samples that you can take from a population. No one type is best, as the right choice depends on the population of interest as well as the resources available to you. There are two main types of samples, probability and non-probability samples, which are described in the image below.

[Image: sampling_techniques]

While understanding the intricacies of sampling isn’t essential here, it’s important to know that probability samples allow you to apply the inferential tools of probability theory. These inferential tools include:

  • Confidence interval
  • Hypothesis testing

 

Confidence Interval Definition

As you can see, confidence intervals are part of the inferential tools of probability theory. As discussed, samples can be used to estimate the true population parameter. To understand this, let’s revisit the tea example.

[Image: sampling]
Group | Composition | Mean Cups per Day | Meaning
Population | All people in the UK who drink tea | 3 | True value, which can rarely be measured
Sample | A sample of 1,000 tea drinkers | 2.2 | Estimate of the population value, calculated from the sample

 

As you can see in the image above, we have a population parameter of 3 cups of tea per day per person versus what we measured in the sample: 2.2 cups. Because we’re estimating the true population number using the sample, we can use the confidence interval to capture the uncertainty in this estimation.

 

A confidence interval is defined as a range of values that’s likely to contain the true population parameter. It can be calculated for:

  • Mean
  • Proportion

 

Population Proportion

A population proportion is simply the true proportion measured for the population. A proportion is the ratio of a subset of a group in relation to the entire group. The table below illustrates the differences between a sample and population proportion.

 

Group | Formula | Example
Population | p = \frac{M}{N} | Proportion of people who voted pink in the population
Sample | \bar{p} = \frac{m}{n} | Proportion of people who voted pink in the sample
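
As a minimal sketch of the two formulas, the Python snippet below computes a population proportion and a sample proportion. The population of 10,000 voters with 3,300 pink votes is invented purely for illustration, as are the function and variable names.

```python
import random

def proportion(count, total):
    """Ratio of a subset (count) to the entire group (total)."""
    return count / total

# Hypothetical population: N = 10,000 voters, M = 3,300 of whom voted pink.
population = [1] * 3300 + [0] * 6700   # 1 = voted pink, 0 = voted otherwise
p = proportion(sum(population), len(population))   # p = M / N

# Draw a simple random sample of n = 1,000 voters without replacement.
rng = random.Random(0)   # fixed seed so the run is repeatable
sample = rng.sample(population, 1000)
p_bar = proportion(sum(sample), len(sample))   # p-bar = m / n

print(f"population proportion p = {p:.3f}")
print(f"sample proportion p-bar = {p_bar:.3f}")
```

In a real study the population proportion is unknown, which is why only the sample proportion can actually be computed.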

 

In practice, many people conduct studies on the same variable of interest. Continuing the example above, say five studies were conducted measuring the proportion of people who voted for pink.

[Image: mean_CI]

The image above illustrates the distribution of these sample proportions. These proportions represent estimates of the true population proportion.
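
To get a feel for how sample proportions vary from study to study, the sketch below simulates five studies in Python. The true proportion of 0.33 and the sample size of 1,000 per study are assumptions chosen for illustration; they are not taken from real data.

```python
import random

TRUE_P = 0.33        # hypothetical true population proportion (assumed)
N_PER_STUDY = 1000   # hypothetical sample size of each study (assumed)
N_STUDIES = 5

rng = random.Random(42)   # fixed seed so the run is repeatable

def run_study(rng, true_p, n):
    """Simulate one study: ask n voters and return the share who voted pink."""
    pink_votes = sum(1 for _ in range(n) if rng.random() < true_p)
    return pink_votes / n

sample_proportions = [run_study(rng, TRUE_P, N_PER_STUDY) for _ in range(N_STUDIES)]

for i, p_bar in enumerate(sample_proportions, start=1):
    print(f"study {i}: p-bar = {p_bar:.3f}")
```

Each study gives a slightly different estimate, but the estimates cluster around the true proportion.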

 

Confidence Interval for the Proportion

We can never be completely certain that a single sample has captured the true population proportion, but we can build a confidence interval to express how uncertain the estimate is. The formula for the confidence interval is the following.

 

    \[ \text{Confidence Interval} = \bar{p} \pm z \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]

 

The table below gives an explanation of each of the elements in the formula.

 

Element | Description
\bar{p} | The sample proportion
z | The z-score for the chosen confidence level
n | The sample size
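
The formula translates directly into a few lines of Python. This is a minimal sketch: the function name and the input values (a sample proportion of 0.5 from 400 people) are made up for illustration.

```python
import math

def proportion_confidence_interval(p_bar, n, z):
    """Return (lower, upper) for p_bar +/- z * sqrt(p_bar * (1 - p_bar) / n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0 <= p_bar <= 1:
        raise ValueError("p_bar must be between 0 and 1")
    margin = z * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

# Illustrative inputs only: p-bar = 0.5, n = 400, z = 1.96.
lower, upper = proportion_confidence_interval(p_bar=0.5, n=400, z=1.96)
print(f"interval: {lower:.3f} to {upper:.3f}")
```

The quantity under the square root shrinks as n grows, so larger samples give narrower intervals for the same z-score.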

 

This formula results in a range of values above and below the sample proportion that is likely to contain the population parameter. Take the example from before, where we were given a couple of different sample proportions.

[Image: confidence_interval]

As you can see, taking several samples gives us an idea of where the true population parameter might lie. Instead of taking many different samples, a confidence interval can give us an idea of the range of values that include the population proportion.

 

Confidence Level

The confidence level represents how much certainty you want for your confidence interval. The higher the confidence level, the more certain you can be that the interval contains the true value, but the wider the interval becomes, and vice versa. Recall that z-scores are values on the standard normal distribution, which can be looked up in a z-table.

[Image: z-scores_distribution]

Each z-score is simply a standardized version of a normally distributed value, which in this case would be our sample proportion. The height of the curve shows how dense the distribution is around each z-score, and the area under the curve between two z-scores gives the probability of falling between them. Each confidence level, which can be thought of as a probability, has a corresponding z-value: the central area between -z and +z equals the confidence level. The most common ones are listed below.

 

Confidence Level | Z-Score
0.95 | 1.96
0.90 | 1.645
0.85 | 1.44
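
If a z-table isn't at hand, these z-scores can be recovered from the standard normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `z_for_confidence_level` is an assumption made for this example.

```python
from statistics import NormalDist

def z_for_confidence_level(level):
    """Two-sided z-score: the central area between -z and +z equals `level`."""
    if not 0 < level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Half of the leftover probability (1 - level) sits in each tail,
    # so z is the point whose cumulative probability is (1 + level) / 2.
    return NormalDist().inv_cdf((1 + level) / 2)

for level in (0.95, 0.90, 0.85):
    print(f"{level:.2f} -> z = {z_for_confidence_level(level):.3f}")
```

The results agree with the table once rounded, and the same helper works for levels the table doesn't list, such as 0.99.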

 

Interpretation of Confidence Interval

Let’s continue the example from before. Say that you take a sample of 1,000 people and 320 voted for pink. To find the confidence interval, we first determine n and \bar{p}.

 

Quantity | Symbol | Value
Sample size | n | 1,000
Sample proportion | \bar{p} | 320/1000 = 0.32

 

Next, we simply plug the values into the formula for the confidence interval. Let’s see the difference between confidence intervals at different confidence levels.

 

Confidence Level | Calculation | Interval | Interpretation
95% | 0.32 \pm 1.96 \sqrt{\frac{0.32(1-0.32)}{1000}} | 0.29, 0.35 | We are 95% confident that the interval between 0.29 and 0.35 contains the true population proportion of those who voted pink
85% | 0.32 \pm 1.44 \sqrt{\frac{0.32(1-0.32)}{1000}} | 0.30, 0.34 | We are 85% confident that the interval between 0.30 and 0.34 contains the true population proportion of those who voted pink

 

As you can see, the confidence interval is narrower at a 0.85 confidence level than at 0.95: asking for less confidence lets us commit to a tighter range.
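
To make the arithmetic explicit, here is a worked version of both calculations, with intermediate values rounded. Both intervals share the same square-root term; only the z-score changes.

    \[ \sqrt{\frac{0.32(1-0.32)}{1000}} = \sqrt{0.0002176} \approx 0.01475 \]

    \[ 95\%: \; 0.32 \pm 1.96 \times 0.01475 \approx 0.32 \pm 0.0289 \; \Rightarrow \; (0.2911, \; 0.3489) \]

    \[ 85\%: \; 0.32 \pm 1.44 \times 0.01475 \approx 0.32 \pm 0.0212 \; \Rightarrow \; (0.2988, \; 0.3412) \]

Rounded to two decimal places, these are 0.29 to 0.35 and 0.30 to 0.34, so the 95% interval is about 0.058 wide and the 85% interval about 0.042 wide.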

 

Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.