What are Inferential Statistics?

Many people tend to think of statistics as one unattractive, mathematical blob. In fact, there are many branches under the umbrella of statistics, including probability, machine learning, and more. These branches all fall under one of two categories, illustrated below. 
[Figure: the two categories of statistics, inferential and descriptive]
The main differences between inferential and descriptive statistics are summarized in the table below.

| | Inferential | Descriptive |
|---|---|---|
| Definition | Statistical analysis that uses the data set to draw conclusions or make predictions beyond it | Statistical analysis that illustrates or measures the data included in the data set |
| Measures | Regression; hypothesis tests | Central tendency (mean, mode); spread (standard deviation, variance) |
| Variable Types | Numeric and categorical | Numeric and categorical |
| Example | Regression analysis between test score and hours spent studying | Calculating the mean test score for different schools |

 


Regression Definition

You have probably heard of regression in many different contexts. This is because regression analysis is one of the most widely used tools of inferential statistics. Regression analysis is defined as the process of measuring the relationship between two or more variables.

[Figure: graph of an independent variable plotted against a dependent variable]

The image above is a graph containing the two types of variables in regression analysis: independent and dependent variables. Notice that there is a pattern between these two variables. A regression model can capture this pattern; a linear regression models it as a straight-line relationship between the variables.

 

| | Independent Variable | Dependent Variable |
|---|---|---|
| Definition | The variable that we use to predict our dependent variable | The variable that responds to the independent variable |
| Type | Numeric or categorical (a categorical predictor is known as a ‘dummy’ variable) | Numeric; can only be categorical when using a special type of regression called logistic regression |
| Other Names | Explanatory variable | Response variable |

 

Simple Regression Formula

As mentioned, linear regression can be used to model the relationship between two or more variables. When a linear regression involves only one independent and one dependent variable, this is known as simple linear regression, or SLR.

[Figure: the same graph with a regression line drawn through the observed data points]

The graph above is the same as the one before, with the only difference being the line running through the observed data points. This line is known as a regression line. The regression line is calculated from the following formulas.

    \[ \text{Population: } y = \beta_{0} + \beta_{1}x + \varepsilon \]

    \[ \text{Sample: } \hat{y} = b_{0} + b_{1}x \]

There are two formulas because one describes the population while the other describes the sample. Recall that a population contains all the things we want to study, which means that we rarely have access to all the data from the population. The sample, on the other hand, is a subset of the population. With the sample, we can estimate the true population regression model.

 

| | Population | Sample |
|---|---|---|
| Response Variable | The population dependent variable | The sample dependent variable |
| Explanatory Variable | The population explanatory variable | The sample explanatory variable |
| Constant | The value of the population dependent variable when all independent variables are zero | The value of the sample dependent variable when all independent variables are zero |
| Regression coefficient | The population parameters | The sample estimates of the population parameters |
| Error | The part of y not explained by x | Assumed to be zero |
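
To make the population and sample distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population parameters (2 and 3), the size of the population and of the sample, and the noise level are invented, and the sample estimates are computed with the least-squares formulas covered in the SLR Estimate Formulas section.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: y = 2 + 3x + error (parameter values are made up)
BETA_0, BETA_1 = 2.0, 3.0
population = []
for _ in range(10_000):
    x = random.uniform(0, 10)
    error = random.gauss(0, 1)  # the part of y not explained by x
    population.append((x, BETA_0 + BETA_1 * x + error))

# In practice we rarely see the whole population, only a sample of it
sample = random.sample(population, 50)
xs = [x for x, _ in sample]
ys = [y for _, y in sample]

# Least-squares estimates of the constant and the regression coefficient
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in sample)
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"population: beta_0 = {BETA_0}, beta_1 = {BETA_1}")
print(f"sample:     b0 = {b0:.2f}, b1 = {b1:.2f}")
```

The sample estimates should land close to the population parameters without matching them exactly, and a different sample would give slightly different estimates.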

 

SLR Estimate Formulas

Many SLR models are run using software. Programs such as R or Python take the data in your model and run the regression automatically, calculating all regression coefficients and statistics. Even so, many people learning statistics start by calculating regression estimates by hand.

[Figure: the simple linear regression model and the two parameters estimated with SLR]

In the image above, you can see that there are two parameters that we estimate using SLR. The first is the y-intercept, which is the value of y when x is zero. The formula can be seen below.

    \[ b_{0} = \bar{y} - b_{1}\bar{x} \]

The following table describes each element in the formula.

 

| Element | Description |
|---|---|
| \bar{y} | Mean of y |
| b_{1} | The regression coefficient |
| \bar{x} | Mean of x |

 

As you can see, we need to first calculate the sample regression coefficient before calculating the intercept. Below, you can find the formula for b_{1}.

    \[ b_{1} = \dfrac{\sum (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sum (x_{i} - \bar{x})^2} \]

The following table contains the explanation for the formula.

 

| Element | Description |
|---|---|
| x_{i} | The ith observation of x |
| \bar{x} | The mean of x |
| y_{i} | The ith observation of y |
| \bar{y} | The mean of y |
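
The hand calculation can also be sketched in a few lines of Python. The function below is an illustrative sketch, not a library routine: the name fit_slr and the example points are assumptions. It computes the regression coefficient as the sum of the products of the x and y deviations divided by the sum of the squared x deviations, then the intercept as the mean of y minus the coefficient times the mean of x.

```python
def fit_slr(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Numerator: sum of (x_i - x_bar) * (y_i - y_bar)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of (x_i - x_bar) squared
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must take at least two different values")

    b1 = s_xy / s_xx          # regression coefficient
    b0 = y_bar - b1 * x_bar   # intercept, computed after b1
    return b0, b1


# Hypothetical points lying exactly on the line y = 1 + 2x
b0, b1 = fit_slr([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)
```

Note that the coefficient is computed first, because the intercept formula depends on it.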

 

To find the full regression model, plug the calculated constant and regression coefficient into the model. Take the following scenario as an example.

 

| Element | Description |
|---|---|
| y | Shoe price |
| b_{0} | 30 |
| b_{1} | 4.5 |
| x | Number of customizations |

 

In the above example, the constant and regression coefficient have already been calculated. The SLR model would therefore look like this:

    \[ \hat{y} = 30 + 4.5x \]
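
To see how such a model is used, here is a short Python sketch that plugs the constant of 30 and the coefficient of 4.5 from the shoe example into the model. The function name predicted_shoe_price and the chosen numbers of customizations are assumptions made for illustration.

```python
B0 = 30   # constant: predicted price of a shoe with no customizations
B1 = 4.5  # regression coefficient: change in predicted price per customization


def predicted_shoe_price(customizations):
    """Predicted shoe price from the fitted model y_hat = 30 + 4.5x."""
    return B0 + B1 * customizations


for n in (0, 1, 4):
    print(n, predicted_shoe_price(n))
```

Each additional customization raises the predicted price by 4.5, and a shoe with no customizations is predicted to cost 30.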

 

Problem 1

In this section you learned about the differences between descriptive and inferential statistics. Now consider which analyses you can carry out on a data set. You are given the data set below, which comes from a restaurant on the beach. The restaurant wants to know the relationship between the number of soups sold and the weather. Classify the types of analysis you can do on this data set as either inferential or descriptive.

 

| Soup Sales | Temperature |
|---|---|
| 24 | 2 |
| 15 | 10 |
| 8 | 17 |
| 5 | 27 |

 

Solution to Problem 1

In this problem, you were asked to:

 

  • Understand the differences between the two branches of statistics
  • Write down some analysis you can do based on these two branches

 

The first step in solving this problem is knowing the main differences between inferential and descriptive statistics. Descriptive statistics uses the information within the data set to describe what the data looks like. Inferential statistics, on the other hand, uses the data set to make inferences that reach beyond it, such as predictions for values that were not observed.

 

Next, we can classify the different analyses in the table below.

 

| Inferential | Descriptive |
|---|---|
| Simple linear regression | Measures of central tendency: mean, median, mode |
| Hypothesis testing | Measures of spread: variance, standard deviation, range |
| Modelling | Descriptive visualizations: pie chart, bar chart, etc. |
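
For the descriptive column, a minimal Python sketch using the standard statistics module shows what these measures look like on the restaurant data. The variance and standard deviation here are the sample versions (dividing by n - 1), which is an assumption about the intended definition.

```python
import statistics

# Data from the restaurant table in Problem 1
soup_sales = [24, 15, 8, 5]
temperature = [2, 10, 17, 27]

for name, values in (("Soup sales", soup_sales), ("Temperature", temperature)):
    print(name)
    # Measures of central tendency
    print("  mean:", statistics.mean(values))
    print("  median:", statistics.median(values))
    # Measures of spread (sample versions, dividing by n - 1)
    print("  variance:", round(statistics.variance(values), 2))
    print("  standard deviation:", round(statistics.stdev(values), 2))
    print("  range:", max(values) - min(values))
```

These numbers only describe the four observed days. Saying anything about days that were not observed requires an inferential tool such as regression.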

 

Problem 2

In the previous problem you were asked to describe the types of analysis you could conduct based on the two types of statistics. Now, using the same data, you are asked to conduct a regression analysis. Build a simple linear regression model based on the formulas provided, with temperature as the independent variable and soup sales as the dependent variable. Then describe how this model would look on the following chart.

[Figure: chart of the soup sales and temperature data]

 

Solution to Problem 2

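In this problem, you were asked to build a regression model. First, calculate the mean of each variable. Next, subtract the mean from every observation, then calculate the squared x deviations and the products of the x and y deviations, and total each of those two columns.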

 

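| Temperature | Soup Sales | x-\bar{x} | y-\bar{y} | (x-\bar{x})^2 | (x-\bar{x})(y-\bar{y}) |
|---|---|---|---|---|---|
| 2 | 24 | -12 | 11 | 144 | -132 |
| 10 | 15 | -4 | 2 | 16 | -8 |
| 17 | 8 | 3 | -5 | 9 | -15 |
| 27 | 5 | 13 | -8 | 169 | -104 |
| Mean = 14 | Mean = 13 | | Total | 338 | -259 |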

 

Next, we plug these totals and means into the equations for b_{0} and b_{1}:

    \[ b_{1} = \dfrac{-259}{338} \approx -0.766 \]

    \[ b_{0} = 13 - (-0.766 \times 14) \approx 23.7 \]

Finally, we get the following regression:

    \[ \hat{y} = 23.7 - 0.766(temperature) \]

 

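On the graph above, this model would be a straight, downward-sloping line. It crosses the y-axis near 24, falls by about 0.77 soups for every one-unit rise in temperature, and passes through the point of means (14, 13).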
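
The arithmetic in this solution can be checked with a short Python sketch. It recomputes the two column totals from the table, then the coefficient and the intercept. The temperature of 20 used for the prediction at the end is an arbitrary value chosen for illustration.

```python
temperature = [2, 10, 17, 27]  # x, the independent variable
soup_sales = [24, 15, 8, 5]    # y, the dependent variable

x_bar = sum(temperature) / len(temperature)
y_bar = sum(soup_sales) / len(soup_sales)

# Column totals from the table: squared x deviations and products of deviations
s_xx = sum((x - x_bar) ** 2 for x in temperature)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temperature, soup_sales))

b1 = s_xy / s_xx         # regression coefficient
b0 = y_bar - b1 * x_bar  # intercept

print(f"totals: {s_xx:.0f} and {s_xy:.0f}")
print(f"b1 = {b1:.3f}")
print(f"b0 = {b0:.1f}")

# Predicted soup sales at a temperature of 20 (an arbitrary illustrative value)
print(f"prediction at 20: {b0 + b1 * 20:.1f}")
```

Carrying full precision gives an intercept of about 23.7, which rounds to 24 when stated as a whole number.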

 


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.