What is Regression?

Regression is a statistical method used to model the relationship between two or more variables. To understand regression, let’s start by understanding the different types of relationships two variables can have with each other.
linear_relationship
exponential_parabola

The images above are examples of two variables plotted against each other. The line or curve drawn through the points represents the equation used to approximate the data points.

 

Graph | Relationship | Equation | Formula
1 | Parabolic | Quadratic | y = ax^2 + bx + c
2 | Exponential | Exponential growth | y = a * b^x
3 | Linear | Line | y = mx + b

 

We can model data by finding the line or curve that best represents it. This is what regression is used for: it fits a model to the data that is available, and the fitted model can then be used to predict values that have not yet been observed.

 


Simple Linear Regression

Simple linear regression, or SLR, is a type of regression that involves only one dependent variable and one independent variable. Notice that the SLR equation closely resembles the equation of a line. The image below illustrates where these variables are located in a regression equation.

slr_line_equation
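
For reference, here is a sketch of the two SLR equations in standard textbook notation. This form is an assumption based on the symbols used in this section (y, \beta, X, \epsilon and \hat{y}, b, x), not a transcription of the image. The population equation is written first and the sample equation second.

```latex
y = \beta_{0} + \beta_{1} X + \epsilon

\hat{y} = b_{0} + b_{1} x
```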

The differences between the response and explanatory variables are summarized in the table below.

 

Notation | Variable | Definition
y, \hat{y} | Response, dependent | Variable we want to predict
X, x | Explanatory, independent | Variable used to predict the response variable

 

As you may notice, there are two formulas for SLR. The difference between the two is explained in the table below.

 

Equation | Variables | Type | Data used
1 | y, \beta, X, \epsilon | Population SLR | Data on the entire population is used. The true population parameters are calculated.
2 | \hat{y}, b, x | Sample SLR | Data from a sample of the population is used. The true population parameters are estimated.
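
To make the idea of estimating parameters from a sample more concrete, here is a minimal sketch of the usual least squares estimates for a sample SLR. The function name `fit_slr` and the data are hypothetical and chosen only for illustration; the formulas are the standard least squares ones, which is an assumption about the method, since this section does not state it.

```python
from statistics import mean

def fit_slr(x, y):
    """Estimate b0 and b1 for y_hat = b0 + b1 * x by least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample that roughly follows y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0, b1 = fit_slr(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} * x")
```

The estimates b0 and b1 come out close to, but not exactly equal to, the values 1 and 2 used to build the sample, which is the sense in which a sample SLR estimates the population parameters.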

 

Multiple Linear Regression

Multiple linear regression, or MLR, is quite similar in definition to SLR. The difference between the two is that in an MLR model, more than one independent variable is used to estimate the dependent variable. Take a look at the equations below.

slr_mlr_equation
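
For reference, here is a sketch of the two MLR equations in standard textbook notation with k independent variables. This form is an assumption based on the symbols defined in this section, not a transcription of the image. The population equation is written first and the sample equation second.

```latex
y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \dots + \beta_{k} x_{k} + \epsilon

\hat{y} = b_{0} + b_{1} x_{1} + b_{2} x_{2} + \dots + b_{k} x_{k}
```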

The two equations mirror the SLR equations: the top equation is the population MLR equation and the bottom equation is the sample MLR equation. The table below defines what each element in the formulas means.

 

# | Element | Description | Definition
1 | y | Response variable | The variable we’re trying to predict
2 | \beta_{0} and b_{0} | Constant | The value of y when all x’s are zero
3 | \beta’s and b’s | Regression coefficients | The amount y increases or decreases given a 1 unit change in x, with all other x’s held constant
4 | x’s | Independent variables | The variables used to predict y
5 | \epsilon | Error | The random error, the part of y that isn’t explained by the x’s

MLR Estimators

There are several approaches you can use to estimate an MLR model. The most common approach is to use a program that can calculate an MLR model from a data set. A rarer approach is to calculate an MLR model by hand. While this is inconvenient and prone to calculation errors, it can be helpful for someone trying to understand the concepts behind regression.

 

Estimating MLR regression coefficients is a bit more difficult than estimating SLR coefficients. However, the general idea is the same. The picture below shows the equations.

mlr_estimators
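
As a rough illustration of what a program does when it estimates an MLR model, here is a minimal sketch that solves the ordinary least squares normal equations, (X'X) b = X'y, with plain Python. Ordinary least squares is assumed here as the estimation method; the function names `solve_linear_system` and `fit_mlr` and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] so row operations touch both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution, starting from the last row.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_mlr(rows, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    # Prepend a column of ones so that b0 is estimated as the constant.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated exactly from y = 2 + 3*x1 - 1*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
coefficients = fit_mlr(rows, y)
print([round(b, 6) for b in coefficients])
```

Because the data contain no random error, the estimated coefficients recover the constant and the two slopes used to generate y, up to floating-point rounding. With real data, statistical software is the more practical choice.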

 

MLR Interpretation

When it comes to interpreting an MLR model, the intuition is the same as for an SLR model. However, because there are more explanatory variables, there are a few more things you should take into account. The table below shows how to interpret the coefficients of an MLR model with two independent variables.

 

Element | Interpretation
b_{0} | Value of y when all independent variables are zero
b_{1} | Amount that y increases or decreases given a change of 1 unit in x_{1}, with x_{2} held constant
b_{2} | Amount that y increases or decreases given a change of 1 unit in x_{2}, with x_{1} held constant
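
The "held constant" idea in the table can be checked numerically. The sketch below uses a hypothetical fitted model with made-up coefficients (b0 = 10, b1 = 2, b2 = -0.5) and changes one independent variable at a time.

```python
def predict(x1, x2, b0=10.0, b1=2.0, b2=-0.5):
    """Hypothetical fitted model: y_hat = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

baseline = predict(x1=4, x2=6)
# Raise x1 by one unit while x2 stays fixed: the prediction moves by b1.
effect_x1 = predict(x1=5, x2=6) - baseline
# Raise x2 by one unit while x1 stays fixed: the prediction moves by b2.
effect_x2 = predict(x1=4, x2=7) - baseline
# With every independent variable at zero, the prediction equals b0.
intercept = predict(x1=0, x2=0)
print(effect_x1, effect_x2, intercept)
```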

 

Variable Transformations

Transforming variables is a common step that is completed before an MLR model is run. The reasons for transforming a variable can be:

 

  • To have the variable follow a better distribution
  • To create a new variable
  • To improve the appearance in visualizations

 

These are the most common reasons why variables are transformed. Some common transformations to perform on a variable are:

 

  • Logarithm
  • Square
  • Square root
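
The three transformations listed above can be sketched with Python's standard library. The data are hypothetical, and the helper `pearson_skew` (Pearson's second skewness coefficient) is introduced here only as one simple way to compare how skewed each version of the variable is.

```python
import math
from statistics import mean, median, stdev

def pearson_skew(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    return 3 * (mean(data) - median(data)) / stdev(data)

# Hypothetical right-skewed variable: each value doubles the previous one.
values = [1, 2, 4, 8, 16, 32, 64]

logged = [math.log(v) for v in values]    # logarithm (natural log)
squared = [v ** 2 for v in values]        # square
rooted = [math.sqrt(v) for v in values]   # square root

for name, data in [("raw", values), ("log", logged),
                   ("square", squared), ("sqrt", rooted)]:
    print(f"{name:>6}: skew = {pearson_skew(data):.3f}")
```

For this particular data set, the logarithm removes the right skew almost entirely, the square root reduces it, and the square makes it worse. Other data sets will behave differently, so it is worth checking the distribution after each transformation.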

 

Interpretation with Transformed Variables

Interpreting an MLR model with transformed variables depends on the type of transformation performed. Typically, square or square root transformations are easier because the number is still on the same scale. Logarithmic transformations are a bit more complex because they involve a logged scale. The table below summarizes the interpretation of models using log-transformed variables.

 

Type | Model | Interpretation of Regression Coefficients
Log-log | y and x are log transformed | A 1% increase in x leads to a change of approximately (b_{1})% in y
Linear-log | x is log transformed | A 1% increase in x leads to a change of approximately (b_{1}/100) units in y
Log-linear | y is log transformed | A 1 unit increase in x leads to a change of approximately (100*(b_{1}))% in y
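
The interpretations in the table are approximations that work well when the coefficient or the percentage change is small. The sketch below compares each rule of thumb with the exact change implied by the model, assuming natural logarithms and a hypothetical coefficient b1 = 0.05. All variable names are made up for illustration.

```python
import math

b1 = 0.05  # hypothetical coefficient, chosen only for illustration

# Log-linear: log(y) = b0 + b1*x.
# A 1 unit increase in x multiplies y by exp(b1).
log_linear_exact = 100 * (math.exp(b1) - 1)   # exact % change in y
log_linear_rule = 100 * b1                    # rule of thumb from the table

# Linear-log: y = b0 + b1*log(x).
# A 1% increase in x adds b1*log(1.01) units to y.
linear_log_exact = b1 * math.log(1.01)        # exact unit change in y
linear_log_rule = b1 / 100                    # rule of thumb from the table

# Log-log: log(y) = b0 + b1*log(x).
# A 1% increase in x multiplies y by 1.01**b1.
log_log_exact = 100 * (1.01 ** b1 - 1)        # exact % change in y
log_log_rule = b1                             # rule of thumb from the table

print(f"log-linear: {log_linear_exact:.4f}% vs {log_linear_rule:.4f}%")
print(f"linear-log: {linear_log_exact:.6f} vs {linear_log_rule:.6f} units")
print(f"log-log:    {log_log_exact:.4f}% vs {log_log_rule:.4f}%")
```

The gap between the exact value and the rule of thumb grows as b1 gets larger, which matters most for the log-linear case.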

 

Problem 1

You are tasked with interpreting the following MLR model.

regression_equation

 

Using what you know about how regression coefficients are interpreted, write a short summary describing the model.

 

Solution to Problem 1

When temperature and ad expenditure are both zero, the model predicts that 100 tickets will still be sold. For every 1 degree increase in temperature, ticket sales increase by 1.3 tickets, with ad expenditure held constant. For every 100 pound increase in ad expenditure, ticket sales increase by 5.4 tickets, with temperature held constant.
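
The summary above can be checked with a small sketch. The coefficients 100, 1.3 and 5.4 are taken from the solution text; the function name `predicted_tickets` is hypothetical, and the code assumes that the 5.4 coefficient applies per 100 pounds of ad expenditure, as the solution describes.

```python
def predicted_tickets(temperature, ad_spend_pounds):
    """Sketch of the model described in the solution.

    Assumption: the 5.4 coefficient applies per 100 pounds of ad expenditure.
    """
    return 100 + 1.3 * temperature + 5.4 * (ad_spend_pounds / 100)

# Hypothetical starting point: 20 degrees and 500 pounds of advertising.
base = predicted_tickets(temperature=20, ad_spend_pounds=500)
# One degree warmer, ad expenditure held constant.
warmer = predicted_tickets(temperature=21, ad_spend_pounds=500)
# 100 pounds more advertising, temperature held constant.
more_ads = predicted_tickets(temperature=20, ad_spend_pounds=600)
print(round(warmer - base, 2), round(more_ads - base, 2))
```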

 

Problem 2

Salary is a variable that typically has a right-skewed distribution. This is because most people tend to earn similar amounts of money, while a few people earn extremely large amounts. Because of this, a log transformation has been performed on salary. Interpret the following model.

 

regression_equation_example

 

Solution to Problem 2

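The constant is not meaningfully interpretable here, because it corresponds to zero vacation days and a logged salary of zero, a combination that will not occur in realistic data. Every 1 day increase in vacation days increases the happiness score by 2 points, with salary held constant. Because salary is log transformed, a 1% increase in salary leads to an increase of approximately 0.04 points in the happiness score, with vacation days held constant.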


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.