Regression Definition

Regression is the process of estimating the relationship between a response variable and one or more explanatory variables. It is one of the most common statistical tools used to model the present and predict the future.
[Figure: scatterplot of two numerical variables]
The graph above is a scatterplot of two numerical variables. A scatterplot is typically the first graph people create when they want to build a regression model between two variables, because it gives a visual representation of any linear relationship between the x and y variables. Do you think there is a linear association between the two variables? In other words, do you think a regression model would be appropriate here? Keep reading to find out if your reasoning was on the mark.


Simple Linear Regression

Simple linear regression, or SLR, is a special type of regression. SLR involves only one independent and one dependent variable. Check out the breakdown of the SLR equation below.

 

    \[ y = B_{0} + B_{1}x \]

Symbol | Name | Description
y | Response variable | The variable that responds to changes in the x variable
x | Explanatory variable | The variable that explains the variation in the y variable
B_{0} | Constant | The value of y when x is zero
B_{1} | Regression coefficient (slope) | The change in y when x increases by one unit

 

A simple linear regression can help you understand the relationship between two variables. The table below summarizes the model’s estimate formulas.

 

b_{0} = \bar{y} - b_{1} \bar{x}
b_{1} = \frac{ \sum (x_{i}-\bar{x}) (y_{i}-\bar{y}) }{ \sum (x_{i}-\bar{x})^2 }
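
The two estimate formulas translate almost line for line into code. The sketch below is a minimal illustration in plain Python; the function name `fit_slr` and the five data points are made up for the example and do not come from the graph in this article.

```python
def fit_slr(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of (x_i - x_bar) * (y_i - y_bar)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: sum of (x_i - x_bar)^2
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, chosen only to keep the arithmetic easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_slr(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # prints: b0 = 2.20, b1 = 0.60
```

Here the means are 3 and 4, the numerator is 6 and the denominator is 10, so the slope is 0.6 and the constant is 4 - 0.6 * 3 = 2.2.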

 

SLR Interpretation

One of the most important components in a regression model is how you interpret its results. Take the graph from the previous example, where a regression line has been added.

[Figure: the scatterplot with the regression line added]

There are several ways to interpret this graph. Before looking at the regression model itself, we can make a few intuitive remarks, summarized below.

 

Feature | Remark
Y | This variable has a small range compared to the x variable and a couple of extreme values
X | This variable has a large range compared to the y variable
Association | There appears to be a rather weak relationship between the two variables

 

Next, look at the equation for this regression model.

[Figure: the estimated regression equation]

Recall that there are two elements we can interpret directly from this model: the constant and the slope. We can also plug x values into the equation to predict y. This is known as interpolation when the x value lies within the range of the observed data and extrapolation when it lies outside that range.
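
To make the difference between interpolation and extrapolation concrete, here is a small sketch. The coefficients and the observed x values are hypothetical (they are not the ones from the graph above), and `predict` is just an illustrative helper name.

```python
def predict(b0, b1, x_new, x_observed):
    """Predict y at x_new and say whether x_new is inside the observed x range."""
    y_hat = b0 + b1 * x_new
    inside = min(x_observed) <= x_new <= max(x_observed)
    kind = "interpolation" if inside else "extrapolation"
    return y_hat, kind


# Hypothetical fitted model and observed x values (assumptions for this example)
b0, b1 = 2.2, 0.6
x_observed = [1, 2, 3, 4, 5]

for x_new in (2.5, 10):
    y_hat, kind = predict(b0, b1, x_new, x_observed)
    print(f"x = {x_new}: predicted y = {y_hat:.2f} ({kind})")
```

Predictions made by extrapolation deserve extra caution, because nothing in the data tells us whether the linear pattern continues outside the observed range.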

 

Multiple Linear Regression

Multiple linear regression, or MLR, is similar to SLR; the only difference is that an MLR model includes more than one explanatory variable. Because of these extra explanatory variables, the equation of the model looks slightly different. Take a look at the image below, which compares the SLR and MLR population and sample models.

[Figure: SLR and MLR population and sample models]

Notice that there are more regression coefficients in the MLR model. Just like with SLR, MLR models are usually calculated using programs such as R or Python. However, you can also calculate MLR models by hand. Take a look at the image below for the equations for the regression coefficients for an MLR model with two explanatory variables.

[Images: formulas for the estimated response \hat{y}, the constant b_{0}, and the regression coefficients b_{1} and b_{2}, with the steps for calculating them by hand]

The table below describes what each element of these formulas means.

 

Element | Description
b_{0} | The constant
b_{1} | The regression coefficient for the first explanatory variable (x1)
b_{2} | The regression coefficient for the second explanatory variable (x2)
\sum x_{1}^2 | The sum of all squared x1 values
\sum x_{2}^2 | The sum of all squared x2 values
\sum x_{1}y | The sum of the products of each x1 value and the corresponding y value
\sum x_{2}y | The sum of the products of each x2 value and the corresponding y value
\sum x_{1}x_{2} | The sum of the products of each x1 value and the corresponding x2 value
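
The elements in the table can be computed directly from raw data. The sketch below does so for a model with two explanatory variables, assuming that each sum is taken over deviations from the variable's mean, which is the form in which the two-predictor formulas are usually stated. The function name and the data set are hypothetical; the y values were generated from y = 1 + 2x1 + 3x2 so that the result is easy to check.

```python
def fit_mlr_two_predictors(x1, x2, y):
    """Return (b0, b1, b2) for y = b0 + b1*x1 + b2*x2 using the by-hand formulas."""
    n = len(y)
    x1_bar, x2_bar, y_bar = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Deviations from the means (assumption: the sums use centred values)
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    dy = [v - y_bar for v in y]
    sum_x1_sq = sum(a * a for a in d1)
    sum_x2_sq = sum(b * b for b in d2)
    sum_x1_x2 = sum(a * b for a, b in zip(d1, d2))
    sum_x1_y = sum(a * c for a, c in zip(d1, dy))
    sum_x2_y = sum(b * c for b, c in zip(d2, dy))
    denominator = sum_x1_sq * sum_x2_sq - sum_x1_x2 ** 2
    b1 = (sum_x2_sq * sum_x1_y - sum_x1_x2 * sum_x2_y) / denominator
    b2 = (sum_x1_sq * sum_x2_y - sum_x1_x2 * sum_x1_y) / denominator
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]

b0, b1, b2 = fit_mlr_two_predictors(x1, x2, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")  # prints: b0 = 1.00, b1 = 2.00, b2 = 3.00
```

Because the data contain no noise, the function recovers the constant and both coefficients used to generate them.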

 

MLR Interpretation

To interpret an MLR model, focus on the regression coefficients and the constant, since most of the interpretation rests on them. Take the following model as an example.

 

Y = 170 + 4.5x1 + 6.4x2

 

Here, the interpretation would be as follows.

 

Term | Interpretation
170 | The constant, which is the value of Y when x1 and x2 are both zero
4.5 | Y increases by 4.5 units when x1 increases by 1 unit, with x2 held constant
6.4 | Y increases by 6.4 units when x2 increases by 1 unit, with x1 held constant
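
These interpretations can be checked numerically by evaluating the model and changing one variable at a time. The helper name `predict_y` and the starting values x1 = 10 and x2 = 20 are arbitrary choices for illustration.

```python
def predict_y(x1, x2):
    """Evaluate the example model Y = 170 + 4.5*x1 + 6.4*x2."""
    return 170 + 4.5 * x1 + 6.4 * x2


# The constant: the value of Y when both explanatory variables are zero
print(f"{predict_y(0, 0):.1f}")  # prints: 170.0

# Increase x1 by one unit while holding x2 constant
change_x1 = predict_y(11, 20) - predict_y(10, 20)
print(f"{change_x1:.1f}")  # prints: 4.5

# Increase x2 by one unit while holding x1 constant
change_x2 = predict_y(10, 21) - predict_y(10, 20)
print(f"{change_x2:.1f}")  # prints: 6.4
```

The change in Y is the same whatever starting values are chosen, because the model is linear in each explanatory variable.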

 

Problem 1

Using the information in the table below, calculate the MLR model. If you don’t recall the formulas, check the solution that follows.

 

Element | Value
\sum x_{1}^2 | 100
\sum x_{2}^2 | 300
\sum x_{1}y | 250
\sum x_{2}y | 115
\sum x_{1}x_{2} | 310
\bar{x_{1}} | 15
\bar{x_{2}} | 5
\bar{y} | 25

 

Solution to Problem 1

To calculate the constant and regression coefficients, we need to plug in the data we have.

 

b_{1} = \frac{ (\sum x_{2}^2) (\sum x_{1}y) - (\sum x_{1}x_{2})(\sum x_{2}y) }{ (\sum x_{1}^2)(\sum x_{2}^2) - (\sum x_{1} x_{2})^2 }
= \frac{(300*250) - (310*115)}{(100*300) - (310^2)} = \frac{39350}{-66100} \approx -0.5953 \approx -0.60

b_{2} = \frac{ (\sum x_{1}^2) (\sum x_{2}y) - (\sum x_{1}x_{2})(\sum x_{1}y) }{ (\sum x_{1}^2)(\sum x_{2}^2) - (\sum x_{1} x_{2})^2 }
= \frac{(100*115) - (310*250)}{(100*300) - (310^2)} = \frac{-66000}{-66100} \approx 0.9985 \approx 1.00

b_{0} = \bar{y} - b_{1}\bar{x_{1}} - b_{2}\bar{x_{2}}
= 25 - (-0.5953*15) - (0.9985*5) \approx 28.94

Plugging these values into the final equation, we get:

    \[ y = 28.94 - 0.60x_{1} + 1.00x_{2} \]
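
Hand calculations like these are easy to get wrong, so it helps to recompute them from the sums and means in the Problem 1 table. The sketch below does this in plain Python; the variable names are chosen for the example only.

```python
# Sums and means taken from the Problem 1 table
sum_x1_sq = 100
sum_x2_sq = 300
sum_x1_y = 250
sum_x2_y = 115
sum_x1_x2 = 310
x1_bar, x2_bar, y_bar = 15, 5, 25

# Both coefficients share the same denominator
denominator = sum_x1_sq * sum_x2_sq - sum_x1_x2 ** 2

b1 = (sum_x2_sq * sum_x1_y - sum_x1_x2 * sum_x2_y) / denominator
b2 = (sum_x1_sq * sum_x2_y - sum_x1_x2 * sum_x1_y) / denominator
b0 = y_bar - b1 * x1_bar - b2 * x2_bar

print(f"b1 = {b1:.2f}, b2 = {b2:.2f}, b0 = {b0:.2f}")
```

Carrying the unrounded coefficients through to the constant, as the code does, avoids the small rounding differences that build up when each step is rounded by hand.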


Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.