Linear Regression Definition

A simplified definition of linear regression boils down to modelling the linear relationship between two variables: fitting the straight line that best describes how one variable changes with the other. Linear regression has a vast number of diverse, real-world applications. For example, it can be used to investigate the relationship between income and life satisfaction, divorce rates and crises, and much more.

In order to understand some of the concepts behind linear regression, take a look at the definitions summarized in the table below.

Concept	Definition	Formula	Example
Simple Linear Regression	A linear regression model that has only one independent variable	$\hat{y} = a + bx$	$\widehat{\text{exam score}} = a + b \times \text{study hours}$
Pearson’s Correlation Coefficient	The measure of the strength and direction of the linear relationship between two variables.	$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$	The correlation coefficient between exam score and study hours is 0.84, suggesting a strong positive relationship.

There are two main types of linear regression, which are:

Simple linear regression
Multiple linear regression

Both types of regression seek to understand the relationship between a dependent and independent variable. The only difference is that, while simple linear regression deals with only one dependent variable and one independent variable, multiple linear regression has a dependent variable with more than one independent variable.
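
In symbols, the difference between the two comes down to the number of terms on the right-hand side. The notation below (intercept $a$, slopes $b$ and $b_1, \dots, b_k$, and $k$ independent variables) is a common convention assumed here for illustration.

```latex
% Simple linear regression: one independent variable x
\hat{y} = a + b x

% Multiple linear regression: k independent variables x_1, ..., x_k
\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k
```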

A dependent variable is defined as the variable which you are interested in studying. More specifically, this variable is the one that is affected by changes in the value of the independent variable. This explains its name, because the value of the variable is “dependent” on the other variables.

An independent variable, on the other hand, is one whose value does not depend on the other variables you are studying. It is the variable used to explain or predict the dependent variable.

To understand linear regression, let's start with a step-by-step example of simple linear regression.

Linear Regression Example

Note that when conducting a linear regression, you will most likely use a program that performs these calculations for you. However, it can be helpful to do a regression by hand in order to fully grasp the ideas behind it.

The table below is a data set on 8 test scores and hours spent studying for that test.

Test Score	Hours Spent Preparing
95	6
90	4
89	5
82	5
80	4
70	3
60	0.5
95	6

Below, you'll find the steps for calculating the linear regression line, broken down by the statistics you need to compute. Here, x is the independent variable (hours spent preparing) and y is the dependent variable (test score).

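Step	Description	Formula
1	Find the mean of x	$\bar{x} = \frac{1}{n}\sum x_i$
2	Find the mean of y	$\bar{y} = \frac{1}{n}\sum y_i$
3	Find the correlation of x and y	$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
4	Find the standard deviation of x	$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
5	Find the standard deviation of y	$s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n - 1}}$
6	Find the slope	$b = r \frac{s_y}{s_x}$
7	Find the intercept	$a = \bar{y} - b\bar{x}$
8	Find the regression model	$\hat{y} = a + bx$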
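
The eight steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `simple_linear_regression` is hypothetical, and the standard deviations are assumed to use the sample divisor n - 1 (the choice of divisor cancels out in the slope).

```python
import math

# Data from the example table: hours spent preparing (x) and test score (y).
hours = [6, 4, 5, 5, 4, 3, 0.5, 6]
scores = [95, 90, 89, 82, 80, 70, 60, 95]


def simple_linear_regression(x, y):
    """Return (slope, intercept, r) by following the eight steps in order."""
    n = len(x)
    mean_x = sum(x) / n                      # step 1: mean of x
    mean_y = sum(y) / n                      # step 2: mean of y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)           # step 3: correlation
    sd_x = math.sqrt(sxx / (n - 1))          # step 4: assumes the n - 1 divisor
    sd_y = math.sqrt(syy / (n - 1))          # step 5: same assumption
    slope = r * sd_y / sd_x                  # step 6: slope
    intercept = mean_y - slope * mean_x      # step 7: intercept
    return slope, intercept, r               # step 8: y-hat = intercept + slope * x


slope, intercept, r = simple_linear_regression(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours (r = {r:.2f})")
# prints: score = 55.8 + 6.4 * hours (r = 0.93)
```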

Following the steps outlined above, we first need to calculate the mean of both variables and their correlation coefficient.

Observation	Test Score ($y_i$)	Hours Spent ($x_i$)	$y_i - \bar{y}$	$x_i - \bar{x}$	$(x_i - \bar{x})(y_i - \bar{y})$	$(y_i - \bar{y})^2$	$(x_i - \bar{x})^2$
1	95	6	12.4	1.8	22.4	153.1	3.3
2	90	4	7.4	-0.2	-1.4	54.4	0.0
3	89	5	6.4	0.8	5.2	40.6	0.7
4	82	5	-0.6	0.8	-0.5	0.4	0.7
5	80	4	-2.6	-0.2	0.5	6.9	0.0
6	70	3	-12.6	-1.2	15.0	159.4	1.4
7	60	0.5	-22.6	-3.7	83.4	511.9	13.6
8	95	6	12.4	1.8	22.4	153.1	3.3
Average	82.6	4.2		Total	147.1	1079.9	23.0
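
The deviation columns and their totals can be reproduced with a short script, which is a handy way to check hand calculations. The sketch below assumes test score is y and hours spent is x, and keeps the unrounded means internally, so individual cells may differ from a hand calculation that rounds at every step.

```python
hours = [6, 4, 5, 5, 4, 3, 0.5, 6]           # x: hours spent preparing
scores = [95, 90, 89, 82, 80, 70, 60, 95]    # y: test score

mean_x = sum(hours) / len(hours)
mean_y = sum(scores) / len(scores)

# One row per observation:
# (y - mean_y, x - mean_x, product of deviations, squared y dev, squared x dev)
rows = []
for x, y in zip(hours, scores):
    dx = x - mean_x
    dy = y - mean_y
    rows.append((dy, dx, dx * dy, dy ** 2, dx ** 2))

# Totals of the last three columns.
sum_xy = sum(row[2] for row in rows)
sum_yy = sum(row[3] for row in rows)
sum_xx = sum(row[4] for row in rows)

for i, row in enumerate(rows, start=1):
    print(i, *(f"{value:.1f}" for value in row))
print("Totals:", f"{sum_xy:.1f}", f"{sum_yy:.1f}", f"{sum_xx:.1f}")
# last line printed: Totals: 147.1 1079.9 23.0
```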

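Next, using the formulas specified above and the totals from the table, we complete the remaining steps. The standard deviations use a divisor of n - 1 = 7, and the intercept is computed from the unrounded means.

Correlation, $r$	$\frac{147.1}{\sqrt{23.0 \times 1079.9}} \approx 0.93$
Standard deviation of x, $s_x$	$\sqrt{\frac{23.0}{7}} \approx 1.81$
Standard deviation of y, $s_y$	$\sqrt{\frac{1079.9}{7}} \approx 12.42$
Slope, $b$	$0.93 \times \frac{12.42}{1.81} \approx 6.4$
Intercept, $a$	$82.625 - 6.40 \times 4.1875 \approx 55.8$
Regression model	$\hat{y} = 55.8 + 6.4x$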

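By following these steps, we arrive at a linear regression model with a slope of 6.4 and a y-intercept of 55.8. In other words, each additional hour of preparation is associated with a predicted increase of about 6.4 points in the test score.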
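
Once the slope and intercept are known, making a prediction is a single line of arithmetic. The sketch below uses the rounded coefficients from the example; the function name `predict_score` is hypothetical. Since the data only cover 0.5 to 6 hours of preparation, predictions outside that range should be treated with caution.

```python
# Rounded coefficients taken from the worked example.
INTERCEPT = 55.8
SLOPE = 6.4


def predict_score(hours):
    """Predicted test score for a given number of hours spent preparing."""
    return INTERCEPT + SLOPE * hours


for h in (3, 4, 5, 6):
    print(f"{h} hours -> predicted score {predict_score(h):.1f}")
# prints 75.0, 81.4, 87.8 and 94.2 for 3, 4, 5 and 6 hours respectively
```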

Residuals

Residuals can be used as an important indicator of whether or not our regression model is a good fit for our data. First, we can start by plotting our regression model from the previous example.

Residuals are the difference between the values predicted by the regression model, which can be seen on the regression line, and those that we actually observe, represented by the points. Take a look at the table below for the definition and notation of residuals.

Concept	Definition	Notation	Description
Residual	The difference between the value actually observed (a point on the scatter plot) and the value the regression line predicts for it	$e_i = y_i - \hat{y}_i$	Residual = observed y - predicted y

We can plot the residuals by observing the vertical distances between the regression line and the points. Note that we’ve changed the axes in order to get a better look at the residuals.

The green and blue lines represent the distance between the actual observed points and those predicted by the model. The table below calculates each residual, where the $\hat{y}_i$ values are found by plugging each observation's hours into the regression model.

$y_i$	$\hat{y}_i$	Residual	Interpretation
95	94.2	0.8	Positive residuals indicate the observed value lies above the regression line, so the model underestimated it
90	81.4	8.6	Underestimated
89	87.8	1.2	Underestimated
82	87.8	-5.8	Negative residuals indicate the observed value lies below the regression line, so the model overestimated it
80	81.4	-1.4	Overestimated
70	75.0	-5.0	Overestimated
60	59.0	1.0	Underestimated
95	94.2	0.8	Underestimated
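
The residual calculation can be sketched as follows, using the rounded model from the example. With the residual defined as observed minus predicted, a positive residual means the observed score sits above the line, so the model's prediction was too low; a negative residual means the prediction was too high.

```python
hours = [6, 4, 5, 5, 4, 3, 0.5, 6]
scores = [95, 90, 89, 82, 80, 70, 60, 95]

INTERCEPT, SLOPE = 55.8, 6.4  # rounded coefficients from the worked example

residuals = []
for x, y in zip(hours, scores):
    predicted = INTERCEPT + SLOPE * x
    residual = y - predicted          # observed minus predicted
    residuals.append(residual)
    direction = "model underestimated" if residual > 0 else "model overestimated"
    print(f"y={y}  yhat={predicted:.1f}  residual={residual:+.1f}  ({direction})")
```

Small residuals scattered on both sides of zero, as here, suggest that the line follows the data reasonably well.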

Emma
