Chapters

What is Regression?
Linear Correlation Coefficient
Simple Linear Regression
Multiple Linear Regression
Gauss-Markov Theorem
Matrix Multiple Linear Regression
Regression Problems
Equation Table


What is Regression?

The statistics involved in data analysis all have one thing in common: trying to find patterns within data. Patterns are essential, not only because they help us process everyday phenomena, but also because they can help us try to make predictions about the future.

One of the tools you can use to make predictions about the future is regression modelling. This section gives a brief overview of the concepts involved in regression, as well as practice problems you can use to test what you’ve learned.

Linear Correlation Coefficient

Most people have heard of the concept of correlation, but not many understand what it actually is. The most common correlation coefficient is the Pearson product-moment correlation coefficient. It is a statistic that measures the strength and direction of the linear relationship between two variables, and it ranges from -1 to 1.

It is called a linear correlation coefficient because it measures only the strength of the linear relationship between two variables. In a perfect linear relationship every data point lies exactly on a straight line, so a given change in one variable is always accompanied by a proportional change in the other. A perfect positive linear relationship corresponds to a correlation coefficient of 1, and a perfect negative one corresponds to -1. A value near 0 indicates little or no linear relationship.
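To make the definition concrete, here is a minimal Python sketch of Pearson's r computed directly from its definition: the sample covariance divided by the product of the two sample standard deviations. The function name `pearson_r` is only an illustrative choice. The n - 1 divisors cancel out, so dividing by n instead would give the same r.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation: Cov(x, y) / (s_x * s_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and sample standard deviations, all using the n - 1 divisor
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))
    return cov / (s_x * s_y)


# Points lying exactly on a rising line give r = 1; on a falling line, r = -1
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 6))   # 1.0
print(round(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]), 6))   # -1.0
```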

Simple Linear Regression

The correlation coefficient is only one part of analysing the relationship between variables. Linear regression modelling is another tool you can employ to analyse variables. In fact, it is one of the most powerful and commonly used methods of analysis in statistics.

Simple linear regression is a linear regression model with only one independent variable and one dependent variable. The dependent (response) variable is the one you’re interested in studying or predicting. The independent (explanatory) variable is the one you use to explain or predict it.

The most common method for estimating a regression line is ordinary least squares (OLS). OLS chooses the intercept and slope that minimise the sum of the squared vertical distances (the residuals) between the data points and the regression line. The result is the line of best fit in the least-squares sense.
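The least-squares line has a closed-form solution. The slope is the sample covariance of x and y divided by the sample variance of x, and the intercept makes the line pass through the point of means. The sketch below uses small, made-up data and hypothetical helper names (`ols_simple`, `sse`). It also checks that moving away from the OLS coefficients increases the sum of squared residuals.

```python
def ols_simple(x, y):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope: Cov(x, y) / Var(x); the n - 1 divisors cancel
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical illustrative data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
print(b0, b1)
# Nudging a coefficient away from its OLS value can only increase the SSE
print(sse(x, y, b0, b1) <= sse(x, y, b0 + 0.1, b1))
```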

Multiple Linear Regression

Often, you’ll be interested in seeing how more than one independent variable affects a dependent variable. When your regression model includes one dependent variable and two or more independent variables, it is called multiple linear regression.

You can think of multiple linear regression as an extension of simple linear regression, and it is usually estimated with OLS as well. The idea is that each additional independent variable can help explain variability (changes) in the dependent variable that the other independent variables leave unexplained.
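One way to see the "explains more variability" idea is to compare R², the share of the variability in y explained by the model, for a model with one independent variable and then with two. This is a sketch on hypothetical data using NumPy's `np.linalg.lstsq`. With OLS and an intercept, adding a variable never lowers R². That is also why adjusted R² is often preferred when comparing models with different numbers of variables.

```python
import numpy as np


def r_squared(X, y):
    """Fit y on the column(s) of X plus an intercept by OLS and return R^2."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals              # unexplained variability
    ss_tot = ((y - y.mean()) ** 2).sum()        # total variability around the mean
    return 1 - ss_res / ss_tot


# Hypothetical illustrative data
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = np.array([5.1, 4.9, 10.2, 9.8, 15.1, 14.9])

r2_simple = r_squared(x1, y)                           # one independent variable
r2_multiple = r_squared(np.column_stack([x1, x2]), y)  # two independent variables
print(round(r2_simple, 3), round(r2_multiple, 3))
```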

Gauss-Markov Theorem

The Gauss-Markov theorem is a result you will encounter often when working with linear regression models. It states that if a specific set of assumptions is met, then the OLS estimators are the best linear unbiased estimators (BLUE). That is, they are unbiased and have the smallest variance among all linear unbiased estimators.

The six classic assumptions of the Gauss-Markov theorem are:

The regression model is linear in the coefficients and the error term
The error term has an expected value of zero
Homoscedasticity: the conditional variance of the error term is constant for all observations
The error terms are uncorrelated with each other (no autocorrelation)
No independent variable is correlated with the error term
No perfect multicollinearity: no independent variable is an exact linear function of the others
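Most of these assumptions are checked with residual plots once a model is fitted. Perfect multicollinearity, however, can be detected before fitting. It means one independent variable is an exact linear combination of the others. In that case the design matrix (a column of ones plus the independent variables) is rank deficient, and OLS has no unique solution. Here is a minimal NumPy sketch with hypothetical variables:

```python
import numpy as np

# Hypothetical regressors: x2 is an exact multiple of x1 (perfect multicollinearity)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 2.0 * x1
x3 = np.array([3.0, 1.0, 4.0, 1.0, 5.0])


def has_perfect_multicollinearity(*columns):
    """True if the design matrix [1, x_a, x_b, ...] has fewer independent columns than columns."""
    X = np.column_stack([np.ones(len(columns[0])), *columns])
    return np.linalg.matrix_rank(X) < X.shape[1]


print(has_perfect_multicollinearity(x1, x2))  # True: X'X cannot be inverted
print(has_perfect_multicollinearity(x1, x3))  # False
```

Strong but imperfect correlation between independent variables does not break the Gauss-Markov theorem. It does, however, inflate the variances of the estimators.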

Matrix Multiple Linear Regression

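In higher-level statistics, you’re likely to encounter the multiple linear regression model written in matrix form, where capital letters indicate a matrix:

\[ Y = X\beta + \varepsilon \]

Here Y is the n x 1 vector of observations of the dependent variable, and X is the n x (k + 1) design matrix (a column of ones for the intercept followed by the k independent variables). β is the (k + 1) x 1 vector of coefficients, and ε is the n x 1 vector of error terms. The equation looks very similar to the simple linear regression (SLR) model, but there is one key difference: all variables and parameters are written in matrix notation. Matrices are extremely important to understand if you want to delve deeper into statistics, particularly matrix properties and matrix multiplication.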
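In matrix form, the OLS estimates solve the normal equations $(X'X)\hat{\beta} = X'Y$, so $\hat{\beta} = (X'X)^{-1}X'Y$ whenever $X'X$ is invertible. The sketch below applies NumPy to hypothetical noise-free data, so it should recover the known coefficients 1, 2 and 3. It calls `np.linalg.solve` rather than computing the inverse explicitly, which is the numerically safer habit.

```python
import numpy as np

# Hypothetical data: n = 5 observations, k = 2 independent variables
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
Y = 1.0 + 2.0 * x1 + 3.0 * x2      # generated without noise, so beta is known

# Design matrix X (n x (k + 1)): a column of ones for the intercept, then the regressors
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
residuals = Y - X @ beta_hat

print(np.round(beta_hat, 6))       # [1. 2. 3.]
print(residuals @ residuals)       # SSE = e'e, essentially 0 because there is no noise
```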

Regression Problems

In this section, you will find a range of practice problems that you can use to solidify or test your knowledge of basic to advanced regression concepts. Try to solve the problems on your own, using the other guides on this site and the equations below. If you’re struggling to reach a solution, check out the step-by-step answers on each problem’s page.

Problem 3

You are interested in the relationship between the weather and tourism levels. To investigate, you collect data from the tourist centre of a city during one month in the summer, counting the number of people who arrive at the square at the same time every day. Given the data set below, what is the correlation between temperature and the number of visitors? Interpret the correlation and name a few other reasons why these two variables are or are not related.

Temperature	Number of Visitors
12	87
21	150
20	110
25	90
17	85
15	70
13	90

Problem 4

Two variables need to be studied: weight loss and the number of days spent exercising in one month. You are given a data set in which individuals were asked on how many days in one month they exercised for more than half an hour. What kind of regression model can you use here? What are the results of this regression for the data set below? Interpret the model’s estimators.

Exercise Days	Weight Loss (in kg)
0	4
4	1
8	1.5
12	2
16	4
20	5
24	2
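Once you have worked out the simple regression by hand, you can cross-check it with NumPy’s `np.polyfit` at degree 1, which fits a least-squares line. This sketch is for verifying your own answer and does not replace interpreting the estimators.

```python
import numpy as np

exercise_days = np.array([0, 4, 8, 12, 16, 20, 24])
weight_loss_kg = np.array([4, 1, 1.5, 2, 4, 5, 2])

# Degree-1 least-squares fit; polyfit returns coefficients highest power first
slope, intercept = np.polyfit(exercise_days, weight_loss_kg, 1)
print(f"weight loss = {intercept:.3f} + {slope:.3f} * exercise days")
```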

Problem 5

You’re curious about which factors play into the salary people earn. To find out, you’d like to conduct a multiple linear regression analysis on data that records the salary, education level in years, and work experience in years of 15 individuals. Conduct a multiple regression analysis by finding the regression model for the following data set.

Education (years)	Experience (years)	Salary
11	10	30000
11	6	27000
12	10	20000
12	5	25000
13	5	29000
14	6	35000
14	5	38000
16	8	40000
16	7	45000
16	2	28000
18	6	30000
18	2	55000
22	5	65000
23	2	25000
24	1	75000
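To check your multiple regression coefficients, you can build the design matrix (a column of ones, then education and experience) and solve the least-squares problem with NumPy. Here is a minimal sketch. The salary figures are used exactly as given, and no currency is assumed.

```python
import numpy as np

education = np.array([11, 11, 12, 12, 13, 14, 14, 16, 16, 16, 18, 18, 22, 23, 24])
experience = np.array([10, 6, 10, 5, 5, 6, 5, 8, 7, 2, 6, 2, 5, 2, 1])
salary = np.array([30000, 27000, 20000, 25000, 29000, 35000, 38000, 40000,
                   45000, 28000, 30000, 55000, 65000, 25000, 75000])

# Design matrix: intercept column, then the two independent variables
X = np.column_stack([np.ones(len(salary)), education, experience])

# Least-squares solution of X b = salary; lstsq also returns residuals, rank, singular values
(b0, b1, b2), *_ = np.linalg.lstsq(X, salary, rcond=None)
print(f"salary = {b0:.1f} + {b1:.1f} * education + {b2:.1f} * experience")
```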

Problem 5.2

In the last problem, you were asked to build a multiple regression model based on the given data set, which recorded the salary, education level in years and work experience in years of 15 individuals.

Now that you have the multiple regression model, interpret what the results mean. Explain the meaning of each linear estimator, and give an example of an interpolation and of an extrapolation.

Next, consider what would happen to the interpretation of your results if you transformed salary into logarithms. Give an example of how this changes the interpretation of your regression model.
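Here is a sketch of the logarithmic version. It assumes a log-level model in which only salary is logged, so each coefficient describes an approximate percentage change in salary rather than a change in salary units. The prediction points (15, 5) and (30, 20) are hypothetical. The first lies inside the observed ranges (interpolation) and the second lies well outside them (extrapolation). Exponentiating a predicted log salary is a simplification: it estimates something closer to a typical (median) salary than to the mean.

```python
import numpy as np

education = np.array([11, 11, 12, 12, 13, 14, 14, 16, 16, 16, 18, 18, 22, 23, 24])
experience = np.array([10, 6, 10, 5, 5, 6, 5, 8, 7, 2, 6, 2, 5, 2, 1])
salary = np.array([30000, 27000, 20000, 25000, 29000, 35000, 38000, 40000,
                   45000, 28000, 30000, 55000, 65000, 25000, 75000])

X = np.column_stack([np.ones(len(salary)), education, experience])

# Log-level model: ln(salary) = b0 + b1 * education + b2 * experience + error
coef, *_ = np.linalg.lstsq(X, np.log(salary), rcond=None)
b0, b1, b2 = coef

# A one-unit increase in a regressor multiplies predicted salary by exp(b);
# for small b this is roughly a 100 * b percent change
for name, b in [("education", b1), ("experience", b2)]:
    print(f"{name}: b = {b:.4f}, about {100 * (np.exp(b) - 1):.1f}% per extra year")

# Interpolation (inside the observed ranges) vs extrapolation (far outside them)
for edu, exp_years in [(15, 5), (30, 20)]:
    predicted = np.exp(b0 + b1 * edu + b2 * exp_years)
    print(f"education={edu}, experience={exp_years}: predicted salary ~ {predicted:.0f}")
```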

Problem 6

A classmate of yours is having trouble understanding why, under the Gauss-Markov theorem, the ordinary least squares estimators are the best linear unbiased estimators (BLUE), as opposed to any other linear unbiased estimators. Using the following example, explain what BLUE means and why it is important.

A sample of companies is taken from a population, recording how much money each company spends on advertising and the amount of sales it makes in one month.
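The problem gives no numbers, so here is a small simulation under hypothetical assumptions: fixed advertising spend x = 0, 1, ..., 9, a true model sales = 5 + 2x + error, and normally distributed errors. It compares the OLS slope with another linear unbiased estimator, the slope of the line through the first and last points only. Both estimators are unbiased, so their averages land near the true slope of 2. The OLS slope, however, varies less from sample to sample, which is exactly what "best" means in BLUE.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: fixed advertising spend x and true model sales = 5 + 2 * x + error
x = np.arange(10, dtype=float)
beta0, beta1, sigma = 5.0, 2.0, 1.0
n_sims = 20_000

# Each row is one simulated sample; errors are independent N(0, sigma^2),
# so they have mean zero and constant variance as the theorem requires
errors = rng.normal(0.0, sigma, size=(n_sims, x.size))
Y = beta0 + beta1 * x + errors

# OLS slope: a weighted sum of the y values with weights (x - x_bar) / sum((x - x_bar)^2)
x_dev = x - x.mean()
ols_slopes = (Y @ x_dev) / (x_dev @ x_dev)

# Another linear unbiased estimator: the slope through the first and last points only
endpoint_slopes = (Y[:, -1] - Y[:, 0]) / (x[-1] - x[0])

print("mean:     OLS", ols_slopes.mean(), " endpoint", endpoint_slopes.mean())
print("variance: OLS", ols_slopes.var(), " endpoint", endpoint_slopes.var())
```

For these settings the theoretical variances are σ²/Σ(x - x̄)² = 1/82.5 ≈ 0.012 for OLS and 2σ²/9² = 2/81 ≈ 0.025 for the endpoint slope, so the simulated values should land close to those.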

Problem 6.2

You are given the following data set and a multiple regression model that explores the relationship between blood pressure and weight, height and age. You’d like to conduct a multiple regression analysis, but first you want to check the six Gauss-Markov (OLS) assumptions. Using graphs and statistics, do you think this model satisfies each assumption? Explain why or why not.

Blood Pressure	Weight	Height	Age
105	75	172	19
106	80	175	18
108	89	170	20
110	90	174	20
113	93	178	21
115	95	179	22
118	96	180	24
119	99	183	25
120	101	185	29
122	102	188	30
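For the multicollinearity assumption, a common statistic is the variance inflation factor (VIF). You regress each independent variable on the others and compute 1 / (1 - R²). The sketch below applies this to the weight, height and age columns with NumPy, and the function name `vif` is illustrative. A frequently quoted rule of thumb treats VIF values above roughly 5 or 10 as a warning sign. The remaining assumptions (homoscedasticity, uncorrelated errors, and so on) are usually judged from residual plots of the fitted model.

```python
import numpy as np

weight = np.array([75, 80, 89, 90, 93, 95, 96, 99, 101, 102], dtype=float)
height = np.array([172, 175, 170, 174, 178, 179, 180, 183, 185, 188], dtype=float)
age = np.array([19, 18, 20, 20, 21, 22, 24, 25, 29, 30], dtype=float)
predictors = {"weight": weight, "height": height, "age": age}


def vif(target, others):
    """Variance inflation factor: 1 / (1 - R^2) from regressing one predictor on the rest."""
    X = np.column_stack([np.ones(len(target)), *others])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
    return 1 / (1 - r2)


vifs = {}
for name, values in predictors.items():
    others = [v for k, v in predictors.items() if k != name]
    vifs[name] = vif(values, others)
    print(f"VIF({name}) = {vifs[name]:.2f}")
```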

Problem 8

A shop owner wants to understand how demand for certain goods in her shop depends on their price. To help her, you’re tasked with conducting a simple regression analysis. Given the following data set, your first task is to explain how to format these observations as matrices and to describe the benefits of using matrices in statistics.

Price	Demand
120	110
125	100
130	90
135	45
140	20
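For simple regression, the observations go into a response vector Y and a design matrix X. The first column of X is all ones (for the intercept), and the second column holds the prices. Here is a minimal NumPy sketch. The comments describe what the products $X'X$ and $X'Y$ contain.

```python
import numpy as np

price = np.array([120, 125, 130, 135, 140], dtype=float)
demand = np.array([110, 100, 90, 45, 20], dtype=float)

# Y: n x 1 column vector of the dependent variable (demand)
Y = demand.reshape(-1, 1)
# X: n x 2 design matrix; the column of ones carries the intercept
X = np.column_stack([np.ones_like(price), price])

print(X.shape, Y.shape)   # (5, 2) (5, 1)
print(X.T @ X)            # 2 x 2: [[n, sum(price)], [sum(price), sum(price^2)]]
print(X.T @ Y)            # 2 x 1: [[sum(demand)], [sum(price * demand)]]
```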

Problem 9

In the previous problem, you were asked to format the data into matrices. Now, using these matrices, find the regression model equation and interpret the results in terms of what this means for the shop owner.
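With the matrices in place, the coefficients follow from $\hat{\beta} = (X'X)^{-1}X'Y$. The sketch below rebuilds the matrices so that it runs on its own, then solves the normal equations with `np.linalg.solve`. The first entry of `beta_hat` is the intercept, and the second is the change in demand associated with a one-unit increase in price.

```python
import numpy as np

price = np.array([120, 125, 130, 135, 140], dtype=float)
demand = np.array([110, 100, 90, 45, 20], dtype=float)
X = np.column_stack([np.ones_like(price), price])
Y = demand.reshape(-1, 1)

# beta_hat = (X'X)^(-1) X'Y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
b0, b1 = beta_hat.ravel()
print(f"demand = {b0:.2f} + ({b1:.2f}) * price")
```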

Equation Table

In this table, you will find all the equations you will need to use in order to solve the practice problems. If you’re having trouble understanding any of the formulas, make sure to review each page dealing with these formulas and the concepts behind them.

Hypothesis Type	Description	Test Result
$H_0$ (null hypothesis)	States that a parameter is equal to a hypothesized value (or, in a one-tailed test, no greater than or no less than it)	$H_0$ is rejected when p-value < 0.05
$H_1$ (alternative hypothesis)	States that a parameter is different from, less than, or greater than the hypothesized value	$H_1$ is supported ($H_0$ is rejected in its favour) when p-value < 0.05

Center & Spread Formulas	Equations
$r$	\[ \frac{Cov(x,y)}{\sigma_{x} \sigma_{y}} \]
$Cov(x,y)$	\[ \frac{\sum_{i=1}^{n}(x_{i}- \bar{x})(y_{i}- \bar{y})}{n-1} \]
$\sigma_{x}$	\[ \sqrt{\frac{\sum_{i=1}^{n}(x_{i} - \bar{x})^2}{n-1}} \]
$\sigma_{y}$	\[ \sqrt{\frac{\sum_{i=1}^{n}(y_{i} - \bar{y})^2}{n-1}} \]

Equations	Regression Formulas
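$y = \beta_{0} + \beta_{1}x + \varepsilon$	Simple linear regression
$y = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \dots + \beta_{k}x_{k} + \varepsilon$	Multiple linear regression
$e_{i} = y_{i} - \hat{y}_{i}$	Residual
$Y = X\beta + \varepsilon$	MLR Matrix
$SSE = e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$	SSE Matrix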

Parameters and Variables	Equations
$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$
$\text{[math]}$	$\text{[math]}$

Matrix Operations	Rules
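Matrix Multiplication	The inner dimensions must match: (n x k) x (k x p). The resulting matrix is (n x p). Each entry is the sum of the products of a row of the first matrix with a column of the second
Inverse of 2 x 2	For $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, find the determinant $ad - bc$ (it must not be 0), swap a and d, make b and c negative, then divide every entry by the determinant: $\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
Inverse of 3 x 3 and above	Perform elementary row operations (swapping rows, multiplying a row by a non-zero constant, adding a multiple of one row to another) until the matrix becomes the identity (Gauss-Jordan elimination). Only square matrices with a non-zero determinant have an inverse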
Transpose	The matrix (n x k) becomes (k x n) Rows become columns
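The matrix rules in this table can be sketched in plain Python, using lists of lists and no libraries. The helpers `inverse_2x2`, `matmul` and `transpose` are hypothetical names chosen for illustration. In practice, NumPy’s `@` operator, `np.linalg.inv` and `.T` do the same jobs.

```python
def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] via (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is 0), so it has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(p, q):
    """Matrix product: inner dimensions must agree, (n x k)(k x m) -> (n x m)."""
    if len(p[0]) != len(q):
        raise ValueError("inner dimensions do not match")
    # Entry (i, j) is the sum of products of row i of p with column j of q
    return [[sum(p[i][t] * q[t][j] for t in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]


def transpose(m):
    """(n x k) -> (k x n): rows become columns."""
    return [list(col) for col in zip(*m)]


A = [[4, 7], [2, 6]]
A_inv = inverse_2x2(A)
print(A_inv)                               # [[0.6, -0.7], [-0.2, 0.4]]
print(matmul(A, A_inv))                    # the 2 x 2 identity, up to rounding
print(transpose([[1, 2, 3], [4, 5, 6]]))   # [[1, 4], [2, 5], [3, 6]]
```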


Emma

I am passionate about travelling and currently live and work in Paris. I like to spend my time reading, gardening, running, learning languages and exploring new places.
