## What is Regression?

The statistics involved in data analysis all have one thing in common: trying to find patterns within data. Patterns are essential, not only because they help us process everyday phenomena, but also because they can help us try to make predictions about the future.

One of the tools you can use to make predictions about the future is regression modelling. This section gives a brief overview of the concepts involved in regression, as well as practice problems you can use to test what you’ve learned.

## Linear Correlation Coefficient

Most people have heard of the concept of correlation, but not many understand what it actually is. The most common correlation coefficient is the Pearson product-moment correlation coefficient, which is simply a statistic that tells us how closely related two variables are.

It is called a linear correlation coefficient because it measures the strength and direction of the linear relationship between two variables, on a scale from -1 to 1. A perfect linear relationship describes a situation in which every data point falls exactly on a straight line, so a change in one variable always comes with a proportional change in the other. A perfect positive linear relationship corresponds to a correlation coefficient of 1, and a perfect negative one to -1.
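As a minimal sketch of how this statistic can be computed, the Python snippet below implements the standard Pearson formula with NumPy. The function name `pearson_r` and the data are hypothetical, invented purely for illustration: one series rises by exactly 2 for every extra hour and the other falls by exactly 2, so both are perfectly linear.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
score_up = [52, 54, 56, 58, 60]    # rises by exactly 2 per hour
score_down = [60, 58, 56, 54, 52]  # falls by exactly 2 per hour

r_up = pearson_r(hours, score_up)      # perfect positive linear relationship
r_down = pearson_r(hours, score_down)  # perfect negative linear relationship
print(r_up, r_down)
```

The two results sit at the two ends of the coefficient's range. Real data rarely falls exactly on a line, so values strictly between these extremes are the norm.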

## Simple Linear Regression

The correlation coefficient is only one part of analysing the relationship between variables. Linear regression modelling is another tool you can employ to analyse variables. In fact, it is one of the most powerful and commonly used methods of analysis in statistics.

Simple linear regression is a linear regression model that has only one independent variable and one dependent variable. The independent variable is the one you use to explain or predict the dependent variable, which is the variable you’re interested in studying.

The most common method of fitting a regression is ordinary least squares (OLS), which finds the line of best fit by minimising the sum of the squared vertical distances (the residuals) between the regression line and the data points.
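The least squares idea can be sketched in a few lines of Python. This is an illustrative example, not a full analysis: the function name `fit_simple_ols` and the data are hypothetical, and the slope and intercept come from the standard closed-form formulas for simple linear regression.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    # Slope: co-variation of x and y divided by the variation of x
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx**2)
    # Intercept: forces the line through the point of means
    intercept = y.mean() - slope * x.mean()
    return float(intercept), float(slope)

# Hypothetical data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0, b1 = fit_simple_ols(x, y)
residuals = y - (b0 + b1 * x)        # vertical distances to the fitted line
sse = float(np.sum(residuals**2))    # the quantity OLS makes as small as possible
print(b0, b1, sse)
```

Any other straight line through the same points would give a larger sum of squared residuals, which is what "best fit" means here.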

## Multiple Linear Regression

Oftentimes, you’ll be interested in seeing how more than one independent variable affects a dependent variable. When your regression model includes one dependent variable and two or more independent variables it is called multiple linear regression.

You can think of multiple linear regression as the extension of simple linear regression and OLS. The idea is that each additional independent variable should explain some of the variability, or changes, in the dependent variable that the other variables do not.
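A short sketch of a multiple linear regression fit, using NumPy's `np.linalg.lstsq`, is given below. The data is hypothetical and deliberately generated without noise from known coefficients (1, 2 and 0.5), so the fit should recover them; real data would include an error term and the estimates would only approximate the underlying values.

```python
import numpy as np

# Hypothetical data: two independent variables and one dependent variable
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 + 0.5 * x2  # generated exactly, with no noise

# A column of ones lets the model estimate an intercept
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solution: coef = [intercept, slope for x1, slope for x2]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

Each slope is read as the expected change in the dependent variable for a one-unit change in that independent variable, holding the other one fixed.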

## Gauss-Markov Theorem

The Gauss-Markov theorem is a concept that you will encounter a lot when dealing with linear regression models. This theorem states that if a specific set of assumptions is met, then the OLS estimators will be unbiased and have the smallest variance of all linear unbiased estimators. This is why they are called the best linear unbiased estimators (BLUE).

The six classic assumptions of the Gauss-Markov theorem are:

1. The regression model is linear in the coefficients and the error term
2. The error term has an expected value of zero
3. Homoscedasticity: the conditional variance of the error term is constant for all observations
4. The error terms are independently distributed and are not correlated with each other
5. No independent variable is correlated with the error term
6. No perfect multicollinearity among the independent variables
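The last assumption in the list can be checked numerically. One common diagnostic, which is not named in the text above and is offered here only as an illustrative sketch, is the variance inflation factor (VIF): each independent variable is regressed on the others, and VIF = 1 / (1 - R²). The function name and the data are hypothetical, and the cut-off at which a VIF counts as "too high" is a rule of thumb that varies between textbooks.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X (independent variables only, no intercept column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all the other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(float(1.0 / (1.0 - r2)))
    return vifs

# Hypothetical data: x3 is almost exactly twice x1, so the two are nearly collinear
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

A VIF close to 1 means a variable shares almost no linear information with the others. Note that the sketch assumes no column is an exact linear combination of the others; in that case R² would equal 1 and the VIF would be undefined.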

## Matrix Multiple Linear Regression

In higher-level statistics, you’re likely to encounter the following equation, where capital letters indicate a matrix:

$$Y = X\beta + \varepsilon$$

While it may look very similar to the regular simple linear regression (SLR) model, there is one key difference: all variables and parameters are in matrix notation. If you want to delve deeper into statistics, it is important to understand matrices, specifically matrix properties and matrix multiplication.
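To make the matrix notation concrete, here is a minimal sketch that builds the matrices for a tiny data set and estimates the coefficients with the standard normal equations, $\hat{\beta} = (X'X)^{-1}X'Y$. The data is hypothetical and lies exactly on the line y = 1 + 2x, so the expected estimates are known in advance. `np.linalg.solve` is used rather than forming the inverse explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np

# Hypothetical observations lying exactly on y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

X = np.column_stack([np.ones_like(x), x])  # design matrix, shape (4, 2)
Y = y.reshape(-1, 1)                       # column vector, shape (4, 1)

# Normal equations: (X'X) beta_hat = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

residuals = Y - X @ beta_hat
sse = float((residuals.T @ residuals)[0, 0])  # SSE in matrix form: e'e
print(beta_hat.ravel(), sse)
```

The first column of ones in `X` is what produces the intercept. Adding further independent variables only adds columns to `X`, while the formula stays the same.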

## Regression Problems

In this section, you will find a range of practice problems that you can use to solidify or test your knowledge of basic to advanced regression concepts. Try to solve the problems on your own, using the other guides on this site and the equations below. If you’re struggling to reach a solution, check out the step-by-step answers on each problem’s respective page.

### Problem 3

You are interested in the relationship between the weather and tourism levels. To investigate, you collect data from the tourist centre of a city during one month in the summer, counting the number of people who arrive at the square at the same time every day. Given the data set below, what is the correlation between temperature and tourism? Interpret the correlation and name a few other reasons why these two variables are or are not related.

| Temperature | Number of Visitors |
|---|---|
| 12 | 87 |
| 21 | 150 |
| 20 | 110 |
| 25 | 90 |
| 17 | 85 |
| 15 | 70 |
| 13 | 90 |

### Problem 4

There are two variables that need to be studied: weight loss and days spent exercising in one month. You are given a data set in which individuals have been asked how many days in one month they exercised for more than half an hour. What kind of regression model can you use here? What are the results of this regression given the data set below? Interpret the model’s estimators.

| Exercise Days | Weight Loss (in kg) |
|---|---|
| 0 | 4 |
| 4 | 1 |
| 8 | 1.5 |
| 12 | 2 |
| 16 | 4 |
| 20 | 5 |
| 24 | 2 |

### Problem 5

You’re curious about which factors play into the salary people earn. To find out, you’d like to conduct a multiple linear regression analysis on data that has the salary, education level in years, and work experience in years for 15 individuals. Conduct a multiple regression analysis by finding the regression model for the following data set.

| Education | Experience | Salary |
|---|---|---|
| 11 | 10 | 30000 |
| 11 | 6 | 27000 |
| 12 | 10 | 20000 |
| 12 | 5 | 25000 |
| 13 | 5 | 29000 |
| 14 | 6 | 35000 |
| 14 | 5 | 38000 |
| 16 | 8 | 40000 |
| 16 | 7 | 45000 |
| 16 | 2 | 28000 |
| 18 | 6 | 30000 |
| 18 | 2 | 55000 |
| 22 | 5 | 65000 |
| 23 | 2 | 25000 |
| 24 | 1 | 75000 |

### Problem 5.2

In the last problem, you were asked to build a multiple regression model based on the given data set. This data set covered 15 individuals and included each person’s salary, education level in years and work experience in years.

Now that you have the multiple regression model, interpret what these results mean. Explain the meaning of each linear estimator, providing one example of interpolation and one of extrapolation.

Next, see what would happen to the interpretation of your results if you transformed the salary variable into logarithms. Give an example of what this might do to the interpretation of your regression model.

### Problem 6

A classmate of yours is having trouble understanding what makes ordinary least squares estimators, under the Gauss-Markov theorem, the best linear unbiased estimators compared with all other linear estimators. Given the following example, explain what BLUE (best linear unbiased estimator) means and why it is important.

A sample is taken from a population that measures the money the observed companies spend on advertising and the amount of sales that they make in one month.

### Problem 6.2

You are given the following data set and multiple regression model that explores the relationship between blood pressure, weight, height and age. You’d like to conduct a multiple regression analysis but first want to check the 6 OLS assumptions. Through the use of graphs and statistics, do you think this model passes each assumption? Explain why or why not.

| Blood Pressure | Weight | Height | Age |
|---|---|---|---|
| 105 | 75 | 172 | 19 |
| 106 | 80 | 175 | 18 |
| 108 | 89 | 170 | 20 |
| 110 | 90 | 174 | 20 |
| 113 | 93 | 178 | 21 |
| 115 | 95 | 179 | 22 |
| 118 | 96 | 180 | 24 |
| 119 | 99 | 183 | 25 |
| 120 | 101 | 185 | 29 |
| 122 | 102 | 188 | 30 |

### Problem 8

A shop owner is interested in understanding the demand for certain goods in her store based on their price. To help her, you’re tasked with conducting a simple regression analysis. Given the following data set, your first task is to explain how to format these observations into matrices, along with the benefits of using matrices in statistics.

| Price | Demand |
|---|---|
| 120 | 110 |
| 125 | 100 |
| 130 | 90 |
| 135 | 45 |
| 140 | 20 |

### Problem 9

In the previous problem, you were asked to format the data into matrices. Now, using these matrices, find the regression model equation and interpret the results in terms of what this means for the shop owner.

## Equation Table

In this table, you will find all the equations you will need to use in order to solve the practice problems. If you’re having trouble understanding any of the formulas, make sure to review each page dealing with these formulas and the concepts behind them.

| Hypothesis Type | Description | Test Result |
|---|---|---|
| $H_0$ (null hypothesis) | States that a parameter is equal to a hypothesized value (or, in a one-sided test, at most or at least that value) | $H_0$ is rejected when p-value < 0.05 |
| $H_A$ (alternative hypothesis) | States that a parameter is not equal to, less than, or greater than the hypothesized value | $H_A$ is accepted when p-value < 0.05 |

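| Center & Spread Formulas | Equations |
|---|---|

| Regression Formulas | Equations |
|---|---|
| Simple linear regression | $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ |
| Multiple linear regression | $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$ |
| Residual | $e_i = y_i - \hat{y}_i$ |
| MLR Matrix | $Y = X\beta + \varepsilon$ |
| SSE Matrix | $SSE = e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$ |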

| Parameters and Variables | Equations |
|---|---|

| Matrix Operations | Rules |
|---|---|
| Matrix Multiplication | Dimensions must share the same inner value: (n x k) x (k x p). The resulting matrix is (n x p). Row values are multiplied by column values. |
| Inverse of 2 x 2 | For a matrix with rows (a, b) and (c, d): find the determinant, ad - bc. Switch a and d. Make b and c negative. Divide every entry by the determinant. |
| Inverse of 3 x 3 and above | Perform a set of elementary row operations (subtraction, addition, multiplication and division). |
| Transpose | The matrix (n x k) becomes (k x n). Rows become columns. |
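The matrix rules can be tried out directly with NumPy. The sketch below uses small hypothetical matrices and assumes a 2 x 2 matrix is labelled with rows (a, b) and (c, d); the helper `inverse_2x2` is written only for illustration, since `np.linalg.inv` already handles square matrices of any size.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # shape (2, 3)
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])       # shape (3, 2)

# Multiplication: inner dimensions match, (2 x 3)(3 x 2) gives (2 x 2)
product = A @ B

# Transpose: rows become columns, (2 x 3) becomes (3 x 2)
transpose = A.T

def inverse_2x2(m):
    """Invert a 2 x 2 matrix with rows (a, b) and (c, d)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular, so no inverse exists")
    # Swap a and d, negate b and c, divide by the determinant
    return np.array([[d, -b], [-c, a]]) / det

M = np.array([[4.0, 7.0],
              [2.0, 6.0]])
M_inv = inverse_2x2(M)
print(product, transpose.shape, M_inv)
```

Multiplying `M` by `M_inv` gives the identity matrix, which is a quick way to check any inverse.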

Danica

Located in Prague and studying to become a Statistician, I enjoy reading, writing, and exploring new places.