Chapters

Problem 3
What is the Correlation Coefficient?
Derivation of Formula
Interpretation of Correlation Coefficient
Step by Step Solution

The linear correlation coefficient is one of the fundamental concepts behind the interpretation of regression models. To understand the mathematics and ideas behind the correlation coefficient, try to solve the following problem by reviewing what you know. If you’re encountering this concept for the first time, read through this guide for a step-by-step walkthrough.

Problem 3

You are interested in the relationship between the weather and tourism levels. To investigate, you collect data at the tourist centre of a city during one month in the summer, counting the number of people who arrive at the square at the same time every day. Given the data set below, what is the correlation between temperature and tourism? Interpret the correlation and name a few other reasons why these two variables are or are not related.

Temperature	Number of Visitors
12	87
21	150
20	110
25	90
17	85
15	70
13	90

What is the Correlation Coefficient?

The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient, is one of the most widely used statistics for describing how two variables relate. Be careful not to confuse it with the coefficient of determination, which is also known as the “R squared” value. The correlation coefficient measures the strength and direction of the linear relationship between two variables. A linear relationship between two variables means that when the two variables are graphed against each other, the points follow a straight line. In other words, an increase or decrease in one variable is accompanied by a corresponding increase or decrease in the other variable.

You should also be careful not to confuse correlation with causation. Simply because two variables exhibit a strong linear correlation doesn’t mean one causes the other. A classic example is the strong linear relationship between shark attacks and ice cream sales. As ice cream sales increase, there is a corresponding increase in shark attacks as well. This does not mean that an increase in ice cream sales causes an increase in shark attacks. Correlation simply signals a relationship between two variables. Those two variables might both be related to a third variable, which explains why they are related in the first place. In this example, ice cream sales and shark attacks can exhibit a strong relationship because of hot weather: the hotter it is, the more people buy ice cream and swim in the ocean.
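The role of a common underlying variable can be illustrated with a small simulation. The sketch below is only an illustration: every number in it is made up, and the variable names (`temperature`, `ice_cream_sales`, `shark_attacks`) and the helper `pearson` are assumptions introduced for this example. Both series are generated from temperature alone, with no link between them, yet they still end up strongly correlated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation computed from deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / (sum_sq_x * sum_sq_y) ** 0.5


random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data (assumption): temperature drives both series independently.
temperature = [random.uniform(15, 35) for _ in range(200)]
ice_cream_sales = [20 * t + random.gauss(0, 40) for t in temperature]
shark_attacks = [0.1 * t + random.gauss(0, 0.3) for t in temperature]

# Neither series depends on the other, only on temperature.
r = pearson(ice_cream_sales, shark_attacks)
print(round(r, 2))
```

The printed value is clearly positive even though neither series was built from the other, which is the situation the shark attack example describes.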

Derivation of Formula

The formula for the correlation coefficient is the following. \[ \rho_{xy} = \frac{\text{Cov}(x,y)}{\sigma_{x} \sigma_{y}} \] While this formula may seem confusing at first, it is actually quite simple to understand once each element is broken down.

$\rho_{xy}$	Pearson product-moment correlation
$\text{Cov}(x,y)$	Covariance between x and y
$\sigma_{x}$	Standard deviation of x
$\sigma_{y}$	Standard deviation of y

Let’s take the first element, which is the covariance. The covariance of two variables measures the direction of the relationship between them. In other words, the covariance measures how two variables move together. Next, let’s look at the two elements in the denominator of the correlation coefficient. The standard deviation is a statistic that measures how widely the values of a variable are spread around the mean. The formulas for all three elements can be seen below.

$\text{Cov}(x,y)$	\[ \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \]
$\sigma_{x}$	\[ \sqrt{\frac{\sum(x_{i}-\bar{x})^2}{n-1}} \]
$\sigma_{y}$	\[ \sqrt{\frac{\sum(y_{i}-\bar{y})^2}{n-1}} \]
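The three elements translate almost directly into code. The following is a minimal sketch in Python, assuming the sample (n - 1) versions of the formulas given above; the function names and the small data set are made up for illustration.

```python
from math import sqrt


def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def std_dev(values):
    """Sample standard deviation: root of squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation coefficient: covariance over the product of the standard deviations."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Illustrative data (assumption): y is exactly twice x, a perfect straight line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(covariance(xs, ys), std_dev(xs), std_dev(ys), correlation(xs, ys))
```

Because the points lie exactly on an upward-sloping line, the correlation comes out at 1 up to floating-point rounding.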

As you can see, these three elements are what go into deriving the correlation formula. In the numerator, you have the measure of the direction of the relationship between the two variables. This relationship can be either positive or negative. If the covariance is positive, a decrease in one variable tends to go along with a decrease in the other variable, and vice versa. A negative covariance, on the other hand, means that a decrease in one variable tends to go along with an increase in the other variable, and again vice versa. The denominator is the product of the standard deviations of both variables. The standard deviation of a variable is a measure of dispersion: it measures the spread of a variable around its mean.

To derive the correlation coefficient formula, first plug the three elements into the correlation coefficient formula. \[ \frac{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1}}{\sqrt{\frac{\sum(x_{i}-\bar{x})^2}{n-1}} \cdot \sqrt{\frac{\sum(y_{i}-\bar{y})^2}{n-1}}} \]

Recall that the square root of a fraction is simply the square root of the numerator divided by the square root of the denominator: \[ \sqrt{\frac{a}{b}} = \frac{\sqrt{a}}{\sqrt{b}} \] This means that the expression becomes: \[ \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \div \left( \frac{\sqrt{\sum(x_{i}-\bar{x})^2}}{\sqrt{n-1}} \cdot \frac{\sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1}} \right) \]

Recall also that a square root times itself is simply the number. For example, take the number 3: \[ \sqrt{3} \cdot \sqrt{3} = 3 \] Also, keep in mind that when multiplying fractions, they become one fraction whose numerator is the product of the two numerators and whose denominator is the product of the two denominators. For example, take the fraction one-third multiplied by one-fourth: \[ \frac{1}{3} \cdot \frac{1}{4} = \frac{1 \cdot 1}{3 \cdot 4} = \frac{1}{12} \] Putting these two properties together, we can see that the denominator of the correlation coefficient formula becomes the following. \[ \frac{\sqrt{\sum(x_{i}-\bar{x})^2}}{\sqrt{n-1}} \cdot \frac{\sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1}} = \] \[ \frac{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}}{\sqrt{n-1} \cdot \sqrt{n-1}} = \] \[ \frac{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}}{n-1} \]

When plugging this back into the full expression, remember that dividing by a fraction is the same thing as multiplying by the inverse of that fraction. Taking the same example from above, one-third divided by one-fourth is the same thing as one-third multiplied by four over one: \[ \frac{1}{3} \div \frac{1}{4} = \frac{1}{3} \cdot \frac{4}{1} = \frac{4}{3} \] Applied to the correlation coefficient, this gives: \[ \frac{\sum_{i=1}^{n}(x_{i}-\bar{x})(y_{i}-\bar{y})}{n-1} \cdot \frac{n-1}{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}} \]

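Cancelling the factor $n-1$, which appears in both the numerator and the denominator, leaves: \[ \frac{\sum(x_{i}-\bar{x})(y_{i}-\bar{y})}{\sqrt{\sum(x_{i}-\bar{x})^2} \cdot \sqrt{\sum(y_{i}-\bar{y})^2}} \] Each sum can then be expanded using $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$, which gives $\sum(x_{i}-\bar{x})(y_{i}-\bar{y}) = \frac{n \sum xy - \sum x \sum y}{n}$ and $\sum(x_{i}-\bar{x})^2 = \frac{n \sum x^2 - (\sum x)^2}{n}$, and likewise for y. Simplifying both the numerator and denominator in this way, we get: \[ \frac{n \sum xy - \sum x \sum y}{n} \cdot \frac{n}{\sqrt{ (n \sum x^2 - (\sum x)^2) ( n \sum y^2 - (\sum y)^2) }} \] \[ \frac{n \sum xy - \sum x \sum y}{\sqrt{ (n \sum x^2 - (\sum x)^2) ( n \sum y^2 - (\sum y)^2) }} \]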
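A quick numerical check can make the derivation more convincing. The sketch below, written in Python with made-up data and illustrative function names, computes the coefficient once from deviations around the means and once from the raw-sums form, then compares the two.

```python
from math import isclose, sqrt


def r_from_deviations(xs, ys):
    """Correlation from deviations around the means (the n - 1 has already cancelled)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sqrt(sum((x - mean_x) ** 2 for x in xs)) * sqrt(
        sum((y - mean_y) ** 2 for y in ys)
    )
    return numerator / denominator


def r_from_raw_sums(xs, ys):
    """Correlation from the simplified form that uses only raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Illustrative data (assumption), not taken from the problem.
xs = [2, 4, 5, 7, 9]
ys = [1, 3, 2, 6, 8]
same = isclose(r_from_deviations(xs, ys), r_from_raw_sums(xs, ys))
print(same)
```

The two forms should agree up to floating-point rounding. The raw-sums version is convenient for hand calculation because it avoids computing each deviation separately.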

Interpretation of Correlation Coefficient

The interpretation of the correlation coefficient is quite simple and can be summarized by the table below.

Value	Direction	Strength	Interpretation
-1	Negative	Very strong	Perfect negative correlation
-0.3	Negative	Weak	Weak negative correlation
0	None	None	No linear correlation
0.3	Positive	Weak	Weak positive correlation
1	Positive	Very strong	Perfect positive correlation
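The table can be turned into a small helper that labels any value of the coefficient. This is only a sketch: the table fixes the anchor points -1, -0.3, 0, 0.3 and 1, while the cut-off of 0.7 and the "moderate" and "strong" labels in between are assumptions made for illustration, as is the function name `describe_correlation`.

```python
def describe_correlation(r):
    """Return (direction, strength) labels for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return ("none", "none")

    direction = "positive" if r > 0 else "negative"
    size = abs(r)

    # Cut-offs between the table's anchor points are illustrative assumptions.
    if size <= 0.3:
        strength = "weak"
    elif size <= 0.7:
        strength = "moderate"
    elif size < 1.0:
        strength = "strong"
    else:
        strength = "very strong"
    return (direction, strength)


for value in (-1, -0.3, 0, 0.3, 1):
    print(value, describe_correlation(value))
```

Different textbooks draw these boundaries differently, so the labels are best read as rough guidance rather than fixed rules.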

Step by Step Solution

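The correlation is calculated below, with x as the happiness score and y as the work hours; the same steps apply to the temperature and visitor data from Problem 3.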

Observation	Happiness Score (x)	Work Hours (y)	$x_{i}-\bar{x}$	$y_{i}-\bar{y}$	$(x_{i}-\bar{x})(y_{i}-\bar{y})$	$(x_{i}-\bar{x})^2$	$(y_{i}-\bar{y})^2$
1	89.0	30.0	21.3	-11.7	-248.9	455.1	136.1
2	90.0	35.0	22.3	-6.7	-148.9	498.8	44.4
3	54.0	40.0	-13.7	-1.7	22.8	186.8	2.8
4	60.0	35.0	-7.7	-6.7	51.1	58.8	44.4
5	73.0	40.0	5.3	-1.7	-8.9	28.4	2.8
6	40.0	70.0	-27.7	28.3	-783.9	765.4	802.8
Average	67.7	41.7		Total	-1116.7	1993.3	1033.3

Plugging this into the formula, we get: \[ r_{xy} = \frac{-1116.7}{\sqrt{1993.3 \cdot 1033.3}} \approx -0.78 \]
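The same steps can be applied to the temperature and visitor figures from the Problem 3 table. The following is a minimal sketch in Python; the variable names are chosen for illustration, and the data are copied from the table in the problem statement.

```python
from math import sqrt

# Data from the Problem 3 table.
temperature = [12, 21, 20, 25, 17, 15, 13]
visitors = [87, 150, 110, 90, 85, 70, 90]

n = len(temperature)
mean_t = sum(temperature) / n
mean_v = sum(visitors) / n

# The three column totals used in the worked table.
sum_products = sum((t - mean_t) * (v - mean_v) for t, v in zip(temperature, visitors))
sum_sq_t = sum((t - mean_t) ** 2 for t in temperature)
sum_sq_v = sum((v - mean_v) ** 2 for v in visitors)

r = sum_products / sqrt(sum_sq_t * sum_sq_v)
print(round(r, 2))  # prints 0.45
```

A value of about 0.45 points to a positive but far from perfect linear relationship: warmer days tend to bring more visitors, yet other factors, such as the day of the week or local events, could account for much of the variation.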

Emma

I am passionate about travelling and currently live and work in Paris. I like to spend my time reading, gardening, running, learning languages and exploring new places.
