How to calculate Pearson correlation


Posted by: admin 1 year, 4 months ago

Analysis of correlation and significance of parameters

Correlation

The study of the significance of the impact of input parameters on output parameters should begin with the analysis of the correlation of individual parameters. Three basic dependencies can be checked:

monotonic linear
monotonic non-linear
quadratic

Pearson's correlation coefficient (monotonic linear relationship)

The most basic measure of whether there is a linear correlation between parameters $x_{i}$ and $y_{i}$ is the Pearson correlation coefficient:

r_{p} = \frac{\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}} \sqrt{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}}

where $\bar{x}$ and $\bar{y}$ denote the mean values of the respective parameters.
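The formula above can be sketched directly in Python. This is a minimal illustration using only the standard library; the function name `pearson_r` and the sample data are assumptions made for the example, not part of the original text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed directly from the definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means.
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations.
    den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x)) * math.sqrt(
        sum((yi - mean_y) ** 2 for yi in y)
    )
    if den == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return num / den


# Hypothetical sample data, chosen only to illustrate the call.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_p = pearson_r(x, y)
print(round(r_p, 4))  # 0.7746
```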

This formula can be simplified to

r_{p} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x) \, \mathrm{var}(y)}}

where $x = [x_{1}, x_{2}, \ldots], y = [y_{1}, y_{2}, \ldots]$
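The covariance form can be sketched in the same way. The helper names below are illustrative assumptions. The example uses the sample normalization $1/(n - 1)$ for both covariance and variance; that factor cancels in the ratio, so dividing by $n$ instead would give the same coefficient.

```python
import math


def covariance(x, y):
    """Sample covariance of x and y (normalized by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def variance(x):
    """Sample variance, i.e. the covariance of x with itself."""
    return covariance(x, x)


def pearson_from_cov(x, y):
    """Pearson coefficient as cov(x, y) / sqrt(var(x) * var(y))."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


# Hypothetical sample data, chosen only to illustrate the call.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(covariance(x, y), variance(x), variance(y))  # 1.5 2.5 1.5
print(round(pearson_from_cov(x, y), 4))            # 0.7746
```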

Spearman's correlation coefficient (monotonic non-linear relationship)

Spearman's rank correlation coefficient is more general because it measures the strength of any monotonic relationship, including a non-linear one. It is expressed by the relation:

r_{s} = \frac{\sum_{i = 1}^{n} (R_{i} - \bar{R}) (S_{i} - \bar{S})}{\sqrt{\sum_{i = 1}^{n} (R_{i} - \bar{R})^{2}} \sqrt{\sum_{i = 1}^{n} (S_{i} - \bar{S})^{2}}}

where $R_{i}$ is the rank of the observation $x_{i}$, $S_{i}$ is the rank of the observation $y_{i}$, and $\bar{R}$ and $\bar{S}$ are the mean values of the respective ranks $R_{i}$ and $S_{i}$.
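A minimal sketch of the rank-based computation follows. The text does not say how tied observations are ranked; the example assumes the common convention of giving tied values the average of their positions. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: the Pearson formula applied to the ranks R and S."""
    R = average_ranks(x)
    S = average_ranks(y)
    mean_R = sum(R) / len(R)
    mean_S = sum(S) / len(S)
    num = sum((r - mean_R) * (s - mean_S) for r, s in zip(R, S))
    den = math.sqrt(sum((r - mean_R) ** 2 for r in R)) * math.sqrt(
        sum((s - mean_S) ** 2 for s in S)
    )
    return num / den


# y = x**3 is monotonic but non-linear, so the rank correlation is 1.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
r_s = spearman_r(x, y)
```

For this data the Pearson coefficient would be below 1, while the ranks of `x` and `y` agree exactly.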

Interpretation of the correlation coefficient value

Correlation type:

$r_{s}$ > 0 positive correlation – when the value of X increases, so does Y
$r_{s}$ = 0 no correlation – when X increases, Y sometimes increases and sometimes decreases
$r_{s}$ < 0 negative correlation – when X increases, Y decreases

Correlation strength:

$| r_{s} | < 0.2$ – no relationship
$0.2 \leq | r_{s} | < 0.4$ – weak dependence
$0.4 \leq | r_{s} | < 0.7$ – moderate dependence
$0.7 \leq | r_{s} | < 0.9$ – quite strong dependence
$| r_{s} | \geq 0.9$ – very strong dependence
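The two tables above can be encoded as a small helper. This is only a sketch: the function name and the returned labels are assumptions, while the thresholds are the ones listed in the text.

```python
def describe_correlation(r):
    """Map a correlation coefficient to (direction, strength) labels."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # Direction follows the sign of the coefficient.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    # Strength follows the absolute value, using the thresholds from the text.
    a = abs(r)
    if a < 0.2:
        strength = "none"
    elif a < 0.4:
        strength = "weak"
    elif a < 0.7:
        strength = "moderate"
    elif a < 0.9:
        strength = "quite strong"
    else:
        strength = "very strong"
    return direction, strength


print(describe_correlation(0.75))  # ('positive', 'quite strong')
print(describe_correlation(-0.3))  # ('negative', 'weak')
```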

Quadratic correlation coefficient

The quadratic correlation coefficient is determined on the basis of regression analysis.

The error sum of squares $SSE$ is defined as

SSE = \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}

After fitting a second-degree polynomial to the data (i.e. determining the coefficients $a_{2}, a_{1}, a_{0}$), $\hat{y}_{i}$ is obtained by substituting $x_{i}$ into the approximating function

\hat{y}_{i} = a_{2} x_{i}^{2} + a_{1} x_{i} + a_{0}

The total sum of squares $SST$ is

SST = \sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}

The correlation coefficient is then determined from the relationship

r_{q} = \sqrt{1 - \frac{SSE}{SST}}
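The procedure can be sketched as follows. The text does not specify how the polynomial is fitted; this example assumes an ordinary least-squares fit obtained by solving the normal equations with a small Gaussian elimination routine. All function names are illustrative.

```python
import math


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least 3 distinct x values")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def fit_quadratic(x, y):
    """Least-squares coefficients (a2, a1, a0) of a2*x^2 + a1*x + a0."""
    S = [sum(xi ** k for xi in x) for k in range(5)]                  # sums of x^k
    T = [sum((xi ** k) * yi for xi, yi in zip(x, y)) for k in range(3)]
    A = [[S[i + j] for j in range(3)] for i in range(3)]              # normal equations
    a0, a1, a2 = solve_linear(A, T)
    return a2, a1, a0


def quadratic_r(x, y):
    """Quadratic correlation coefficient r_q = sqrt(1 - SSE / SST)."""
    a2, a1, a0 = fit_quadratic(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((yi - (a2 * xi ** 2 + a1 * xi + a0)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("r_q is undefined when y is constant")
    # The clamp only guards against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - sse / sst))


# A perfect parabola: the linear correlation is 0, the quadratic one is 1.
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
r_q = quadratic_r(x, y)
```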

Statistical testing of the significance of the correlation coefficient

To decide whether the computed correlation coefficient is statistically significant, we state the null hypothesis

H_{0} : δ = 0

meaning that there is no correlation between the parameters. The alternative hypothesis has the form

H_{1} : δ \neq 0

It is assumed that the test statistic follows Student's t-distribution with $k = n - 2$ degrees of freedom. Hence, for example, for the Pearson correlation coefficient the value of the statistic is

t = r_{p} \sqrt{\frac{n - 2}{1 - r_{p}^{2}}}

The value of the test statistic cannot be determined when $r_{p} = 1$ or $r_{p} = - 1$, or when $n < 3$.
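A minimal sketch of the test statistic, including the cases in which it is undefined. The function name `t_statistic` and the sample values are assumptions made for the example.

```python
import math


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for a correlation r from n observations."""
    if n < 3:
        raise ValueError("at least 3 observations are required")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined for r = 1 or r = -1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical values: r = 0.6 from n = 18 observations.
t = t_statistic(0.6, 18)
print(round(t, 6))  # 3.0
```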

In other cases, the $p$-value determined from it (read from Student's t-distribution) is compared with the assumed significance level $α$:

if $p \leq α$ we reject $H_{0}$ and accept $H_{1}$
if $p > α$ there is no reason to reject $H_{0}$

Typically, a significance level of $α = 0.05$ is chosen, which means accepting that in 5% of situations we will reject the null hypothesis when it is true.
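The decision rule can be sketched end to end. The standard library has no Student's t-distribution, so this example approximates the two-sided $p$-value by integrating the density numerically with Simpson's rule; that numerical approach and all function names are assumptions of the sketch. In practice a statistics library such as SciPy (`scipy.stats.t.sf`) would normally be used instead.

```python
import math


def t_pdf(t, k):
    """Density of Student's t-distribution with k degrees of freedom."""
    log_c = (math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
             - 0.5 * math.log(k * math.pi))
    return math.exp(log_c - (k + 1) / 2 * math.log1p(t * t / k))


def two_sided_p_value(t, k, steps=2000):
    """Approximate P(|T| >= |t|) as 1 - 2 * integral of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(upper, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def correlation_test(r, n, alpha=0.05):
    """Return (t, p, reject_h0) for a correlation r computed from n observations."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("the test statistic is undefined for these inputs")
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    p = two_sided_p_value(t, n - 2)
    return t, p, p <= alpha


# Hypothetical values: r = 0.6 from n = 18 observations gives t close to 3.
t, p, reject = correlation_test(0.6, 18)
print(reject)  # True: p is below 0.05, so H0 is rejected
```

Because the $p$-value is obtained as one minus an integral, very small $p$-values lose precision; that is acceptable for a comparison against $α = 0.05$ but not for reporting extreme tail probabilities.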

The same procedure applies to the other correlation coefficients, substituting $r_{s}$ or $r_{q}$ for $r_{p}$.
