Pearson Correlation

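The Pearson correlation (also called the Pearson product-moment correlation) measures the strength and direction of the linear relationship between two continuous variables. The correlation coefficient r ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).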

When to Use

Use the Pearson correlation when:

you want to quantify the linear relationship between two variables
both variables are metric (continuous)
the data are approximately normally distributed
you are interested in both the direction and the strength of the relationship

Assumptions

Both variables are measured on a metric scale
Approximately normal distribution of both variables (can be checked, for example, with the Shapiro-Wilk test)
Linear relationship between the variables (check with scatterplot)
No strong outliers (r is sensitive to extreme values)
Independence of observation pairs

Formula

The Pearson correlation coefficient is calculated as:

$$
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$
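The formula translates directly into code. The following is a minimal Python sketch (the function name `pearson_r` is hypothetical, not a library API) that computes the deviations from the means and combines them exactly as in the numerator and denominator of the formula:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r, computed directly from the definition."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of the products of the deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # → 0.7746
```

In practice, library routines such as Python's `statistics.correlation` (3.10+) or `scipy.stats.pearsonr` compute the same quantity with more numerically careful summation.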

The test statistic for testing whether the population correlation differs from zero ($H_0: \rho = 0$) is:

$$
t = \frac{r \sqrt{n-2}}{\sqrt{1-r^2}}
$$

with $n - 2$ degrees of freedom, where $n$ is the number of observation pairs.
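A minimal sketch of the test statistic (`t_statistic` is a hypothetical helper name). It returns t together with the degrees of freedom; the p-value then comes from the t-distribution with $n - 2$ degrees of freedom, which the Python standard library does not provide. With SciPy installed, the two-sided p-value would be `2 * scipy.stats.t.sf(abs(t), df)`.

```python
import math


def t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return the t statistic and its degrees of freedom (n - 2) for H0: rho = 0."""
    if n < 3:
        raise ValueError("need n >= 3 so that df = n - 2 >= 1")
    if not -1 < r < 1:
        raise ValueError("t is undefined for |r| = 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Illustrative values only: r = 0.5 observed in n = 30 pairs
t, df = t_statistic(0.5, 30)
print(f"t = {t:.2f}, df = {df}")  # → t = 3.06, df = 28
```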

Example

Practical Example: Study Time and Exam Results

A lecturer wants to investigate whether there is a relationship between study time (in hours) and exam results (in points). She collects data from 50 students.

Variable X: Study time in hours per week
Variable Y: Exam result (0–100 points)

The Pearson correlation yields r = 0.72, p < 0.001. There is a strong positive linear relationship: the more hours students studied, the higher their exam results tended to be.
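As a quick consistency check, plugging the reported values r = 0.72 and n = 50 into the test statistic from the Formula section gives:

$$
t = \frac{0.72\sqrt{50-2}}{\sqrt{1-0.72^2}} = \frac{0.72 \cdot 6.928}{\sqrt{0.4816}} \approx \frac{4.988}{0.694} \approx 7.19, \qquad df = 48
$$

A t value this large with 48 degrees of freedom lies far beyond the two-sided critical value at α = 0.05 (about 2.01), which is consistent with the reported p < 0.001.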

Effect Size

The correlation coefficient r is itself a measure of effect size. The coefficient of determination r² indicates the proportion of explained variance:

$$
r^2 = \text{proportion of explained variance}
$$

| Effect size | $\lvert r \rvert$ |
|---|---|
| Small | 0.10 |
| Medium | 0.30 |
| Large | 0.50 |
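The benchmarks can be applied mechanically. The following minimal sketch (`describe_effect` is a hypothetical helper) returns $r^2$ and a verbal label for $\lvert r \rvert$. Treating each benchmark as the lower bound of its category is an assumption, as is the label "below small", since the table defines no category under 0.10.

```python
def describe_effect(r: float) -> tuple[float, str]:
    """Return r^2 and a verbal effect-size label for |r| (benchmarks 0.10 / 0.30 / 0.50)."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie in [-1, 1]")
    size = abs(r)  # the sign gives the direction, not the strength
    if size >= 0.50:
        label = "large"
    elif size >= 0.30:
        label = "medium"
    elif size >= 0.10:
        label = "small"
    else:
        label = "below small"  # assumed label; the table has no category below 0.10
    return r ** 2, label


r_squared, label = describe_effect(0.72)  # r from the study-time example
print(f"r^2 = {r_squared:.4f}, effect: {label}")  # → r^2 = 0.5184, effect: large
```

For the study-time example, $r^2 \approx 0.52$, so roughly 52% of the variance in exam results is shared with study time.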

Important: Correlation does not imply causation. A high correlation coefficient says nothing about cause and effect; for example, both variables may be influenced by a third variable.