Pearson Correlation
The Pearson correlation (also: Pearson product-moment correlation) measures the strength and direction of the linear relationship between two continuous variables. The correlation coefficient r ranges from -1 to +1.
When to Use
Use the Pearson correlation when:
- You want to quantify the linear relationship between two variables
- Both variables are metric (continuous)
- The data are approximately normally distributed
- You are interested in the direction and strength of the relationship
Assumptions
- Both variables are measured on a metric scale
- Approximately normal distribution of both variables (can be checked with the Shapiro-Wilk test)
- Linear relationship between the variables (check with scatterplot)
- No significant outliers
- Independence of observation pairs
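
The outlier assumption matters because r is built from means and squared deviations, so a single extreme pair can dominate the result. The following sketch illustrates this with made-up numbers (not data from this page). The helper `pearson_r` is a hypothetical name written only for this illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: the eight pairs show no linear trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [5, 3, 6, 4, 5, 3, 6, 4]

r_clean = pearson_r(x, y)                    # correlation without the outlier
r_outlier = pearson_r(x + [30], y + [30])    # same data plus one extreme pair

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

In this toy data set the eight original pairs are uncorrelated, yet adding the single pair (30, 30) pushes r above 0.9. A scatterplot would reveal such a point immediately, which is why visual inspection is recommended before interpreting r.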
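Formula
The Pearson correlation coefficient is calculated as:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the sample means and $n$ is the number of observation pairs.
The test statistic for significance testing:

$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} $$

with $df = n - 2$ degrees of freedom.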
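
As a minimal sketch of how the coefficient and its test statistic can be computed by hand, the code below uses only the Python standard library. The function names `pearson_r` and `t_statistic` and the six data pairs are assumptions made for illustration and do not come from this page. The test statistic is the standard one, t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Sum of cross-deviations divided by the root of the product of squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample of six observation pairs
hours = [2, 4, 5, 7, 9, 10]
points = [48, 55, 61, 70, 74, 85]

r = pearson_r(hours, points)
df = len(hours) - 2
t = t_statistic(r, len(hours))

print(f"r = {r:.3f}, t({df}) = {t:.2f}")
```

Turning t into a p-value requires the t distribution, which the standard library does not provide. In practice a library routine such as `scipy.stats.pearsonr` reports r together with a two-sided p-value.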
Example
Practical Example: Study Time and Exam Results
A lecturer wants to investigate whether there is a relationship between study time (in hours) and exam results (in points). She collects data from 50 students.
- Variable X: Study time in hours per week
- Variable Y: Exam result (0–100 points)
The Pearson correlation yields r = 0.72, p < 0.001. There is a strong positive linear relationship: The more hours studied, the higher the exam result tends to be.
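
To see where the small p-value comes from, the reported numbers can be plugged into the standard test statistic for a correlation, $t = r\sqrt{n-2}/\sqrt{1-r^2}$. The intermediate values below are rounded.

$$ t = \frac{0.72 \sqrt{50 - 2}}{\sqrt{1 - 0.72^2}} = \frac{0.72 \cdot 6.928}{\sqrt{0.4816}} \approx \frac{4.988}{0.694} \approx 7.19 $$

With $df = 50 - 2 = 48$, this value lies far above the two-sided critical value for $\alpha = 0.001$ (roughly 3.5), which is consistent with the reported p < 0.001.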
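Effect Size
The correlation coefficient r is itself a measure of effect size. The coefficient of determination r² indicates the proportion of variance in one variable that is linearly explained by the other. For example, r = 0.72 corresponds to r² ≈ 0.52, or about 52% explained variance. Cohen (1988) suggests the following benchmarks for interpreting the absolute value of r:

| Effect Size | \|r\| |
|---|---|
| Small | 0.10 |
| Medium | 0.30 |
| Large | 0.50 |
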
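
The benchmarks in the table can be applied mechanically, as in the short sketch below. The function name `effect_size_label` and the label "negligible" for values under 0.10 are assumptions added for illustration; the table itself only names small, medium, and large.

```python
def effect_size_label(r):
    """Classify |r| using the benchmarks 0.10 / 0.30 / 0.50 from the table."""
    magnitude = abs(r)  # the sign gives the direction, not the strength
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"  # assumed label; not part of the table

r = 0.72                 # value from the study-time example
r_squared = r ** 2       # coefficient of determination
label = effect_size_label(r)

print(f"r = {r}, r^2 = {r_squared:.2f}, effect size: {label}")
```

These cut-offs are conventions rather than strict rules, so what counts as a meaningful effect still depends on the research field.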
Important: Correlation does not imply causation. A high correlation coefficient by itself does not show that one variable causes the other.
Further Reading
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.