Partial Correlation

Partial correlation measures the linear association between two variables $X$ and $Y$ after statistically removing the linear influence of one or more control variables $Z$. It is a standard tool for detecting spurious correlations: when a third variable influences both $X$ and $Y$, the simple Pearson correlation can be misleading, and partial correlation adjusts for that confounding effect. The adjustment covers only the control variables that are actually measured and included, so the result is the association net of $Z$, not a guaranteed "true" relationship.
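One way to read "removing the influence of $Z$" is to regress $X$ on $Z$ and $Y$ on $Z$, then correlate the two sets of residuals. The sketch below illustrates that idea in plain Python. The helper names (`pearson`, `residuals`, `partial_corr_residuals`) and the small data set are made up for illustration and do not come from any study.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def residuals(target, control):
    """Residuals of a simple least-squares regression of target on control."""
    n = len(target)
    mean_t, mean_c = sum(target) / n, sum(control) / n
    slope = (sum((c - mean_c) * (t - mean_t) for c, t in zip(control, target))
             / sum((c - mean_c) ** 2 for c in control))
    intercept = mean_t - slope * mean_c
    return [t - (intercept + slope * c) for t, c in zip(target, control)]

def partial_corr_residuals(x, y, z):
    """Correlate the parts of x and y that z does not linearly explain."""
    return pearson(residuals(x, z), residuals(y, z))

# Hypothetical illustrative data: z is meant to drive both x and y.
z = [18, 21, 24, 26, 29, 31, 33, 35]
x = [40, 47, 52, 60, 63, 70, 71, 80]
y = [1, 2, 2, 3, 4, 3, 5, 5]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr_residuals(x, y, z)
print(f"r_xy = {r_xy:.2f}, r_xy.z = {r_xy_given_z:.2f}")
```

With a single control variable, this residual-based value coincides with the result of the closed-form expression built from the three pairwise correlations.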

When to Use

- You suspect the association between $X$ and $Y$ may be distorted by a third variable $Z$
- You want to check whether an observed correlation is spurious
- You want to report the adjusted association between two variables
- All variables are metric (interval or ratio scaled)
- You want to control for a confounding variable without running a full regression analysis

Assumptions

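- Metric scale level for all variables ($X$, $Y$, and $Z$)
- Linear relationship between all variable pairs
- Approximate normality of all variables (relevant mainly for the significance test)
- Independent observations
- No perfect multicollinearity between variables: $|r_{XZ}| < 1$ and $|r_{YZ}| < 1$, otherwise the partial correlation is undefined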

Formula

The first-order partial correlation (controlling for one variable $Z$) is computed from the three bivariate Pearson correlations:

$$r_{XY \cdot Z} = \frac{r_{XY} - r_{XZ} \cdot r_{YZ}}{\sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}}$$

Here, $r_{XY}$ is the correlation between $X$ and $Y$, $r_{XZ}$ is the correlation between $X$ and $Z$, and $r_{YZ}$ is the correlation between $Y$ and $Z$.
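The formula above translates directly into a few lines of Python. This is a minimal sketch: the function name `partial_corr` and the guard against a zero denominator are illustrative choices, not part of any particular library.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_XY.Z from three Pearson correlations."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        # Happens when Z is perfectly correlated with X or Y.
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denom

# If Z is uncorrelated with both X and Y, controlling for it changes nothing.
print(round(partial_corr(0.5, 0.0, 0.0), 2))  # prints 0.5
```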

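The null hypothesis that the partial correlation is zero is tested with a t-test:

$$t = \frac{r_{XY \cdot Z} \cdot \sqrt{n - 3}}{\sqrt{1 - r_{XY \cdot Z}^2}}, \quad df = n - 3$$

When controlling for $k$ variables, $df = n - 2 - k$, and $\sqrt{n - 3}$ in the numerator becomes $\sqrt{n - 2 - k}$.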
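The test statistic can be sketched as a small helper that also handles $k$ control variables. The function name `partial_corr_t` and the input value $r = 0.30$ are hypothetical and chosen only to show the calculation.

```python
import math

def partial_corr_t(r_partial, n, k=1):
    """t statistic and degrees of freedom for a partial correlation
    based on n observations and k control variables."""
    df = n - 2 - k
    if df <= 0:
        raise ValueError("need n > k + 2 observations")
    if abs(r_partial) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    t = r_partial * math.sqrt(df) / math.sqrt(1 - r_partial ** 2)
    return t, df

# Hypothetical input: partial correlation of 0.30 from 50 observations, one control.
t, df = partial_corr_t(0.30, n=50, k=1)
print(f"t = {t:.2f}, df = {df}")  # prints: t = 2.16, df = 47
```

A two-sided p-value then follows from the t distribution with `df` degrees of freedom. If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` is one way to obtain it.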

Example

Practical Example: Ice Cream Sales, Drowning, and Temperature

A study finds a high positive correlation between ice cream sales ($X$) and the number of drowning incidents ($Y$) at public pools. Does eating more ice cream increase the risk of drowning? Of course not: temperature ($Z$) is the common cause driving both variables.

Correlations (n = 50 summer days):

- $r_{XY} = 0.83$ (ice cream sales ↔ drowning incidents)
- $r_{XZ} = 0.90$ (ice cream sales ↔ temperature)
- $r_{YZ} = 0.88$ (drowning incidents ↔ temperature)

Calculating the partial correlation:

$$r_{XY \cdot Z} = \frac{0.83 - 0.90 \cdot 0.88}{\sqrt{(1 - 0.90^2)(1 - 0.88^2)}} = \frac{0.83 - 0.792}{\sqrt{0.19 \cdot 0.2256}} \approx \frac{0.038}{0.207} \approx 0.18$$

Interpretation: The originally strong correlation of $r_{XY} = 0.83$ drops to $r_{XY \cdot Z} = 0.18$ after controlling for temperature. This is a small association, and with $n = 50$ it is not significant at the 5% level ($t \approx 1.28$, $df = 47$, two-sided critical value about 2.01). The observed correlation was therefore largely spurious, produced by the shared confounding variable temperature.

Effect Size

The partial correlation $r_{XY \cdot Z}$ is itself an effect size measure and is interpreted using the same conventions as Pearson's $r$:

| $\lvert r_{XY \cdot Z} \rvert$ | Interpretation |
|---|---|
| 0.10 | Small effect |
| 0.30 | Medium effect |
| 0.50 | Large effect |

Additionally, the squared partial correlation gives a proportion of explained variance:

$$R^2_{\text{partial}} = r_{XY \cdot Z}^2$$

This value is the share of the variance in $Y$ left unexplained by $Z$ that is accounted for by $X$. In the example above, ice cream sales explain only $0.18^2 \approx 3.2\%$ of the variance in drowning incidents that temperature leaves unexplained.
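As a rough sketch, the benchmark table and the squared partial correlation can be combined in a small helper. The function name `effect_size_label`, the reading of the benchmarks as lower bounds, and the "negligible" label for values below 0.10 are assumptions made for illustration.

```python
def effect_size_label(r_partial):
    """Label |r| using the 0.10 / 0.30 / 0.50 benchmarks as lower bounds."""
    size = abs(r_partial)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label for values below the smallest benchmark

r_partial = 0.18  # rounded value from the ice cream example
print(effect_size_label(r_partial))  # prints: small
print(f"{r_partial ** 2:.1%}")       # prints: 3.2%
```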