Point-Biserial Correlation

The point-biserial correlation is a special case of the Pearson correlation that quantifies the linear association between a dichotomous (binary) variable and a continuous variable. It is commonly used when a natural grouping (e.g., gender, pass/fail) needs to be related to a metric outcome. Mathematically, it is identical to the Pearson correlation when the dichotomous variable is coded as 0 and 1.

When to Use

- One variable is dichotomous (exactly two categories, e.g., yes/no, male/female)
- The other variable is metric (interval or ratio scaled)
- You want to determine the strength and direction of the association between the two variables
- Observations are independent of one another
- As a complement to the independent-samples t-test, when you want to report a measure of association rather than a group difference (the significance test of $r_{pb}$ is equivalent to Student's pooled-variance t-test)

Assumptions

- Dichotomous variable with exactly two categories (0/1 coded)
- Continuous variable approximately normally distributed in both groups
- Independent observations (no repeated measures)
- Homogeneity of variances across both groups (desirable)

Formula

The point-biserial correlation can be calculated directly from group statistics:

$$
r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_n} \cdot \sqrt{\frac{n_1 \cdot n_0}{n^2}}
$$

Here, $\bar{X}_1$ and $\bar{X}_0$ are the means of group 1 and group 0, $s_n$ is the standard deviation of all $n$ values computed with $n$ (not $n - 1$) in the denominator, $n_1$ and $n_0$ are the group sizes, and $n = n_0 + n_1$ is the total sample size.

Alternatively, you can compute the Pearson correlation between the 0/1-coded dichotomous variable and the continuous variable directly; the result is identical:

$$
r_{pb} = r_{XY} \quad \text{where } X \in \{0, 1\}
$$

Example

Practical Example: Gender and Reaction Time

A psychology experiment investigates whether there is an association between gender (male = 0, female = 1) and reaction time (in milliseconds).

Data (n = 10):

- Male (0): 320, 345, 310, 298, 330 ms → $\bar{X}_0 = 320.6$ ms
- Female (1): 290, 275, 305, 280, 295 ms → $\bar{X}_1 = 289.0$ ms

Calculation:

- Overall mean: $\bar{X} = 304.8$ ms
- Standard deviation of all values (with $n$ in the denominator): $s_n = 20.9$ ms
- $n_0 = 5$, $n_1 = 5$, $n = 10$

$$
r_{pb} = \frac{289.0 - 320.6}{20.9} \cdot \sqrt{\frac{5 \cdot 5}{100}} = \frac{-31.6}{20.9} \cdot 0.5 \approx -0.76
$$

Interpretation: There is a strong negative association ($r_{pb} \approx -0.76$). Because female participants are coded 1, the negative sign means they show shorter reaction times on average than male participants (289.0 ms vs. 320.6 ms).

Effect Size

The point-biserial correlation coefficient $r_{pb}$ is itself an effect size measure and lies on the same scale as Pearson's $r$ (from -1 to +1), so the usual benchmarks apply to its absolute value:

| $\lvert r_{pb} \rvert$ | Interpretation |
|---|---|
| 0.10 | Small effect |
| 0.30 | Medium effect |
| 0.50 | Large effect |

These conventions follow Cohen (1988). In addition, the coefficient of determination $r_{pb}^2$ expresses the proportion of variance in the continuous variable explained by group membership:

$$
R^2 = r_{pb}^2
$$
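A small helper can map a coefficient onto these benchmarks and report $r_{pb}^2$ alongside it. This is a sketch with a hypothetical function name; labelling $\lvert r_{pb} \rvert < 0.10$ as "negligible" and assigning values between benchmarks to the lower category are assumptions, not part of Cohen's table.

```python
def interpret_r_pb(r):
    """Return a Cohen (1988) size label for |r| and the coefficient of determination."""
    size = abs(r)
    # Benchmarks 0.10 / 0.30 / 0.50 from the table; the in-between handling and
    # the "negligible" label below 0.10 are assumptions
    if size >= 0.50:
        label = "large"
    elif size >= 0.30:
        label = "medium"
    elif size >= 0.10:
        label = "small"
    else:
        label = "negligible"
    return label, r ** 2


# Hypothetical coefficients, just to show the mapping
for r in (0.05, 0.25, -0.45, 0.80):
    label, r_squared = interpret_r_pb(r)
    print(f"r = {r:+.2f}: {label}, r^2 = {r_squared:.4f}")
```

Because the label depends only on $\lvert r_{pb} \rvert$, the sign (which merely reflects which group was coded 1) never changes the effect-size category.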