PickMyTest

Point-Biserial Correlation

Measures the association between a dichotomous (binary) variable and a continuous variable. Mathematically equivalent to Pearson correlation with 0/1 coding.

The point-biserial correlation is a special case of the Pearson correlation that quantifies the linear association between a dichotomous (binary) variable and a continuous variable. It is commonly used when a natural grouping (e.g., gender, pass/fail) needs to be related to a metric outcome. Mathematically, it is identical to the Pearson correlation when the dichotomous variable is coded as 0 and 1.

When to Use

  • One variable is dichotomous (exactly two categories, e.g., yes/no, male/female)
  • The other variable is metric (interval or ratio scaled)
  • You want to determine the strength and direction of the association between the two variables
  • Observations are independent of one another
  • As an alternative to the t-test when you want to report an association measure rather than a group difference
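
The link to the t-test in the last point can be checked numerically. The following is a minimal sketch on made-up data (the values and all function names are illustrative and not part of this page): it converts the correlation into a t statistic via t = r * sqrt(n - 2) / sqrt(1 - r^2) and compares it with the pooled-variance independent-samples t statistic.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic with n - 2 degrees of freedom derived from a correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def pooled_t(group1, group0):
    """Student's t statistic for two independent groups (pooled variance)."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    pooled_var = (ss1 + ss0) / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))


# Hypothetical data, chosen only for illustration.
group0 = [12.0, 15.0, 11.0, 14.0]
group1 = [18.0, 16.0, 20.0, 17.0, 19.0]
codes = [0] * len(group0) + [1] * len(group1)
values = group0 + group1

r = pearson_r(codes, values)
t_r = t_from_r(r, len(values))
t_groups = pooled_t(group1, group0)
print(f"r = {r:.4f}, t from r = {t_r:.4f}, pooled t = {t_groups:.4f}")
```

Both t values agree, so the choice between the two procedures is one of reporting (association measure versus group difference) rather than of inference.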

Assumptions

  • Dichotomous variable with exactly two categories (0/1 coded)
  • Continuous variable approximately normally distributed in both groups
  • Independent observations (no repeated measures)
  • Homogeneity of variances across both groups (desirable)

Formula

The point-biserial correlation can be calculated directly from group statistics:

$$r_{pb} = \frac{\bar{X}_1 - \bar{X}_0}{s_n} \cdot \sqrt{\frac{n_1 \cdot n_0}{n^2}}$$

Here, $\bar{X}_1$ is the mean of group 1, $\bar{X}_0$ is the mean of group 0, $s_n$ is the standard deviation of all values (computed with divisor $n$, not $n - 1$), $n_1$ and $n_0$ are the group sizes, and $n$ is the total sample size.

Alternatively, you can simply compute the Pearson correlation between the 0/1-coded dichotomous variable and the continuous variable; the result is identical:

$$r_{pb} = r_{XY} \quad \text{where } X \in \{0, 1\}$$
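
The equivalence of the two routes can be illustrated with a short sketch. The function names and the data below are hypothetical; the standard deviation is taken with divisor n (`statistics.pstdev`), which is what the factor sqrt(n1 * n0 / n^2) requires for the result to match Pearson's r.

```python
import math
import statistics


def r_pb_from_groups(group1, group0):
    """Point-biserial correlation from group means, sizes, and the overall SD."""
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    s_n = statistics.pstdev(group1 + group0)  # SD of all values, divisor n
    mean_diff = statistics.fmean(group1) - statistics.fmean(group0)
    return mean_diff / s_n * math.sqrt(n1 * n0 / n ** 2)


def pearson_r(x, y):
    """Plain Pearson correlation between two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with unequal group sizes.
group0 = [4.1, 5.0, 3.8, 4.6]
group1 = [6.2, 5.9, 7.1]

r_formula = r_pb_from_groups(group1, group0)
r_pearson = pearson_r([0] * len(group0) + [1] * len(group1), group0 + group1)
print(f"group formula: {r_formula:.6f}, Pearson on 0/1 codes: {r_pearson:.6f}")
```

Using the sample standard deviation (divisor n - 1) in the group formula without adjusting the square-root factor would give a slightly smaller absolute value than Pearson's r.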

Example

Practical Example: Gender and Reaction Time

A psychology experiment investigates whether there is an association between gender (male = 0, female = 1) and reaction time (in milliseconds).

Data (n = 10):

  • Male (0): 320, 345, 310, 298, 330 ms → $\bar{X}_0 = 320.6$ ms
  • Female (1): 290, 275, 305, 280, 295 ms → $\bar{X}_1 = 289.0$ ms

Calculation:

  • Overall mean: $\bar{X} = 304.8$ ms
  • Standard deviation of all values (divisor $n$): $s_n = 20.9$ ms
  • $n_0 = 5$, $n_1 = 5$, $n = 10$

$$r_{pb} = \frac{289.0 - 320.6}{20.9} \cdot \sqrt{\frac{5 \cdot 5}{100}} = \frac{-31.6}{20.9} \cdot 0.5 = -0.76$$

Interpretation: There is a strong negative association ($r_{pb} = -0.76$). Because female is coded as 1, the negative sign means that female participants show shorter reaction times on average than male participants.
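
To retrace the calculation, the quantities can be recomputed from the raw data. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and the standard deviation is taken over all ten values with divisor n (`statistics.pstdev`), as the group formula requires.

```python
import math
import statistics

# Reaction times in milliseconds from the example above.
male = [320, 345, 310, 298, 330]    # coded 0
female = [290, 275, 305, 280, 295]  # coded 1

n0, n1 = len(male), len(female)
n = n0 + n1

mean0 = statistics.fmean(male)
mean1 = statistics.fmean(female)
overall_mean = statistics.fmean(male + female)
s_n = statistics.pstdev(male + female)  # SD of all values, divisor n

r_pb = (mean1 - mean0) / s_n * math.sqrt(n1 * n0 / n ** 2)

print(f"mean0 = {mean0:.1f} ms, mean1 = {mean1:.1f} ms, overall = {overall_mean:.1f} ms")
print(f"s_n = {s_n:.1f} ms, r_pb = {r_pb:.2f}")
```

Swapping the coding (male = 1, female = 0) flips the sign of the coefficient but leaves its absolute value unchanged.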

Effect Size

The point-biserial correlation coefficient $r_{pb}$ is itself an effect size measure and lies on the same scale as Pearson's $r$:

| $\lvert r_{pb} \rvert$ | Interpretation |
|---|---|
| 0.10 | Small effect |
| 0.30 | Medium effect |
| 0.50 | Large effect |

These conventions follow Cohen (1988). Additionally, the coefficient of determination $r_{pb}^2$ can be calculated; it expresses the proportion of variance in the continuous variable that is accounted for by group membership.

$$R^2 = r_{pb}^2$$
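
As a small illustration, the benchmarks and the coefficient of determination can be wrapped in a helper. The function name is hypothetical, and treating the benchmarks as lower bounds of bands (with a "below small" label under 0.10) is an assumption of this sketch, not something Cohen (1988) prescribes.

```python
def describe_effect(r_pb):
    """Return a rough Cohen-style label and r squared for a correlation."""
    size = abs(r_pb)  # the sign only encodes direction, not strength
    if size >= 0.50:
        label = "large"
    elif size >= 0.30:
        label = "medium"
    elif size >= 0.10:
        label = "small"
    else:
        label = "below small"  # assumed label for values under 0.10
    return label, r_pb ** 2


for r in (-0.6, 0.35, 0.05):
    label, r_squared = describe_effect(r)
    print(f"r = {r:+.2f}: {label} effect, explained variance = {r_squared:.1%}")
```

Such labels are only orientation points; what counts as a meaningful effect depends on the research context.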

Further Reading

  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
  • Glass, G. V. & Hopkins, K. D. (1996). Statistical Methods in Education and Psychology (3rd ed.). Allyn & Bacon.