Welch's t-Test

Welch's t-test (also known as the unequal variances t-test) is a variant of the independent samples t-test that does not assume equal variances. It uses the Welch-Satterthwaite approximation to adjust the degrees of freedom and delivers reliable results even with unequal variances and unequal sample sizes. Many statisticians now recommend it as the default procedure instead of the classic Student's t-test.

When to Use

You are comparing the means of two independent groups
The variances in the two groups are not equal (e.g., Levene's test is significant), or you do not want to assume that they are
The sample sizes are unequal; this is precisely where Welch's test outperforms the classic t-test
You want a robust test that loses very little power even when the variances happen to be equal
The data are approximately normally distributed, but the variances differ between the groups (heteroscedasticity)

Assumptions

Approximately normal distribution in both groups (check with a Shapiro-Wilk test or a Q-Q plot)
Independence of observations (no repeated measures)
Continuous (interval- or ratio-scaled) dependent variable
Homogeneity of variances is NOT required

Note: The key advantage of Welch's t-test over the classic t-test is that it does not require equal variances. When variances are equal, it yields nearly identical results to the classic t-test (only marginally more conservative). Therefore, Delacre et al. (2017) recommend using Welch's test as the default approach.
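The note's claim can be checked numerically. The minimal Python sketch below uses hypothetical summary statistics: two groups of 30 with SDs of 2.0 and 2.2, chosen only for illustration. The helper names `welch_t_and_df` and `student_t_and_df` are also hypothetical. With equal sample sizes the two t statistics coincide exactly. The Welch degrees of freedom come out slightly below $n_1 + n_2 - 2$, which makes the p-value marginally larger.

```python
import math


def welch_t_and_df(m1, s1, n1, m2, s2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of the two means
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


def student_t_and_df(m1, s1, n1, m2, s2, n2):
    """Classic Student t statistic with pooled variance and df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical summary statistics: equal n, nearly equal SDs
stats = (10.0, 2.0, 30, 9.0, 2.2, 30)  # (m1, s1, n1, m2, s2, n2)
t_w, df_w = welch_t_and_df(*stats)
t_s, df_s = student_t_and_df(*stats)
print(f"Welch:   t = {t_w:.4f}, df = {df_w:.2f}")
print(f"Student: t = {t_s:.4f}, df = {df_s}")
```

When the sample sizes differ as well, the two t statistics also diverge, because the pooled variance then weights the groups by their sizes.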

Formula

The test statistic of Welch's t-test:

t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

Where $\bar{X}_1$ and $\bar{X}_2$ are the group means, $s_1^2$ and $s_2^2$ are the group variances, and $n_1$ and $n_2$ are the sample sizes.
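As a sketch, the test statistic can be computed directly from raw observations using Python's standard library. The helper name `welch_t` and the sample values are hypothetical. `statistics.variance` uses the $n - 1$ denominator, so it matches $s_1^2$ and $s_2^2$ above.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Welch's t statistic from two lists of raw observations."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance divides by n - 1, i.e. it returns the sample variance s^2
    se = math.sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / se


# Hypothetical measurements, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [4.0, 4.8, 3.9, 4.4, 5.1, 3.5, 4.2, 4.6]
print(f"t = {welch_t(group_a, group_b):.3f}")
```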

The degrees of freedom are calculated using the Welch-Satterthwaite approximation:

df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}

These degrees of freedom are typically not whole numbers and fall between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.
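The following minimal sketch implements the Welch-Satterthwaite formula as a hypothetical helper, `welch_df`, and runs it on made-up inputs that probe both bounds. When one group's variance term dominates, df approaches that group's $n - 1$. With equal SDs and equal sample sizes it reaches $n_1 + n_2 - 2$.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite degrees of freedom from group SDs and sample sizes."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard error of each group mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical inputs (s1, n1, s2, n2), chosen only to illustrate the bounds
cases = {
    "group 1 dominates": (100.0, 10, 1.0, 1000),  # df close to n1 - 1 = 9
    "equal s, equal n": (2.0, 30, 2.0, 30),       # df = n1 + n2 - 2 = 58
    "in between": (3.0, 10, 9.0, 40),
}

for label, (s1, n1, s2, n2) in cases.items():
    df = welch_df(s1, n1, s2, n2)
    lower, upper = min(n1, n2) - 1, n1 + n2 - 2
    print(f"{label:18s} df = {df:7.2f}  bounds [{lower}, {upper}]")
```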

Example

Practical Example: Salary Comparison Between Departments

An HR analyst compares salaries in the Marketing department (n = 45) and the IT department (n = 120). The sample sizes and variances differ considerably.

Marketing: $\bar{X}_1 = 52\,400$, $s_1 = 8\,200$
IT: $\bar{X}_2 = 58\,600$, $s_2 = 14\,500$

Levene's test is significant ($p = .003$), indicating unequal variances. The classic Student's t-test would therefore be inappropriate here.

Welch's t-test:

t = \frac{52\,400 - 58\,600}{\sqrt{\frac{8200^2}{45} + \frac{14500^2}{120}}} = \frac{-6\,200}{1\,802} = -3.44

$df = 137.7$ (Welch-Satterthwaite)
$p < .001$ (two-tailed)
$d = -0.47$ (small-to-medium effect)

On average, IT salaries are significantly higher than Marketing salaries, and this result already accounts for the unequal variances and sample sizes.
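The example's statistics can be recomputed from the summary values above. This Python sketch uses only the standard library. It therefore obtains the two-sided p-value by integrating the t density numerically with Simpson's rule, as an illustrative substitute for a statistics library. `t_pdf` and `two_sided_p` are hypothetical helper names.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution; df may be non-integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20_000):
    """Two-sided p-value, 1 - 2 * integral of the t density over [0, |t|],
    approximated with composite Simpson's rule (steps must be even)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


# Summary statistics from the salary example
m1, s1, n1 = 52_400, 8_200, 45     # Marketing
m2, s2, n2 = 58_600, 14_500, 120   # IT

v1, v2 = s1**2 / n1, s2**2 / n2    # squared standard errors of the two means
se = math.sqrt(v1 + v2)
t = (m1 - m2) / se
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
p = two_sided_p(t, df)

# Cohen's d with the pooled SD, as in the Effect Size section
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp

print(f"SE = {se:.0f}, t = {t:.2f}, df = {df:.1f}, p = {p:.4f}, d = {d:.2f}")
```

The script should report SE ≈ 1802, t ≈ -3.44, df ≈ 137.7, p < .001 and d ≈ -0.47. If SciPy is installed, `scipy.stats.ttest_ind_from_stats(m1, s1, n1, m2, s2, n2, equal_var=False)` offers an independent cross-check of t and p.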

Effect Size

The effect size is calculated as Cohen's d with the pooled standard deviation, just as for the classic t-test:

d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1) \cdot s_1^2 + (n_2 - 1) \cdot s_2^2}{n_1 + n_2 - 2}}}
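Here is a sketch of the pooled-SD formula for raw data, using Python's `statistics` module. The helper `cohens_d` and the sample values are hypothetical. As in the formula, the sign follows the order of the groups.

```python
import math
from statistics import mean, variance


def cohens_d(sample1, sample2):
    """Cohen's d for two independent samples, standardized by the pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: df-weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    return (mean(sample1) - mean(sample2)) / math.sqrt(pooled_var)


# Hypothetical measurements, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2]
group_b = [4.0, 4.8, 3.9, 4.4, 5.1, 3.5, 4.2, 4.6]
print(f"d = {cohens_d(group_a, group_b):.2f}")
```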

| Effect size | \|d\| |
|---|---|
| Small | 0.20 |
| Medium | 0.50 |
| Large | 0.80 |
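The benchmarks can be applied with a small helper (hypothetical name `label_effect`). Two choices in this sketch are assumptions. First, it treats the values as lower cutoffs. Second, it labels anything below 0.20 "negligible", a range the table does not name.

```python
def label_effect(d):
    """Label |d| using the 0.20 / 0.50 / 0.80 benchmarks as lower cutoffs."""
    size = abs(d)  # the sign only encodes which group mean is larger
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # assumption: the table does not name this range


for d in (-0.47, 0.5, 0.95, 0.1):
    print(d, label_effect(d))
```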

When the variances differ substantially, Glass's delta ($\Delta$) can be used instead; it places only the standard deviation of the control group in the denominator.
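The following sketch computes Glass's delta with a hypothetical helper, `glass_delta`. Which group counts as the control is a design decision. Here each department is treated as the reference group in turn, purely for illustration. With the salary figures, the magnitude is about 0.76 when Marketing is the control and about 0.43 when IT is.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's delta: mean difference divided by the control group's SD only."""
    return (mean_treatment - mean_control) / sd_control


# Salary example; the choice of control group is an assumption for illustration
marketing = {"mean": 52_400, "sd": 8_200}
it = {"mean": 58_600, "sd": 14_500}

delta_marketing_ref = glass_delta(it["mean"], marketing["mean"], marketing["sd"])
delta_it_ref = glass_delta(marketing["mean"], it["mean"], it["sd"])
print(f"Marketing as control: {delta_marketing_ref:.2f}")
print(f"IT as control:        {delta_it_ref:.2f}")
```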