Post-Hoc Tests

You ran an ANOVA and got a significant result. Great, but the ANOVA only tells you that not all group means are equal, i.e., that at least two groups differ. Which ones exactly? That is what post-hoc tests reveal. They carry out the pairwise comparisons systematically while correcting for multiple testing, so that the chance of at least one false positive across all comparisons stays at your chosen significance level.

Why Do We Need Post-Hoc Tests?

Imagine you are comparing four teaching methods. The ANOVA yields F(3, 76) = 4.82, p = .004, so there are significant differences. But between which methods? Four groups give 4 × 3 / 2 = 6 possible pairwise comparisons (A–B, A–C, A–D, B–C, B–D, C–D). If you simply ran six separate t-tests at alpha = 0.05, the familywise error rate (the probability of at least one false positive) would climb to roughly 0.26. Post-hoc procedures solve exactly this problem.
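A quick way to see how fast the error rate grows is to compute 1 - (1 - alpha)^m for the m pairwise comparisons. The sketch below (plain Python; the function name is illustrative) does this for four groups. It assumes the tests are independent, which pairwise comparisons sharing a group are not, so treat the result as an approximation.

```python
from itertools import combinations
from math import comb


def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests at level alpha,
    assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


methods = ["A", "B", "C", "D"]
pairs = list(combinations(methods, 2))  # A-B, A-C, A-D, B-C, B-D, C-D
print(len(pairs), comb(len(methods), 2))  # → 6 6
print(round(familywise_error_rate(0.05, len(pairs)), 3))  # → 0.265
```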

The Most Important Procedures

Tukey HSD (Honestly Significant Difference)

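The classic choice, and often the best one when you want all pairwise comparisons. Tukey's procedure judges every difference between two group means against the studentized range distribution, which keeps the familywise error rate at alpha across the whole set of comparisons.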

When to use Tukey?

Equal (or approximately equal) group sizes
Homogeneity of variance is met (Levene's test not significant)
You want to test all pairwise comparisons
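To make the mechanics concrete, here is a minimal standard-library sketch of the statistic behind Tukey HSD, in the Tukey-Kramer form that also tolerates unequal group sizes. For each pair it divides the absolute mean difference by sqrt(MSE/2 · (1/n_i + 1/n_j)), where MSE is the ANOVA's within-group mean square. The function names and data are illustrative, and the critical value q(alpha; k, N - k) is not computed here: it has to come from a studentized range table or a statistics library.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def mse_within(groups):
    """Within-group mean square (the ANOVA error term) and its degrees of freedom."""
    ss_within = 0.0
    for values in groups.values():
        m = mean(values)
        ss_within += sum((x - m) ** 2 for x in values)
    df_within = sum(len(v) for v in groups.values()) - len(groups)
    return ss_within / df_within, df_within


def tukey_q(groups):
    """Studentized range statistic q for every pair of groups (Tukey-Kramer form)."""
    mse, _ = mse_within(groups)
    q = {}
    for a, b in combinations(groups, 2):
        na, nb = len(groups[a]), len(groups[b])
        se = sqrt(mse / 2 * (1 / na + 1 / nb))
        q[(a, b)] = abs(mean(groups[a]) - mean(groups[b])) / se
    return q


# Toy data (hypothetical, not the teaching-methods study)
groups = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
for pair, stat in tukey_q(groups).items():
    print(pair, round(stat, 2))
```

A pair counts as different when its q exceeds the critical value for k groups and N - k error degrees of freedom. In practice, `scipy.stats.tukey_hsd` (SciPy 1.8 or later) or `pairwise_tukeyhsd` from `statsmodels.stats.multicomp` return adjusted p-values directly.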

Bonferroni Correction

Simple and conservative: the significance level is divided by the number of comparisons. With 6 comparisons, each test uses alpha = 0.05/6 ≈ 0.0083.

When to use Bonferroni?

Few, pre-planned comparisons
Unequal group sizes
You need a procedure that is easy to explain
Caution: with many comparisons, Bonferroni becomes very conservative (low power)
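The Bonferroni rule fits in a few lines. The sketch below (function name hypothetical) returns the per-comparison alpha, the equivalent adjusted p-values (raw p times the number of comparisons, capped at 1), and which comparisons are rejected. The six p-values are made up for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni correction for a family of m comparisons."""
    m = len(pvalues)
    alpha_per_test = alpha / m                       # e.g. 0.05 / 6 ≈ 0.0083
    adjusted = [min(1.0, p * m) for p in pvalues]    # compare these with alpha
    reject = [p <= alpha_per_test for p in pvalues]  # same decision as adjusted <= alpha
    return alpha_per_test, adjusted, reject


raw_p = [0.001, 0.009, 0.012, 0.03, 0.2, 0.6]  # hypothetical raw p-values
alpha_per_test, adjusted, reject = bonferroni(raw_p)
print(round(alpha_per_test, 4))  # → 0.0083
print(reject)                    # → [True, False, False, False, False, False]
```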

Scheffé Test

The most conservative procedure but also the most flexible. Scheffé allows not only pairwise comparisons but also complex contrasts (e.g., Group A+B vs. C+D).

When to use Scheffé?

You want to test complex contrasts, not just simple pairwise comparisons
You did not formulate specific hypotheses in advance
Note: for pure pairwise comparisons, Tukey usually has better power
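For a contrast such as A+B vs. C+D, the Scheffé test weights the group means with coefficients that sum to zero (here +1/2, +1/2, -1/2, -1/2), forms F = psi^2 / (MSE · sum(c_i^2 / n_i)), and calls the contrast significant if F exceeds (k - 1) times the critical value F(alpha; k - 1, N - k). The sketch below computes psi and F with the standard library only; the function name and toy data are illustrative, and the critical value must be looked up separately.

```python
from math import isclose
from statistics import mean


def scheffe_contrast(groups, weights):
    """Estimate a linear contrast psi = sum(c_i * mean_i) and its F statistic (df 1, N - k)."""
    if not isclose(sum(weights.values()), 0.0, abs_tol=1e-12):
        raise ValueError("contrast weights must sum to zero")
    means = {g: mean(v) for g, v in groups.items()}
    ss_within = sum(sum((x - means[g]) ** 2 for x in v) for g, v in groups.items())
    df_within = sum(len(v) for v in groups.values()) - len(groups)
    mse = ss_within / df_within
    psi = sum(c * means[g] for g, c in weights.items())
    var_psi = mse * sum(c ** 2 / len(groups[g]) for g, c in weights.items())
    return psi, psi ** 2 / var_psi


groups = {"A": [1, 2, 3], "B": [3, 4, 5], "C": [5, 6, 7], "D": [7, 8, 9]}  # hypothetical
psi, f_stat = scheffe_contrast(groups, {"A": 0.5, "B": 0.5, "C": -0.5, "D": -0.5})
print(psi, round(f_stat, 2))  # → -4.0 48.0
```

With k = 4 groups, this contrast would be significant at alpha = .05 if f_stat > 3 × F(.05; 3, N - 4). Because that threshold covers every possible contrast at once, Scheffé is conservative for simple pairs.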

Games-Howell

The go-to option when variance homogeneity is violated. Games-Howell assumes neither equal variances nor equal group sizes.

When to use Games-Howell?

Levene's test is significant (unequal variances)
Group sizes differ
Alternative to Tukey when its assumptions are violated
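Games-Howell replaces Tukey's pooled error term with a separate standard error for each pair and adjusts the degrees of freedom with the Welch-Satterthwaite formula. A minimal standard-library sketch of the per-pair quantities follows (names and data are illustrative). Each q is then compared with the studentized range critical value q(alpha; k, df), which is not computed here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def games_howell_stats(groups):
    """For each pair: studentized-range statistic q and Welch-Satterthwaite df."""
    out = {}
    for a, b in combinations(groups, 2):
        ga, gb = groups[a], groups[b]
        va = variance(ga) / len(ga)  # squared standard error of mean a
        vb = variance(gb) / len(gb)  # squared standard error of mean b
        q = abs(mean(ga) - mean(gb)) / sqrt(va + vb) * sqrt(2)
        df = (va + vb) ** 2 / (va ** 2 / (len(ga) - 1) + vb ** 2 / (len(gb) - 1))
        out[(a, b)] = (q, df)
    return out


groups = {"A": [1, 2, 3], "B": [4, 6, 8, 10, 12]}  # hypothetical: unequal n and variance
for pair, (q, df) in games_howell_stats(groups).items():
    print(pair, round(q, 3), round(df, 2))
```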

Dunn-Bonferroni (After Kruskal-Wallis)

If you used the nonparametric Kruskal-Wallis test, you also need a nonparametric post-hoc procedure. Dunn's test with Bonferroni correction is the standard choice here.
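Dunn's test works on ranks: all observations are ranked together (ties get their average rank), and for each pair the difference in mean ranks is divided by sqrt(N(N + 1)/12 · (1/n_i + 1/n_j)) to give a z statistic. The two-sided p-value is then multiplied by the number of comparisons (Bonferroni) and capped at 1. This sketch omits the tie correction that fuller implementations apply to the variance term, and its names and data are illustrative.

```python
from itertools import combinations
from math import erfc, sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_bonferroni(groups):
    """Dunn's z and Bonferroni-adjusted two-sided p for every pair (no tie correction)."""
    labels = [g for g, values in groups.items() for _ in values]
    ranks = average_ranks([x for values in groups.values() for x in values])
    n_total = len(ranks)
    mean_rank = {g: mean([r for r, lab in zip(ranks, labels) if lab == g]) for g in groups}
    pairs = list(combinations(groups, 2))
    result = {}
    for a, b in pairs:
        se = sqrt(n_total * (n_total + 1) / 12 * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
        result[(a, b)] = (z, min(1.0, p_two_sided * len(pairs)))
    return result


groups = {"A": [12, 15, 11], "B": [20, 22, 19], "C": [14, 13, 21]}  # hypothetical
for pair, (z, p_adj) in dunn_bonferroni(groups).items():
    print(pair, round(z, 2), round(p_adj, 3))
```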

Decision Guide: Which Procedure to Choose?

Situation	Recommended Procedure
Equal group sizes, homogeneous variances	Tukey HSD
Few planned comparisons	Bonferroni
Unequal variances	Games-Howell
Complex contrasts	Scheffé
After Kruskal-Wallis	Dunn-Bonferroni
Very many comparisons	Holm-Bonferroni (less conservative than Bonferroni)
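Holm-Bonferroni is a step-down version of Bonferroni: sort the p-values, compare the smallest with alpha/m, the next with alpha/(m - 1), and so on, stopping at the first one that fails. It controls the familywise error rate just as Bonferroni does but never rejects fewer hypotheses. The sketch below (hypothetical names and p-values) expresses the procedure through adjusted p-values.

```python
def holm(pvalues, alpha=0.05):
    """Holm step-down adjusted p-values and reject decisions, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        step = min(1.0, (m - rank) * pvalues[i])  # multiply by m, m - 1, ..., 1
        running_max = max(running_max, step)      # keeps adjusted p monotone (step-down stop)
        adjusted[i] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


raw_p = [0.001, 0.009, 0.012, 0.03, 0.2, 0.6]  # hypothetical raw p-values
adjusted, reject = holm(raw_p)
print(reject)  # → [True, True, True, False, False, False]
```

With these values, plain Bonferroni (threshold 0.05/6 ≈ 0.0083) would reject only the first hypothesis. statsmodels offers the same adjustment via `multipletests(..., method="holm")` in `statsmodels.stats.multitest`.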

Practical Example

Four Teaching Methods Compared

A lecturer compares four teaching methods (Lecture, Flipped Classroom, Problem-Based, Self-Study) using exam scores (n = 20 per group).

ANOVA: F(3, 76) = 4.82, p = .004 — significant
Levene's test: p = .31 — homogeneity of variance holds
Post-hoc (Tukey HSD):
- Lecture vs. Flipped Classroom: p = .42 (n.s.)
- Lecture vs. Problem-Based: p = .003 (significant)
- Lecture vs. Self-Study: p = .87 (n.s.)
- Flipped Classroom vs. Problem-Based: p = .09 (n.s.)
- Flipped Classroom vs. Self-Study: p = .21 (n.s.)
- Problem-Based vs. Self-Study: p = .01 (significant)

Result: Problem-based learning leads to significantly better exam scores than lecture and self-study.
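The omnibus step of this example can be sketched from raw scores: F is the ratio of the between-group and within-group mean squares, with k - 1 and N - k degrees of freedom, which is where F(3, 76) comes from for four groups of 20. The study's raw data are not given, so the sketch uses made-up scores with the same design; the function name is illustrative, and `scipy.stats.f_oneway` computes the same F together with its p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Return F, df_between and df_within for a one-way ANOVA."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v) for v in groups.values())
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores: four methods, n = 20 each, as in the example's design
scores = {
    method: [50 + shift + (i % 7) for i in range(20)]
    for method, shift in [("Lecture", 0), ("Flipped", 2), ("Problem-Based", 5), ("Self-Study", 1)]
}
f_stat, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```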

Common Misconceptions

"I can just run multiple t-tests." No. Without a correction, every extra test adds another chance of a false positive: with 6 comparisons at alpha = 0.05, the probability of at least one false positive is already about 1 - 0.95^6 ≈ 0.26 (treating the tests as independent).
"Post-hoc tests only after a significant ANOVA." This is common practice, but procedures such as Tukey HSD control the familywise error rate on their own, and some methodologists argue that they can also be informative after a non-significant ANOVA.
"Tukey always works." Tukey assumes homogeneity of variance, and the classic procedure also assumes equal group sizes (the Tukey-Kramer variant handles unequal ones). When variances differ, Games-Howell is the better choice.
"More conservative tests are always better." More conservative procedures (Bonferroni, Scheffé) lower the Type I error rate but raise the Type II error rate, i.e., they lose power. The choice should match your research question.

Reporting

Post-hoc results are typically reported alongside the ANOVA:

A one-way ANOVA revealed significant differences between teaching methods, F(3, 76) = 4.82, p = .004, eta-squared = .16. Post-hoc comparisons (Tukey HSD) showed that problem-based learning yielded significantly better results than lecture (p = .003, d = 0.89) and self-study (p = .01, d = 0.74).
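The effect sizes in such a report can be checked with two short formulas. Eta-squared follows from F and its degrees of freedom as F·df_between / (F·df_between + df_within), which reproduces the .16 reported for F(3, 76) = 4.82. Cohen's d for a pair divides the mean difference by the pooled standard deviation of the two groups; the d values above cannot be recomputed without the raw scores, so the d call below uses made-up data. Some authors standardize by the square root of the ANOVA's MSE instead, so state which version you report. Function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def eta_squared_from_f(f, df_between, df_within):
    """Eta-squared of a one-way ANOVA recovered from its F statistic."""
    return f * df_between / (f * df_between + df_within)


def cohens_d(x, y):
    """Cohen's d for two groups, standardized by their pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


print(round(eta_squared_from_f(4.82, 3, 76), 2))  # → 0.16
print(cohens_d([2, 4, 6], [1, 3, 5]))             # hypothetical scores → 0.5
```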
