Post-Hoc Tests
You ran an ANOVA and got a significant result. But the ANOVA only tells you that not all group means are equal; it does not tell you which groups differ. That is what post-hoc tests reveal. They carry out pairwise comparisons systematically while adjusting for multiple testing, so that the probability of at least one false positive across all comparisons stays at your chosen alpha level.
Why Do We Need Post-Hoc Tests?
Imagine you are comparing four teaching methods. The ANOVA yields F(3, 76) = 4.82, p = .004, so there are significant differences. But between which methods? There are 6 possible pairwise comparisons (A–B, A–C, A–D, B–C, B–D, C–D). If you simply ran six separate t-tests at alpha = 0.05 each, the probability of at least one false positive (the familywise error rate) would rise to roughly 0.26, well above the nominal 0.05. Post-hoc procedures solve exactly this problem.
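The arithmetic behind this is easy to check. The following sketch lists the pairwise comparisons for four groups and computes the familywise error rate under the simplifying assumption that the tests are independent (pairwise t-tests that share groups are not strictly independent, so treat the figure as an approximation). The function name `familywise_error_rate` is made up for this illustration.

```python
from itertools import combinations

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n_tests,
    assuming independent tests that are each run at level alpha."""
    return 1 - (1 - alpha) ** n_tests

groups = ["A", "B", "C", "D"]
pairs = list(combinations(groups, 2))  # k * (k - 1) / 2 unordered pairs
fwer = familywise_error_rate(0.05, len(pairs))

print(len(pairs))     # 6
print(f"{fwer:.3f}")  # 0.265
```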
The Most Important Procedures
Tukey HSD (Honestly Significant Difference)
The classic choice and often the best one. Tukey controls the familywise error rate and compares all groups pairwise.
When to use Tukey?
- Equal (or approximately equal) group sizes
- Homogeneity of variance is met (Levene's test not significant)
- You want to test all pairwise comparisons
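If you work in Python, one way to run Tukey HSD is SciPy's `scipy.stats.tukey_hsd`. The sketch below assumes SciPy 1.8 or newer is installed. The three groups and their scores are made-up illustration values, not the teaching-method data from this article.

```python
from scipy.stats import tukey_hsd

# Hypothetical scores for three groups (illustration only).
group_a = [24.5, 23.5, 26.4, 27.1, 29.9]
group_b = [28.4, 34.2, 29.5, 32.2, 30.1]
group_c = [26.1, 28.3, 24.3, 26.2, 27.8]

result = tukey_hsd(group_a, group_b, group_c)

# result.statistic[i, j] is the mean difference between group i and group j,
# result.pvalue[i, j] is the familywise-adjusted p-value for that pair.
labels = ["A", "B", "C"]
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} vs. {labels[j]}: "
              f"diff = {result.statistic[i, j]:.2f}, "
              f"p = {result.pvalue[i, j]:.3f}")
```

Because the p-values are already adjusted, you compare each one directly with your alpha level.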
Bonferroni Correction
Simple and conservative: the significance level is divided by the number of comparisons. With 6 comparisons, each test is evaluated at alpha = 0.05/6 ≈ 0.0083.
When to use Bonferroni?
- Few, pre-planned comparisons
- Unequal group sizes
- You need a procedure that is easy to explain
- Caution: with many comparisons, Bonferroni becomes very conservative (low power)
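The correction itself takes only a few lines. This is a minimal sketch using the standard library only; the helper `bonferroni` and the six unadjusted p-values are hypothetical and serve purely as an illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (per-test alpha, adjusted p-values, reject flags)."""
    m = len(p_values)
    per_test_alpha = alpha / m
    # Equivalent view: multiply each p-value by m and cap it at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p <= per_test_alpha for p in p_values]
    return per_test_alpha, adjusted, reject

# Hypothetical unadjusted p-values from six pairwise tests.
raw_p = [0.001, 0.004, 0.012, 0.030, 0.200, 0.650]
per_test_alpha, adjusted, reject = bonferroni(raw_p)

print(round(per_test_alpha, 4))  # 0.0083
print(reject)                    # [True, True, False, False, False, False]
```

Note that p = 0.012 and p = 0.030 would count as significant without correction but not after it, which is the loss of power mentioned above.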
Scheffé Test
The most conservative procedure but also the most flexible. Scheffé allows not only pairwise comparisons but also complex contrasts (e.g., Group A+B vs. C+D).
When to use Scheffé?
- You want to test complex contrasts, not just simple pairwise comparisons
- You did not formulate specific hypotheses in advance
- Note: for pure pairwise comparisons, Tukey usually has better power
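For reference, the Scheffé criterion for a contrast can be written as follows. The notation is introduced here for illustration and does not appear elsewhere in this article: k groups with sizes n_i and means X̄_i, total sample size N, contrast weights c_i that sum to zero, and the within-group mean square MS_within from the ANOVA.

```latex
\[
\hat{\psi} = \sum_{i=1}^{k} c_i \bar{X}_i ,
\qquad
F_{\psi} = \frac{\hat{\psi}^{2}}{MS_{\text{within}} \sum_{i=1}^{k} c_i^{2} / n_i} ,
\qquad
\text{significant if } F_{\psi} > (k-1)\, F_{\alpha;\, k-1,\, N-k}
\]
```

For the contrast A+B vs. C+D, the weights would be (1/2, 1/2, -1/2, -1/2). Multiplying the critical F value by k - 1 is what makes the procedure valid for any contrast you choose after looking at the data, and also what makes it conservative for simple pairs.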
Games-Howell
The go-to option when variance homogeneity is violated. Games-Howell assumes neither equal variances nor equal group sizes.
When to use Games-Howell?
- Levene's test is significant (unequal variances)
- Group sizes differ
- Alternative to Tukey when its assumptions are violated
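To see why Games-Howell copes with unequal variances, it helps to look at what it computes per pair: a Welch-type standard error and Welch-Satterthwaite degrees of freedom, so each group contributes its own variance. The sketch below uses only the standard library and stops before the p-value, which requires the studentized range distribution. The helper `welch_components` and the two samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_components(x, y):
    """Mean difference, Welch standard error, t statistic, and
    Welch-Satterthwaite degrees of freedom for two samples."""
    vx = variance(x) / len(x)  # s^2 / n for the first group
    vy = variance(y) / len(y)  # s^2 / n for the second group
    diff = mean(x) - mean(y)
    se = sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, se, diff / se, df

# Hypothetical samples with unequal sizes and clearly unequal variances.
x = [1, 2, 3, 4, 5]
y = [4, 8, 12]
diff, se, t, df = welch_components(x, y)
print(f"diff = {diff}, se = {se:.3f}, t = {t:.3f}, df = {df:.2f}")

# Games-Howell then refers |t| * sqrt(2) to the studentized range
# distribution with k groups and df degrees of freedom to get the
# adjusted p-value for this pair.
```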
Dunn-Bonferroni (After Kruskal-Wallis)
If you used the nonparametric Kruskal-Wallis test, you also need a nonparametric post-hoc procedure. Dunn's test with Bonferroni correction is the standard choice here.
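As a rough illustration of what happens under the hood, Dunn's test compares the mean ranks of two groups with a z statistic and then applies the Bonferroni adjustment. The sketch below is deliberately simplified: it assumes there are no tied values (the usual tie correction is left out), and the function name and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def dunn_bonferroni(groups):
    """Bonferroni-adjusted Dunn p-values for every pair of groups.
    groups: dict mapping group name -> list of values (no ties assumed)."""
    pooled = sorted(v for values in groups.values() for v in values)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch does not handle tied values")
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # ranks over all data
    n_total = len(pooled)
    mean_rank = {g: sum(rank[v] for v in vals) / len(vals)
                 for g, vals in groups.items()}
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        se = sqrt(n_total * (n_total + 1) / 12
                  * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        adjusted[(a, b)] = min(1.0, p_raw * len(pairs))
    return adjusted

# Hypothetical data for three small groups.
samples = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
adjusted_p = dunn_bonferroni(samples)
for pair, p in adjusted_p.items():
    print(pair, round(p, 3))
```

For real analyses with ties, a tested implementation from a statistics package is the safer route.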
Decision Guide: Which Procedure to Choose?
| Situation | Recommended Procedure |
|---|---|
| Equal group sizes, homogeneous variances | Tukey HSD |
| Few planned comparisons | Bonferroni |
| Unequal variances | Games-Howell |
| Complex contrasts | Scheffé |
| After Kruskal-Wallis | Dunn-Bonferroni |
| Very many comparisons | Holm-Bonferroni (less conservative than Bonferroni) |
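The Holm-Bonferroni procedure in the last row is a step-down version of Bonferroni: the smallest p-value is multiplied by the number of tests m, the next smallest by m - 1, and so on. The sketch below shows the adjustment with the standard library only; `holm_adjust` and the three p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Multiplier shrinks from m down to 1 as the p-values grow.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep values monotone
        adjusted[idx] = running_max
    return adjusted

raw_p = [0.01, 0.04, 0.03]  # hypothetical unadjusted p-values
holm = holm_adjust(raw_p)
bonf = [min(1.0, p * len(raw_p)) for p in raw_p]

print([round(p, 2) for p in holm])  # [0.03, 0.06, 0.06]
print([round(p, 2) for p in bonf])  # [0.03, 0.12, 0.09]
```

The Holm-adjusted values are never larger than the Bonferroni ones, which is why the procedure keeps more power while still controlling the familywise error rate.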
Practical Example
Four Teaching Methods Compared
A lecturer compares four teaching methods (Lecture, Flipped Classroom, Problem-Based, Self-Study) using exam scores (n = 20 per group).
- ANOVA: F(3, 76) = 4.82, p = .004 — significant
- Levene's test: p = .31 — homogeneity of variance holds
- Post-hoc (Tukey HSD):
  - Lecture vs. Flipped Classroom: p = .42 (n.s.)
  - Lecture vs. Problem-Based: p = .003 (significant)
  - Lecture vs. Self-Study: p = .87 (n.s.)
  - Flipped Classroom vs. Problem-Based: p = .09 (n.s.)
  - Flipped Classroom vs. Self-Study: p = .21 (n.s.)
  - Problem-Based vs. Self-Study: p = .01 (significant)
Result: Problem-based learning leads to significantly better exam scores than lecture and self-study.
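The bookkeeping of this example can be reproduced in a few lines. The sketch takes the Tukey-adjusted p-values exactly as reported above, checks the degrees of freedom for four groups of 20, and picks out the significant pairs. Because Tukey p-values are already adjusted, they are compared directly with alpha = .05. Variable names are chosen for illustration.

```python
# Tukey-adjusted p-values as reported in the example.
tukey_p = {
    ("Lecture", "Flipped Classroom"): 0.42,
    ("Lecture", "Problem-Based"): 0.003,
    ("Lecture", "Self-Study"): 0.87,
    ("Flipped Classroom", "Problem-Based"): 0.09,
    ("Flipped Classroom", "Self-Study"): 0.21,
    ("Problem-Based", "Self-Study"): 0.01,
}
alpha = 0.05

# Degrees of freedom for k groups of n participants each.
k, n = 4, 20
df_between, df_within = k - 1, k * n - k
print(df_between, df_within)  # 3 76

# Adjusted p-values are compared with alpha directly, no further correction.
significant = [pair for pair, p in tukey_p.items() if p < alpha]
for first, second in significant:
    print(f"{first} vs. {second}: significant")
```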
Common Misconceptions
- "I can just run multiple t-tests." No. Without correction, the probability of at least one false positive rises sharply. With 6 independent comparisons at alpha = 0.05, the familywise error rate is already 1 − 0.95^6 ≈ 0.26.
- "Post-hoc tests only after a significant ANOVA." This is common practice, but some methodologists argue that post-hoc tests can also be informative after a non-significant ANOVA, since procedures such as Tukey HSD control the familywise error rate by themselves.
- "Tukey always works." Tukey assumes homogeneity of variance and works best with equal or approximately equal group sizes. When variances are unequal, Games-Howell is the better choice.
- "More conservative tests are always better." More conservative procedures (Bonferroni, Scheffé) reduce the risk of Type I errors but increase the risk of Type II errors, meaning real differences are missed more often. The choice should match your research question.
Reporting
Post-hoc results are typically reported alongside the ANOVA:
A one-way ANOVA revealed significant differences between teaching methods, F(3, 76) = 4.82, p = .004, eta-squared = .16. Post-hoc comparisons (Tukey HSD) showed that problem-based learning yielded significantly better results than lecture (p = .003, d = 0.89) and self-study (p = .01, d = 0.74).
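The effect size in this sample report can be cross-checked from the F value and its degrees of freedom, because in a one-way ANOVA eta-squared equals SS_between / SS_total. The helper name `eta_squared_from_f` is made up for this sketch.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta-squared for a one-way ANOVA, recovered from F and its
    degrees of freedom: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_between) / (f_value * df_between + df_within)

eta_sq = eta_squared_from_f(4.82, 3, 76)
print(f"{eta_sq:.2f}")  # 0.16
```

A quick check like this catches reporting errors where F, the degrees of freedom, and the effect size do not fit together.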
Further Reading
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). Sage. Chapter 12: Post-hoc procedures.
- Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing Experiments and Analyzing Data (3rd ed.). Routledge.
- Toothaker, L. E. (1993). Multiple Comparison Procedures. Sage.