PickMyTest

Post-Hoc Tests

Which post-hoc tests to use after a significant ANOVA and why

You ran an ANOVA and got a significant result. That tells you the group means are not all equal, but not which groups differ. Post-hoc tests answer that question: they carry out the pairwise comparisons systematically while adjusting for the number of tests, so the overall risk of a false positive stays at the level you chose.

Why Do We Need Post-Hoc Tests?

Imagine you are comparing four teaching methods. The ANOVA yields F(3, 76) = 4.82, p = .004, so there are significant differences. But between which methods? There are 6 possible pairwise comparisons (A–B, A–C, A–D, B–C, B–D, C–D). If you simply ran six separate t-tests at alpha = 0.05 each, the probability of at least one false positive (the familywise error rate) would rise to about 0.26, assuming independent tests. Post-hoc procedures solve exactly this problem.
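The arithmetic behind this inflation can be checked in a few lines. This is a minimal sketch that assumes the six tests are independent (pairwise tests on shared groups are not strictly independent, so the figure is an approximation); the function name is made up for illustration.

```python
from math import comb


def familywise_error_rate(n_comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


n_groups = 4
n_pairs = comb(n_groups, 2)  # 4 choose 2 = 6 pairwise comparisons
fwer = familywise_error_rate(n_pairs)
print(f"{n_pairs} comparisons -> familywise error rate = {fwer:.3f}")
```

With four groups this prints a familywise error rate of 0.265, more than five times the nominal 0.05.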

The Most Important Procedures

Tukey HSD (Honestly Significant Difference)

The classic choice and often the best one. Tukey controls the familywise error rate and compares all groups pairwise.

When to use Tukey?

  • Equal (or approximately equal) group sizes; the Tukey-Kramer variant extends the test to unequal sizes
  • Homogeneity of variance is met (Levene's test not significant)
  • You want to test all pairwise comparisons
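As a rough illustration, Tukey HSD can be run with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The scores below are made up for this sketch and are not the data behind the example in this article.

```python
from scipy.stats import tukey_hsd

# Made-up exam scores, for illustration only.
lecture = [62, 70, 65, 68, 71, 64, 66, 69]
flipped = [68, 72, 70, 74, 69, 73, 71, 75]
problem_based = [78, 82, 75, 80, 84, 79, 77, 81]

result = tukey_hsd(lecture, flipped, problem_based)
names = ["Lecture", "Flipped Classroom", "Problem-Based"]

# result.statistic[i, j] is the difference in means (group i minus group j);
# result.pvalue[i, j] is the p-value already adjusted for all pairs.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = result.statistic[i, j]
        p = result.pvalue[i, j]
        print(f"{names[i]} vs. {names[j]}: diff = {diff:.2f}, p = {p:.4f}")
```

The p-values returned here are already corrected for the full set of pairwise comparisons, so they are compared against 0.05 directly.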

Bonferroni Correction

Simple and conservative: the significance level is divided by the number of comparisons. With 6 comparisons, alpha = 0.05/6 = 0.0083.

When to use Bonferroni?

  • Few, pre-planned comparisons
  • Unequal group sizes
  • You need a procedure that is easy to explain
  • Caution: with many comparisons, Bonferroni becomes very conservative (low power)
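The correction itself is simple enough to write by hand. The sketch below uses hypothetical unadjusted p-values from six pairwise tests; the function name is an assumption for illustration, not a library API.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and the Bonferroni-adjusted p-values."""
    m = len(p_values)
    threshold = alpha / m  # e.g. 0.05 / 6 = 0.0083
    adjusted = [min(1.0, p * m) for p in p_values]  # capped at 1
    return threshold, adjusted


# Hypothetical raw (unadjusted) p-values from six pairwise tests.
raw_p = [0.001, 0.004, 0.012, 0.030, 0.200, 0.600]
threshold, adjusted = bonferroni(raw_p)

for p, p_adj in zip(raw_p, adjusted):
    verdict = "significant" if p <= threshold else "n.s."
    print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f} ({verdict})")
```

Comparing each raw p-value to alpha/m and comparing each adjusted p-value to alpha are two views of the same decision. In this made-up set only the two smallest p-values survive, which shows how quickly power drops as the number of comparisons grows.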

Scheffé Test

The most conservative procedure but also the most flexible. Scheffé allows not only pairwise comparisons but also complex contrasts (e.g., Group A+B vs. C+D).

When to use Scheffé?

  • You want to test complex contrasts, not just simple pairwise comparisons
  • You did not formulate specific hypotheses in advance
  • Note: for pure pairwise comparisons, Tukey usually has better power
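A complex contrast such as "A and B together vs. C and D together" can be tested along these lines. This is a sketch under stated assumptions: a one-way design, contrast weights that sum to zero, made-up data, and a helper name (`scheffe_contrast`) invented for illustration. It divides the contrast F value by (k − 1) and refers it to the F distribution with (k − 1, N − k) degrees of freedom, which is the usual form of the Scheffé criterion.

```python
from statistics import mean, variance

from scipy.stats import f


def scheffe_contrast(groups, weights):
    """Scheffé test for one contrast; weights must sum to zero."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    df_within = n_total - k
    # Pooled within-group variance (the ANOVA mean square error).
    mse = sum((len(g) - 1) * variance(g) for g in groups) / df_within
    # Contrast estimate and its squared standard error.
    psi = sum(w * mean(g) for w, g in zip(weights, groups))
    se_sq = mse * sum(w ** 2 / len(g) for w, g in zip(weights, groups))
    # Contrast F divided by (k - 1), compared with F(k - 1, N - k).
    f_scheffe = psi ** 2 / se_sq / (k - 1)
    p_value = float(f.sf(f_scheffe, k - 1, df_within))
    return psi, f_scheffe, p_value


# Made-up data for four groups A, B, C, D.
a = [10, 12, 11, 13, 12]
b = [11, 13, 12, 14, 12]
c = [15, 17, 16, 18, 16]
d = [16, 18, 17, 19, 17]

# Contrast: mean of A and B vs. mean of C and D.
psi, f_scheffe, p_value = scheffe_contrast([a, b, c, d], [0.5, 0.5, -0.5, -0.5])
print(f"contrast = {psi:.2f}, Scheffe F = {f_scheffe:.2f}, p = {p_value:.4g}")
```

Because the criterion protects every possible contrast at once, the same function can be reused for any other set of weights without further correction.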

Games-Howell

The go-to option when variance homogeneity is violated. Games-Howell assumes neither equal variances nor equal group sizes.

When to use Games-Howell?

  • Levene's test is significant (unequal variances)
  • Group sizes differ
  • Alternative to Tukey when its assumptions are violated
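The mechanics can be sketched as follows: each pair gets a Welch-type standard error and Welch-Satterthwaite degrees of freedom, and the resulting statistic is referred to the studentized range distribution. The function name and data are made up for illustration; only `scipy.stats.studentized_range` is a real library object here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

from scipy.stats import studentized_range


def games_howell(groups):
    """Pairwise Games-Howell comparisons for a dict of name -> scores."""
    k = len(groups)
    results = {}
    for (name_i, x), (name_j, y) in combinations(groups.items(), 2):
        # Per-group variance of the mean; no pooling across groups.
        vi, vj = variance(x) / len(x), variance(y) / len(y)
        diff = mean(x) - mean(y)
        t = abs(diff) / sqrt(vi + vj)
        # Welch-Satterthwaite degrees of freedom for this pair.
        df = (vi + vj) ** 2 / (vi ** 2 / (len(x) - 1) + vj ** 2 / (len(y) - 1))
        # Convert t to the studentized range scale (q = t * sqrt(2)).
        p = float(studentized_range.sf(t * sqrt(2), k, df))
        results[(name_i, name_j)] = (diff, df, p)
    return results


# Made-up scores with unequal group sizes and clearly unequal spread.
scores = {
    "A": [50, 52, 51, 49, 53, 50],
    "B": [60, 75, 55, 80, 68, 72, 58, 77, 65],
    "C": [51, 54, 50, 52],
}
results = games_howell(scores)
for pair, (diff, df, p) in results.items():
    print(f"{pair[0]} vs. {pair[1]}: diff = {diff:.2f}, df = {df:.1f}, p = {p:.4f}")
```

Because nothing is pooled, a single noisy group only affects the comparisons it takes part in.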

Dunn-Bonferroni (After Kruskal-Wallis)

If you used the nonparametric Kruskal-Wallis test, you also need a nonparametric post-hoc procedure. Dunn's test with Bonferroni correction is the standard choice here.
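For readers who want to see what the procedure does, here is a standard-library sketch: all observations are ranked together, mean ranks are compared with a z statistic (including the usual tie correction), and the two-sided p-values are multiplied by the number of comparisons. The function names and the data are made up for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks, tie_sizes, start = [0.0] * len(values), [], 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def dunn_bonferroni(groups):
    """Dunn's test with Bonferroni adjustment for a dict of name -> values."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    ranks, tie_sizes = average_ranks(pooled)
    n_total = len(pooled)
    mean_rank, start = {}, 0
    for name in names:
        n = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + n]) / n
        start += n
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (12 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12 - tie_term
    n_pairs = len(names) * (len(names) - 1) // 2
    results = {}
    for a, b in combinations(names, 2):
        se = sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))
        results[(a, b)] = (z, min(1.0, p_raw * n_pairs))  # Bonferroni step
    return results


# Made-up measurements for three groups.
data = {
    "Control": [12, 15, 11, 14, 13],
    "Dose 1": [16, 19, 17, 21, 18],
    "Dose 2": [25, 28, 24, 30, 27],
}
dunn = dunn_bonferroni(data)
for pair, (z, p_adj) in dunn.items():
    print(f"{pair[0]} vs. {pair[1]}: z = {z:.2f}, adjusted p = {p_adj:.4f}")
```

The test works on ranks only, so the raw scale of the measurements does not matter.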

Decision Guide: Which Procedure to Choose?

| Situation | Recommended Procedure |
|---|---|
| Equal groups, homogeneous variances | Tukey HSD |
| Few planned comparisons | Bonferroni |
| Unequal variances | Games-Howell |
| Complex contrasts | Scheffé |
| After Kruskal-Wallis | Dunn-Bonferroni |
| Very many comparisons | Holm-Bonferroni (less conservative) |
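Holm-Bonferroni appears only in this guide, so a short sketch may help. It is a step-down version of Bonferroni: the smallest p-value is multiplied by m, the next by m − 1, and so on, with each adjusted value forced to be at least as large as the previous one. The p-values are hypothetical and the function name is made up for illustration.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep values monotone
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw (unadjusted) p-values from six comparisons.
raw_p = [0.001, 0.004, 0.012, 0.030, 0.200, 0.600]
holm = holm_bonferroni(raw_p)
plain = [min(1.0, p * len(raw_p)) for p in raw_p]  # ordinary Bonferroni

n_holm = sum(p < 0.05 for p in holm)
n_plain = sum(p < 0.05 for p in plain)
print(f"significant with Holm: {n_holm}, with Bonferroni: {n_plain}")
```

For these made-up values Holm flags three comparisons and plain Bonferroni two, while both keep the familywise error rate at 0.05.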

Practical Example

Four Teaching Methods Compared

A lecturer compares four teaching methods (Lecture, Flipped Classroom, Problem-Based, Self-Study) using exam scores (n = 20 per group).

  1. ANOVA: F(3, 76) = 4.82, p = .004 — significant
  2. Levene's test: p = .31 — homogeneity of variance holds
  3. Post-hoc (Tukey HSD):
    • Lecture vs. Flipped Classroom: p = .42 (n.s.)
    • Lecture vs. Problem-Based: p = .003 (significant)
    • Lecture vs. Self-Study: p = .87 (n.s.)
    • Flipped Classroom vs. Problem-Based: p = .09 (n.s.)
    • Flipped Classroom vs. Self-Study: p = .21 (n.s.)
    • Problem-Based vs. Self-Study: p = .01 (significant)

Result: Problem-based learning leads to significantly better exam scores than lecture and self-study.
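The three steps of this example can be sketched as a small decision helper. The helper name and the data are made up for illustration (the four groups are shifted copies of one pattern, so their spread is equal by construction), and the rule ignores group sizes for brevity. Note that `scipy.stats.levene` centers on the median by default.

```python
from scipy.stats import f_oneway, levene


def choose_post_hoc(groups, alpha=0.05):
    """Pick a post-hoc procedure from the ANOVA and Levene results."""
    anova = f_oneway(*groups)
    if anova.pvalue >= alpha:
        return "none (ANOVA not significant)"
    # Levene's test: a non-significant result supports equal variances.
    variances_equal = levene(*groups).pvalue >= alpha
    return "Tukey HSD" if variances_equal else "Games-Howell"


# Made-up scores: one deviation pattern shifted to four different means.
pattern = [-4, -2, -1, 0, 1, 2, 4]
groups = [[center + dev for dev in pattern] for center in (60, 62, 70, 61)]

choice = choose_post_hoc(groups)
print(f"Recommended post-hoc procedure: {choice}")
```

A fixed rule like this is a convenience, not a substitute for looking at the data; many analysts also inspect group standard deviations directly before choosing.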

Common Misconceptions

  • "I can just run multiple t-tests." No. Without correction, the probability of at least one false positive rises sharply. With 6 comparisons at alpha = 0.05, the familywise error rate is already about 0.26.
  • "Post-hoc tests are only allowed after a significant ANOVA." This is common practice, but some methodologists argue that post-hoc tests can also be informative after a non-significant ANOVA, since procedures such as Tukey HSD control the familywise error rate on their own.
  • "Tukey always works." Tukey assumes approximately equal group sizes and homogeneity of variance. When these assumptions are violated, Games-Howell is the better choice.
  • "More conservative tests are always better." More conservative procedures (Bonferroni, Scheffé) lower the risk of Type I errors but raise the risk of Type II errors, that is, of missing real differences. The choice should match your research question.

Reporting

Post-hoc results are typically reported alongside the ANOVA:

A one-way ANOVA revealed significant differences between teaching methods, F(3, 76) = 4.82, p = .004, eta-squared = .16. Post-hoc comparisons (Tukey HSD) showed that problem-based learning yielded significantly better results than lecture (p = .003, d = 0.89) and self-study (p = .01, d = 0.74).
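The eta-squared value in this sample report can be recovered from the F statistic and its degrees of freedom. A minimal sketch, assuming a one-way between-subjects ANOVA; the function name is made up for illustration.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta-squared for a one-way ANOVA, recovered from F and its df."""
    return (f_value * df_between) / (f_value * df_between + df_within)


eta_sq = eta_squared_from_f(4.82, 3, 76)
print(f"eta-squared = {eta_sq:.2f}")
```

For F(3, 76) = 4.82 this gives .16, matching the reported value.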

Further Reading

  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). Sage. Chapter 12: Post-hoc procedures.
  • Maxwell, S. E., Delaney, H. D., & Kelley, K. (2018). Designing Experiments and Analyzing Data (3rd ed.). Routledge.
  • Toothaker, L. E. (1993). Multiple Comparison Procedures. Sage.