Alpha Correction#
Every time you run a statistical test, you take a 5% risk of falsely finding a significant result (at alpha = 0.05). That sounds acceptable — for a single test. But what happens when you run 10, 20, or 50 tests at the same time? Those risks compound, and suddenly the probability of at least one false alarm is alarmingly high. This is exactly the problem that alpha correction addresses.
The Multiple Comparisons Problem#
Worked Example: Alpha Inflation
You run 10 independent tests with alpha = 0.05. The probability of not making a Type I error on a single test is 0.95.
The probability of making no error across all 10 tests: 0.95^10 ≈ 0.599.
So the probability of at least one Type I error is: 1 − 0.95^10 ≈ 0.401.
Instead of a 5% error risk, you now face 40%! With 20 tests it would be 64%.
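The calculation above can be reproduced in a few lines. This is a minimal sketch; the `fwer` helper name is just for illustration:

```python
def fwer(alpha: float, m: int) -> float:
    """Familywise error rate for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

print(round(fwer(0.05, 1), 3))   # 0.05  (single test)
print(round(fwer(0.05, 10), 3))  # 0.401 (about 40%)
print(round(fwer(0.05, 20), 3))  # 0.642 (about 64%)
```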
This growing error risk is called the familywise error rate (FWER) — the probability of committing at least one Type I error within a family of tests.
Correction Methods#
Bonferroni Correction#
The simplest and best-known method. You divide your alpha level by the number of tests m: alpha_corrected = alpha / m.
With 10 tests and alpha = 0.05, the corrected threshold becomes: 0.05/10 = 0.005. A result is only significant if p < 0.005.
Pros and Cons
Pros:
- Easy to compute and explain
- Works regardless of test dependencies
- Widely recognized and accepted
Cons:
- Very conservative, especially with many comparisons
- Low statistical power — real effects are easily missed
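Applying the Bonferroni threshold is a one-liner per p-value. A minimal sketch (the function name and the example p-values are illustrative):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value as significant at the Bonferroni-corrected threshold alpha/m."""
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]

# 10 tests, so the corrected threshold is 0.05/10 = 0.005
pvals = [0.003, 0.012, 0.030, 0.180, 0.001, 0.004, 0.200, 0.060, 0.015, 0.049]
print(bonferroni_significant(pvals))
```

Only the p-values below 0.005 (here 0.003, 0.001, and 0.004) survive — including several that would have been significant at the uncorrected 0.05 level.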
Holm-Bonferroni (Step-Down)#
An improved version of the Bonferroni correction that is less conservative while still controlling the FWER.
How it works:
- Sort all p-values from smallest to largest
- Compare the smallest p-value with alpha/m
- If significant, compare the second smallest with alpha/(m-1)
- Continue until a p-value fails to reach significance
- All remaining p-values are declared non-significant
Example: Holm Correction with 4 Comparisons
Four p-values (sorted): 0.003, 0.012, 0.030, 0.180
| Rank | p-value | Threshold (alpha/(m-rank+1)) | Significant? |
|---|---|---|---|
| 1 | 0.003 | 0.05/4 = 0.0125 | Yes |
| 2 | 0.012 | 0.05/3 = 0.0167 | Yes |
| 3 | 0.030 | 0.05/2 = 0.025 | No (Stop!) |
| 4 | 0.180 | 0.05/1 = 0.05 | No |
Result: The first two comparisons remain significant, the third does not (even though p = .030 < .05).
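The step-down procedure above can be sketched in a few lines of Python (the function name is illustrative; results are returned in the original input order):

```python
def holm_significant(p_values, alpha=0.05):
    """Holm step-down: test sorted p-values against alpha/(m - rank),
    stopping at the first one that fails."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):        # rank 0 .. m-1
        threshold = alpha / (m - rank)        # alpha/m, alpha/(m-1), ...
        if p_values[idx] < threshold:
            significant[idx] = True
        else:
            break                             # all larger p-values fail too
    return significant

print(holm_significant([0.003, 0.012, 0.030, 0.180]))
# → [True, True, False, False], matching the table above
```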
Benjamini-Hochberg (FDR Correction)#
This approach does not control the FWER but instead the False Discovery Rate (FDR) — the expected proportion of false discoveries among all significant results. This sounds more lenient, but in many situations it is the more sensible strategy.
How it works:
- Sort all p-values from smallest to largest
- Compare the largest p-value with alpha
- The second largest with alpha x (m-1)/m
- In general: p(i) with alpha x i/m
- The largest p-value meeting its criterion, and all smaller ones, are declared significant
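The step-up logic can be sketched as follows (a minimal illustration, not a production implementation; results are returned in the original input order):

```python
def benjamini_hochberg_significant(p_values, alpha=0.05):
    """BH step-up: find the largest rank i with p(i) <= alpha * i / m,
    then declare that p-value and all smaller ones significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):   # rank 1 .. m
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank                    # keep the largest passing rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        significant[idx] = rank <= cutoff_rank
    return significant

print(benjamini_hochberg_significant([0.003, 0.012, 0.030, 0.180]))
# → [True, True, True, False]
```

Note the contrast with the Holm example above: on the same four p-values, BH also declares p = 0.030 significant (0.030 ≤ 0.05 × 3/4 = 0.0375), illustrating its greater power.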
When FDR Instead of FWER?
- FWER (Bonferroni, Holm): When even a single false positive has serious consequences (e.g., clinical trials, genomic studies with follow-up experiments)
- FDR (Benjamini-Hochberg): When you are running an exploratory analysis and a certain proportion of false discoveries is acceptable (e.g., exploratory gene expression studies, screening studies)
Comparison of Methods#
| Method | Controls | Conservativeness | Best Application |
|---|---|---|---|
| Bonferroni | FWER | Very conservative | Few comparisons, simple presentation |
| Holm-Bonferroni | FWER | Moderately conservative | Standard method, almost always better than Bonferroni |
| Benjamini-Hochberg | FDR | Liberal | Exploratory analyses, many tests |
| No correction | — | — | Only with a single pre-planned test |
When Is Correction Necessary?#
Not every situation requires an alpha correction. Here is some guidance:
Correction recommended:
- Multiple post-hoc comparisons after an ANOVA
- Many correlations in a correlation matrix
- Subgroup analyses without pre-specified hypotheses
Correction usually not needed:
- A single pre-planned comparison (primary endpoint)
- Orthogonal contrasts (they are independent of each other)
- Confirmatory study with one primary test
Common Misconceptions#
- "Bonferroni is always the right choice." — Holm-Bonferroni controls the FWER just as well but has more power. There is rarely a reason to prefer the simple Bonferroni correction.
- "Without correction all my results are invalid." — Alpha correction concerns the overall error rate. Individual tests with p = .001 are hardly due to chance even without correction.
- "FDR correction is not rigorous." — On the contrary: for exploratory analyses with many tests, FDR correction is often the more methodologically sound choice because it preserves more power.
- "I will just test fewer hypotheses so I do not need correction." — That is actually a legitimate strategy. Few, pre-planned comparisons reduce the problem.
Practical Tips#
- Plan ahead: Define before data collection which comparisons you will make
- Use Holm instead of Bonferroni: Same FWER control, more power
- Report transparently: State how many tests were performed and which correction was used
- Look at effect sizes: A significant p-value after correction says nothing about practical relevance
Further Reading
- Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289–300.
- Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
- Bender, R. & Lange, S. (2001). Adjusting for multiple testing — when and how? Journal of Clinical Epidemiology, 54, 343–349.