# Statistical Power
Statistical power is the probability of detecting a real effect as statistically significant. It is the complement of the Type II error probability: power = 1 − β.
## The Two Error Types
| | H₀ is true (no effect) | H₀ is false (effect exists) |
|---|---|---|
| Reject H₀ | Type I error (α) | Correct decision (Power) |
| Retain H₀ | Correct decision | Type II error (β) |
- Type I error (α): Finding an effect that does not exist (false positive)
- Type II error (β): Missing an effect that actually exists (false negative)
- Power (1 − β): Correctly detecting a real effect
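The α row of the table can be checked empirically. In the following Monte Carlo sketch (assuming NumPy and SciPy are available), both groups are drawn from the same distribution, so H₀ is true by construction and every significant result is a false positive; the false-positive rate should land near α:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 30, 10_000

# H0 is true by construction: both samples come from N(0, 1).
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # a significant result here is a Type I error

print(false_positives / n_sims)  # close to alpha = 0.05
```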
## What Determines Power?
Four factors determine the power of a test. Power, sample size, significance level, and effect size are mathematically linked: fix any three, and the fourth can be calculated (variability usually enters through the standardized effect size):
### 1. Effect Size
The larger the actual effect, the easier it is to detect.
### 2. Sample Size (n)
Larger samples provide more power. This is the factor researchers can most easily control.
### 3. Significance Level (α)
A stricter α (e.g., 0.01 instead of 0.05) reduces power because the detection threshold is higher.
### 4. Variability in the Data
Less variability in the data increases power because effects stand out more clearly from the noise.
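The significance-level factor can be quantified with statsmodels' power module (an assumption: `statsmodels` is installed). Holding the design constant (d = 0.5, n = 64 per group) and tightening α from 0.05 to 0.01 costs a noticeable amount of power:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Identical design (d = 0.5, n = 64 per group); only alpha changes.
power_05 = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05)
power_01 = analysis.power(effect_size=0.5, nobs1=64, alpha=0.01)
print(f"alpha = 0.05: power = {power_05:.2f}")  # roughly 0.80
print(f"alpha = 0.01: power = {power_01:.2f}")  # roughly 0.60
```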
**Interplay of factors**

A researcher plans a t-test and wants to achieve a power of 0.80 (α = 0.05, two-tailed):

- For a large effect (d = 0.8): n ≈ 26 per group
- For a medium effect (d = 0.5): n ≈ 64 per group
- For a small effect (d = 0.2): n ≈ 394 per group

The smaller the expected effect, the more participants are needed.
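These sample sizes can be reproduced with statsmodels (a sketch, assuming `statsmodels` is installed; `ceil` rounds up because a fraction of a participant cannot be recruited):

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
needed = {}
for d in (0.8, 0.5, 0.2):
    # Solve for the per-group n that yields 80% power at alpha = 0.05.
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                             ratio=1.0, alternative='two-sided')
    needed[d] = math.ceil(n)
    print(f"d = {d}: n = {needed[d]} per group")
```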
## A Priori Power Analysis
A priori power analysis is conducted before data collection to determine the required sample size.
Required inputs:
- Desired power (typically 0.80 or 0.90)
- Significance level (typically α = 0.05)
- Expected effect size (from pilot studies, literature, or theoretical considerations)
- Type of statistical test
**Example: Power analysis for an ANOVA**

A psychologist wants to compare three therapy approaches (one-way ANOVA). They expect a medium effect (f = 0.25) and want to achieve a power of 0.80 at α = 0.05.

Power analysis result: n ≈ 53 per group, so 159 participants in total.
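A sketch of the same calculation with statsmodels (assuming it is installed). Note that `FTestAnovaPower.solve_power` returns the total sample size, which can differ by a participant or two from tables produced by other software because of rounding conventions:

```python
import math
from statsmodels.stats.power import FTestAnovaPower

# One-way ANOVA, 3 groups, medium effect f = 0.25, target power 0.80.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=3)
total = math.ceil(n_total)
print(f"total sample size: {total}")  # close to the 159 quoted above
```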
## Post-Hoc Power Analysis
Post-hoc power analysis is conducted after data collection. It calculates the power the study actually had.
**Caution:** Post-hoc power analyses based on the observed effect are methodologically controversial: observed power is a direct function of the p-value and provides no additional information. If a post-hoc power calculation is conducted at all, it should use an a priori specified effect size, not the observed one.
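What remains legitimate is a sensitivity-style check: computing the power a completed study had for an effect size specified independently of the data. The numbers below (n = 30 per group, a priori effect d = 0.5) are hypothetical, and `statsmodels` is assumed to be installed:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical completed study with n = 30 per group.
# Power is evaluated for an a priori effect of interest (d = 0.5),
# NOT for the effect observed in the data.
power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05,
                              alternative='two-sided')
print(f"power for d = 0.5: {power:.2f}")  # well below the 0.80 convention
```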
## Power Conventions

| Power | Evaluation |
|---|---|
| < 0.50 | Insufficient |
| 0.50 – 0.79 | Moderate |
| 0.80 | Recommended minimum |
| 0.90 | Good |
| 0.95 | Very good |
The convention of power = 0.80 means accepting that a real effect will be missed in 20% of cases.
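This 20% miss rate can be made concrete by simulation (a sketch assuming NumPy and SciPy). With d = 0.5 and n = 64 per group, a design with roughly 80% power, a real effect still goes undetected in about one run in five:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d, alpha, n_sims = 64, 0.5, 0.05, 5_000

misses = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)  # a real effect of size d exists
    if stats.ttest_ind(a, b).pvalue >= alpha:
        misses += 1  # Type II error: the effect was missed

miss_rate = misses / n_sims
print(f"miss rate: {miss_rate:.2f}")  # roughly 0.20
```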
## Power and Study Design
Power can be increased through various measures:
- Increase sample size → the most direct approach
- Use a within-subjects design → paired tests have more power than unpaired ones
- Reduce variance → through standardized conditions or covariates
- Choose a larger α → but at the cost of a higher Type I error rate
- Use a one-tailed test → only with a justified directional hypothesis
- Use more reliable instruments → less measurement error means less noise
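The one-tailed lever can be quantified with statsmodels (assuming it is installed): at an otherwise identical design, a justified directional test buys extra power over a two-tailed one.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
two_tailed = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05,
                            alternative='two-sided')
one_tailed = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05,
                            alternative='larger')  # directional hypothesis
print(f"two-tailed: {two_tailed:.2f}, one-tailed: {one_tailed:.2f}")
```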
## Common Misconceptions
"My study was not significant, so there is no effect." Without sufficient power, a study cannot detect real effects. A power analysis shows whether the study was even capable of finding the effect.
"80% power is always sufficient." In some contexts (e.g., clinical trials), 90% or more is appropriate. The choice depends on the consequences of missing an effect.
"Power analysis can be done after the study." Ideally, power analysis is conducted before the study. A post-hoc analysis using the observed effect is circular and uninformative.
## Further Reading
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.