# Statistical Power
Statistical power is the probability of detecting a real effect as statistically significant. It is the complement of the Type II error probability: power = 1 − β.
## The Two Error Types
| | H₀ is true (no effect) | H₀ is false (effect exists) |
|---|---|---|
| Reject H₀ | Type I error (α) | Correct decision (Power) |
| Retain H₀ | Correct decision | Type II error (β) |
- Type I error (α): Finding an effect that does not exist (false positive)
- Type II error (β): Missing an effect that actually exists (false negative)
- Power (1 − β): Correctly detecting a real effect
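The α row of the table can be checked empirically. In the following Monte Carlo sketch (assuming NumPy and SciPy are available), both groups are drawn from the same distribution, so H₀ is true by construction and every significant result is a false positive; the false-positive rate should land near α:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 30, 10_000

# H0 is true by construction: both samples come from N(0, 1).
false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # a significant result here is a Type I error

print(false_positives / n_sims)  # close to alpha = 0.05
```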
## What Determines Power?
Four factors determine the power of a test. Power, sample size, significance level, and effect size are mathematically linked: fix any three, and the fourth can be calculated (variability usually enters through the standardized effect size):
### 1. Effect Size
The larger the actual effect, the easier it is to detect.
### 2. Sample Size (n)
Larger samples provide more power. This is the factor researchers can most easily control.
### 3. Significance Level (α)
A stricter α (e.g., 0.01 instead of 0.05) reduces power because the detection threshold is higher.
### 4. Variability in the Data
Less variability in the data increases power because effects stand out more clearly from the noise.
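The significance-level factor can be quantified with statsmodels' power module (an assumption: `statsmodels` is installed). Holding the design constant (d = 0.5, n = 64 per group) and tightening α from 0.05 to 0.01 costs a noticeable amount of power:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Identical design (d = 0.5, n = 64 per group); only alpha changes.
power_05 = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05)
power_01 = analysis.power(effect_size=0.5, nobs1=64, alpha=0.01)
print(f"alpha = 0.05: power = {power_05:.2f}")  # roughly 0.80
print(f"alpha = 0.01: power = {power_01:.2f}")  # roughly 0.60
```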
**Interplay of factors**

A researcher plans a t-test and wants to achieve a power of 0.80 (α = 0.05, two-tailed):

- For a large effect (d = 0.8): n ≈ 26 per group
- For a medium effect (d = 0.5): n ≈ 64 per group
- For a small effect (d = 0.2): n ≈ 394 per group

The smaller the expected effect, the more participants are needed.
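These sample sizes can be reproduced with statsmodels (a sketch, assuming `statsmodels` is installed; `ceil` rounds up because a fraction of a participant cannot be recruited):

```python
import math
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
needed = {}
for d in (0.8, 0.5, 0.2):
    # Solve for the per-group n that yields 80% power at alpha = 0.05.
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                             ratio=1.0, alternative='two-sided')
    needed[d] = math.ceil(n)
    print(f"d = {d}: n = {needed[d]} per group")
```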
## A Priori Power Analysis
A priori power analysis is conducted before data collection to determine the required sample size.
Required inputs:
- Desired power (typically 0.80 or 0.90)
- Significance level (typically α = 0.05)
- Expected effect size (from pilot studies, literature, or theoretical considerations)
- Type of statistical test
**Example: Power analysis for an ANOVA**

A psychologist wants to compare three therapy approaches (one-way ANOVA). They expect a medium effect (f = 0.25) and want to achieve a power of 0.80 at α = 0.05.

Power analysis result: n ≈ 53 per group, so 159 participants in total.
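A sketch of the same calculation with statsmodels (assuming it is installed). Note that `FTestAnovaPower.solve_power` returns the total sample size, which can differ by a participant or two from tables produced by other software because of rounding conventions:

```python
import math
from statsmodels.stats.power import FTestAnovaPower

# One-way ANOVA, 3 groups, medium effect f = 0.25, target power 0.80.
n_total = FTestAnovaPower().solve_power(effect_size=0.25, alpha=0.05,
                                        power=0.80, k_groups=3)
total = math.ceil(n_total)
print(f"total sample size: {total}")  # close to the 159 quoted above
```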
## Post-Hoc Power Analysis
Post-hoc power analysis is conducted after data collection. It calculates the power the study actually had.
**Caution:** Post-hoc power analyses based on the observed effect are methodologically controversial: observed power is a direct function of the p-value and provides no additional information. If a post-hoc power calculation is conducted at all, it should use an a priori specified effect size, not the observed one.
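What remains legitimate is a sensitivity-style check: computing the power a completed study had for an effect size specified independently of the data. The numbers below (n = 30 per group, a priori effect d = 0.5) are hypothetical, and `statsmodels` is assumed to be installed:

```python
from statsmodels.stats.power import TTestIndPower

# Hypothetical completed study with n = 30 per group.
# Power is evaluated for an a priori effect of interest (d = 0.5),
# NOT for the effect observed in the data.
power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05,
                              alternative='two-sided')
print(f"power for d = 0.5: {power:.2f}")  # well below the 0.80 convention
```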
## Power Conventions

| Power | Evaluation |
|---|---|
| < 0.50 | Insufficient |
| 0.50 – 0.79 | Moderate |
| 0.80 | Recommended minimum |
| 0.90 | Good |
| 0.95 | Very good |
The convention of power = 0.80 means accepting that a real effect will be missed in 20% of cases.
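This 20% miss rate can be made concrete by simulation (a sketch assuming NumPy and SciPy). With d = 0.5 and n = 64 per group, a design with roughly 80% power, a real effect still goes undetected in about one run in five:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, d, alpha, n_sims = 64, 0.5, 0.05, 5_000

misses = 0
for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)  # a real effect of size d exists
    if stats.ttest_ind(a, b).pvalue >= alpha:
        misses += 1  # Type II error: the effect was missed

miss_rate = misses / n_sims
print(f"miss rate: {miss_rate:.2f}")  # roughly 0.20
```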
## Power and Study Design
Power can be increased through various measures:
- Increase sample size → the most direct approach
- Use a within-subjects design → paired tests have more power than unpaired ones
- Reduce variance → through standardized conditions or covariates
- Choose a larger α → but at the cost of a higher Type I error rate
- Use a one-tailed test → only with a justified directional hypothesis
- Use more reliable instruments → less measurement error means less noise
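The one-tailed lever can be quantified with statsmodels (assuming it is installed): at an otherwise identical design, a justified directional test buys extra power over a two-tailed one.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
two_tailed = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05,
                            alternative='two-sided')
one_tailed = analysis.power(effect_size=0.5, nobs1=64, alpha=0.05,
                            alternative='larger')  # directional hypothesis
print(f"two-tailed: {two_tailed:.2f}, one-tailed: {one_tailed:.2f}")
```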
## Common Misconceptions
"My study was not significant, so there is no effect." Without sufficient power, a study cannot detect real effects. A power analysis shows whether the study was even capable of finding the effect.
"80% power is always sufficient." In some contexts (e.g., clinical trials), 90% or more is appropriate. The choice depends on the consequences of missing an effect.
"Power analysis can be done after the study." Ideally, power analysis is conducted before the study. A post-hoc analysis using the observed effect is circular and uninformative.
## Further Reading
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.