Effect Sizes#
Effect size quantifies the magnitude of an effect regardless of sample size. While the p-value only indicates whether an effect is statistically significant, the effect size tells you how large the effect is.
Why Effect Sizes Matter#
A statistically significant result can be practically meaningless. Conversely, a non-significant result may reflect a substantial effect that went undetected due to a small sample size.
Why p-values alone are not enough
Two studies examine the effect of a medication:
- Study A (n = 20): Mean difference = 8 points, p = 0.12, d = 0.72
- Study B (n = 5,000): Mean difference = 0.3 points, p < 0.001, d = 0.04
Study B is significant, but the effect is tiny. Study A shows a substantial effect that did not reach significance because of the small sample.
Key Effect Size Measures#
Cohen's d – For Mean Comparisons#
Cohen's d measures the difference between two means in units of standard deviation.
For independent samples:

$$ d = \frac{M_1 - M_2}{s_{\text{pooled}}} $$

with the pooled standard deviation:

$$ s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}} $$

For paired samples (Cohen's d_z), the mean of the difference scores is divided by their standard deviation:

$$ d_z = \frac{M_{\text{diff}}}{s_{\text{diff}}} $$
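The independent-samples formula above can be sketched in plain Python (function name `cohens_d` is mine, not from the text):

```python
import math

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    # Sample variances (denominator n - 1)
    var1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    var2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    # Pooled standard deviation
    s_pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled
```

The sign of the result depends on which group is passed first; only the magnitude matters for the benchmarks below.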
| Interpretation | Cohen's d |
|---|---|
| Small | 0.2 |
| Medium | 0.5 |
| Large | 0.8 |
Eta-Squared (η²) – For ANOVA#
Eta-squared indicates the proportion of variance explained by the effect relative to the total variance:

$$ \eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}} $$
| Interpretation | η² |
|---|---|
| Small | 0.01 |
| Medium | 0.06 |
| Large | 0.14 |
Note: η² systematically overestimates the population effect size. Partial η² or omega-squared (ω²) is therefore often preferred.
Partial Eta-Squared (η²_p)#
In factorial ANOVAs, partial η² relates the effect only to the relevant error variance:

$$ \eta^2_p = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}} $$
Omega-Squared (ω²) – A Less Biased Estimator#

Omega-squared corrects the positive bias of η² and gives a more accurate estimate of the population effect size:

$$ \omega^2 = \frac{SS_{\text{effect}} - df_{\text{effect}} \cdot MS_{\text{error}}}{SS_{\text{total}} + MS_{\text{error}}} $$
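The ANOVA-based measures can be computed directly from sums of squares. A minimal sketch for a one-way design (function name `anova_effect_sizes` is mine), returning η² = SS_between / SS_total and ω² = (SS_between − df_between · MS_within) / (SS_total + MS_within):

```python
def anova_effect_sizes(groups):
    """eta-squared and omega-squared for a one-way ANOVA."""
    all_vals = [v for g in groups for v in g]
    n = len(all_vals)
    grand_mean = sum(all_vals) / n
    ss_total = sum((v - grand_mean) ** 2 for v in all_vals)
    # Between-groups sum of squares: weighted squared deviations of group means
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n - len(groups)
    ms_within = ss_within / df_within
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq
```

On the same data, ω² is always a bit smaller than η², reflecting the bias correction.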
Correlation Coefficient r#
Pearson's r is itself an effect size measure for the linear relationship between two variables.
| Interpretation | \|r\| |
|---|---|
| Small | 0.10 |
| Medium | 0.30 |
| Large | 0.50 |
Converting Between Effect Size Measures#

For two groups of equal size, Cohen's d and r can be converted into one another:

$$ d = \frac{2r}{\sqrt{1 - r^2}}, \qquad r = \frac{d}{\sqrt{d^2 + 4}} $$
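Assuming the standard equal-group-size conversions d = 2r / √(1 − r²) and r = d / √(d² + 4), a minimal sketch (function names are mine):

```python
import math

def r_to_d(r):
    """Convert a correlation r to Cohen's d (equal group sizes assumed)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert Cohen's d back to a correlation r."""
    return d / math.sqrt(d ** 2 + 4)
```

The two functions are exact inverses of each other, so a round trip recovers the original value.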
Cramér's V – For Categorical Data#
Cramér's V is the effect size measure for the chi-square test:

$$ V = \sqrt{\frac{\chi^2}{n\,(k - 1)}} $$

where n is the total sample size and k is the minimum of the number of rows and columns.
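A short sketch (function name `cramers_v` is mine) that computes the chi-square statistic from a contingency table and converts it to V:

```python
import math

def cramers_v(table):
    """Cramér's V from a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (obs - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))
```

V ranges from 0 (no association) to 1 (perfect association), regardless of table size.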
Reporting Effect Sizes#
The APA (American Psychological Association) recommends always reporting effect sizes and confidence intervals.
Correct APA-style reporting
"The independent t-test showed a significant difference between the experimental group (M = 24.3, SD = 4.1) and the control group (M = 20.1, SD = 3.8), t(38) = 3.28, p = .002, d = 1.06, 95% CI [0.38, 1.73]."
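A confidence interval like the one in this report can be approximated with the large-sample standard error of d, SE(d) ≈ √((n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂))), a common approximation from meta-analysis. A sketch under that assumption (exact CIs use the noncentral t distribution and differ slightly; the helper name is mine):

```python
import math

def d_with_ci(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d via the large-sample normal SE."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se
```

For d = 1.06 with two groups of 20, this gives roughly [0.40, 1.72], close to the interval reported above.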
Practical Interpretation#
Cohen's benchmarks (small, medium, large) are general guidelines. Practical significance depends on context:
- In medicine, a "small" effect (d = 0.2) can save thousands of lives
- In educational research, a "medium" effect (d = 0.5) is already noteworthy
- In basic research, large effects are not uncommon
Common Misconceptions#
"A significant p-value means a large effect." No. Significance and effect size are independent concepts. Significance depends heavily on sample size.
"Cohen's benchmarks are universal." The benchmarks (0.2 / 0.5 / 0.8) are conventions. In some research areas, d = 0.2 is already a meaningful effect; in others, d = 0.8 is relatively small.
"Negative effect sizes mean bad results." The sign only indicates the direction. A d = -0.5 is just as large as d = +0.5, only in the opposite direction.
Further Reading#
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.