# Sample Size
Sample size (n) is one of the most important factors for the quality and informativeness of a study. It affects statistical power, the precision of estimates, and the generalizability of results.
## Why Does Sample Size Matter?
### 1. Statistical Power
Larger samples have more power – a higher probability of detecting effects that truly exist.
**Impact of sample size on power** (t-test, expected effect d = 0.5, α = 0.05, two-tailed):
| n per group | Power |
|---|---|
| 10 | 0.18 |
| 20 | 0.34 |
| 50 | 0.70 |
| 64 | 0.80 |
| 100 | 0.94 |
The recommended power of 0.80 is only reached at n = 64 per group.
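The values in the table can be reproduced from the noncentral t distribution. Below is a minimal sketch, assuming SciPy is available; `ttest_power` is an illustrative helper, not a library function:

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    """Two-sided power of an independent-samples t-test with equal groups."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability that |t| exceeds the critical value under the alternative
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n in (10, 20, 50, 64, 100):
    print(f"n = {n:3d}  power = {ttest_power(0.5, n):.2f}")
```

G*Power and R's `pwr` package perform essentially this same noncentral-t computation.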
### 2. Precision of Estimates
The standard error of the mean decreases with increasing n: SE = s / √n.
Doubling the sample size therefore shrinks the standard error by a factor of √2 (≈ 1.41), and confidence intervals become narrower.
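A quick numeric check (σ = 15 here is an arbitrary illustrative value):

```python
import math

sigma = 15.0                      # hypothetical population standard deviation
se_100 = sigma / math.sqrt(100)   # SE = sigma / sqrt(n)
se_200 = sigma / math.sqrt(200)   # sample size doubled
print(se_100, se_200)
print(se_100 / se_200)            # ratio is sqrt(2), about 1.414
```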
### 3. Robustness
Larger samples make tests more robust against violations of the normality assumption (central limit theorem).
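A simulation sketch illustrates this, assuming NumPy and SciPy; `type1_rate` is an illustrative helper. One-sample t-tests on strongly skewed (exponential) data hold the nominal α = 0.05 much better at n = 100 than at n = 10:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def type1_rate(n, n_sims=4000):
    """Rejection rate of a true H0 (mean = 1) for exponential(1) samples."""
    rejections = 0
    for _ in range(n_sims):
        sample = rng.exponential(scale=1.0, size=n)  # skewed, true mean = 1
        if stats.ttest_1samp(sample, popmean=1.0).pvalue < 0.05:
            rejections += 1
    return rejections / n_sims

print("n = 10 :", type1_rate(10))    # typically inflated above 0.05
print("n = 100:", type1_rate(100))   # close to the nominal 0.05
```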
## How to Determine the Right Sample Size
### A Priori Power Analysis
The recommended method. Before data collection, specify:
- Desired power (typically 0.80 or 0.90)
- Significance level (typically α = 0.05)
- Expected effect size (from literature, pilot study, or theory)
- Type of test (t-test, ANOVA, correlation, etc.)
### Required Sample Sizes (Reference Values)
For an independent samples t-test (α = 0.05, two-tailed, power = 0.80):
| Expected effect (d) | n per group | Total |
|---|---|---|
| 0.2 (small) | 394 | 788 |
| 0.5 (medium) | 64 | 128 |
| 0.8 (large) | 26 | 52 |
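These t-test values can be recovered by searching for the smallest n whose power reaches 0.80. A sketch using SciPy's noncentral t distribution; `required_n` is an illustrative helper, not a library function:

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

def required_n(d, alpha=0.05, target_power=0.80):
    """Smallest per-group n whose power reaches the target."""
    n = 2
    while ttest_power(d, n, alpha) < target_power:
        n += 1
    return n

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {required_n(d)} per group")
```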
For a one-way ANOVA (3 groups, α = 0.05, power = 0.80):
| Expected effect (f) | n per group | Total |
|---|---|---|
| 0.10 (small) | 322 | 966 |
| 0.25 (medium) | 53 | 159 |
| 0.40 (large) | 22 | 66 |
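The ANOVA values follow the same logic with the noncentral F distribution, where the noncentrality parameter is λ = f²·N. A sketch (assuming SciPy); the tabulated per-group n values land close to a power of 0.80:

```python
from scipy import stats

def anova_power(f_effect, k, n_per_group, alpha=0.05):
    """Power of a one-way ANOVA with k equal groups (noncentral F)."""
    N = k * n_per_group
    df1, df2 = k - 1, N - k
    lam = f_effect**2 * N                      # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

print(round(anova_power(0.25, 3, 53), 3))   # medium effect, near 0.80
```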
For a Pearson correlation (α = 0.05, two-tailed, power = 0.80):
| Expected effect (r) | n |
|---|---|
| 0.10 (small) | 782 |
| 0.30 (medium) | 85 |
| 0.50 (large) | 28 |
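The correlation values can be approximated with the Fisher z transformation. This sketch (assuming SciPy) matches the table for a medium effect; for small and large effects the approximation can differ from exact calculations by a participant or two:

```python
import math
from scipy import stats

def n_for_correlation(r, alpha=0.05, target_power=0.80):
    """Approximate required n via the Fisher z transformation."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(target_power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

print(n_for_correlation(0.30))  # 85
```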
### Software for Power Analyses
- G*Power – Free, widely used, suitable for many tests
- R – Packages such as `pwr` and `powerAnalysis`
- SPSS – Built-in power analysis functions
- Online calculators – For simple calculations
## Too Small a Sample: The Risks
- Low power: Real effects are missed (Type II error)
- Overestimation of effect sizes: Significant results with small n systematically overestimate the true effect
- Unstable results: Replication becomes unlikely
- Non-normality issues: Parametric tests are less robust
**The problem with small samples**
A study with n = 10 per group finds a significant effect of d = 1.2. Sounds impressive, but:
- With such small n, only a very large effect can reach significance
- The true effect size is likely considerably smaller
- This phenomenon is known as the "Winner's Curse" or "regression to the mean"
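A small simulation makes the winner's curse visible: with a true effect of d = 0.5 and n = 10 per group, the studies that happen to reach significance report a much larger average effect. A sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n, n_sims = 0.5, 10, 5000
significant_d = []

for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)          # control group
    b = rng.normal(true_d, 1.0, n)       # treatment group, true d = 0.5
    if stats.ttest_ind(b, a).pvalue < 0.05:
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        significant_d.append((b.mean() - a.mean()) / pooled_sd)

print(f"true d = {true_d}")
print(f"mean d among significant results = {np.mean(significant_d):.2f}")
```

With n = 10, only samples whose observed d is roughly 0.9 or larger can reach significance, so the significant subset necessarily overestimates the true effect.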
## Too Large a Sample: Is That Possible?
Yes, there are downsides to excessively large samples:
- Trivial effects become significant: With n = 10,000, nearly any small difference becomes significant
- Cost and effort: Wasted resources when power is already sufficient at a smaller sample
- Ethical considerations: In clinical trials, more participants than necessary are assigned to the control group
The solution is to always report effect sizes and evaluate practical significance.
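As an illustration (reusing the noncentral-t power calculation, assuming SciPy): a tiny effect of d = 0.05, one twentieth of a standard deviation, is detected almost reliably at n = 10,000 per group, so significance alone says nothing about practical relevance:

```python
import math
from scipy import stats

d, n = 0.05, 10_000          # practically trivial effect, very large sample
df = 2 * n - 2
ncp = d * math.sqrt(n / 2)
t_crit = stats.t.ppf(0.975, df)
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(f"Power to detect d = {d}: {power:.2f}")   # well above 0.90
```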
## Rules of Thumb (Use with Caution)
| Analysis | Minimum (rough guideline) |
|---|---|
| t-test | n ≥ 20 per group |
| ANOVA | n ≥ 15 per group |
| Correlation | n ≥ 30 |
| Regression | n ≥ 10–20 per predictor |
| Chi-square | Expected frequencies ≥ 5 per cell |
Important: These rules of thumb are not a substitute for a formal power analysis. They can be misleading when the expected effect size is small.
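The pitfall is easy to quantify. With the rule-of-thumb n = 20 per group, a t-test has very little power for a small effect (a sketch reusing the noncentral-t calculation, assuming SciPy):

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

print(f"d = 0.2, n = 20: power = {ttest_power(0.2, 20):.2f}")  # far below 0.80
print(f"d = 0.5, n = 20: power = {ttest_power(0.5, 20):.2f}")  # still underpowered
```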
## Unequal Group Sizes
Equal group sizes are ideal but not always possible. The implications:
- Power: Maximum power with equal group sizes
- Robustness: Unequal groups + variance heterogeneity = problematic
- Rule of thumb: Ratios up to 1:1.5 are usually unproblematic
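With a fixed total N, power peaks at a 50:50 split: the noncentrality parameter depends on n₁n₂ / (n₁ + n₂), which is maximized when the groups are equal. A sketch (assuming SciPy):

```python
import math
from scipy import stats

def ttest_power_unequal(d, n1, n2, alpha=0.05):
    """Two-sided t-test power with (possibly) unequal group sizes."""
    df = n1 + n2 - 2
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n1, n2 in [(50, 50), (40, 60), (25, 75)]:  # total N = 100 in each case
    print(f"{n1}:{n2}  power = {ttest_power_unequal(0.5, n1, n2):.3f}")
```

Note that a mild 40:60 imbalance costs little power, while a 25:75 split costs much more, consistent with the 1:1.5 rule of thumb above.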
## Common Misconceptions
"n = 30 is always enough." This rule of thumb refers to the central limit theorem (normality of means). Whether n = 30 provides enough power depends on the effect size. For small effects, n = 30 is far from sufficient.
"More participants are always better." Not necessarily. Beyond a certain n, the gain in power is minimal. Resources may be better spent on improved study design.
"Sample size can be determined after the study." Sample size should be planned a priori. Adding data until significance is reached is methodologically questionable (optional stopping / p-hacking).
## Further Reading
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.