# Sample Size
Sample size (n) is one of the most important factors for the quality and informativeness of a study. It affects statistical power, the precision of estimates, and the generalizability of results.
## Why Does Sample Size Matter?
### 1. Statistical Power
Larger samples have more power – a higher probability of detecting effects that truly exist.
**Impact of sample size on power** (t-test, expected effect d = 0.5, α = 0.05, two-tailed):
| n per group | Power |
|---|---|
| 10 | 0.18 |
| 20 | 0.34 |
| 50 | 0.70 |
| 64 | 0.80 |
| 100 | 0.94 |
The recommended power of 0.80 is only reached at n = 64 per group.
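The values in the table can be reproduced from the noncentral t distribution. Below is a minimal sketch, assuming SciPy is available; `ttest_power` is an illustrative helper, not a library function:

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    """Two-sided power of an independent-samples t-test with equal groups."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)       # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability that |t| exceeds the critical value under the alternative
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n in (10, 20, 50, 64, 100):
    print(f"n = {n:3d}  power = {ttest_power(0.5, n):.2f}")
```

G*Power and R's `pwr` package perform essentially this same noncentral-t computation.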
### 2. Precision of Estimates
The standard error of the mean decreases with increasing n: SE = s / √n.
Doubling the sample size therefore shrinks the standard error by a factor of √2 (≈ 1.41), and confidence intervals become narrower.
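A quick numeric check (σ = 15 here is an arbitrary illustrative value):

```python
import math

sigma = 15.0                      # hypothetical population standard deviation
se_100 = sigma / math.sqrt(100)   # SE = sigma / sqrt(n)
se_200 = sigma / math.sqrt(200)   # sample size doubled
print(se_100, se_200)
print(se_100 / se_200)            # ratio is sqrt(2), about 1.414
```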
### 3. Robustness
Larger samples make tests more robust against violations of the normality assumption (central limit theorem).
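A simulation sketch illustrates this, assuming NumPy and SciPy; `type1_rate` is an illustrative helper. One-sample t-tests on strongly skewed (exponential) data hold the nominal α = 0.05 much better at n = 100 than at n = 10:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

def type1_rate(n, n_sims=4000):
    """Rejection rate of a true H0 (mean = 1) for exponential(1) samples."""
    rejections = 0
    for _ in range(n_sims):
        sample = rng.exponential(scale=1.0, size=n)  # skewed, true mean = 1
        if stats.ttest_1samp(sample, popmean=1.0).pvalue < 0.05:
            rejections += 1
    return rejections / n_sims

print("n = 10 :", type1_rate(10))    # typically inflated above 0.05
print("n = 100:", type1_rate(100))   # close to the nominal 0.05
```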
## How to Determine the Right Sample Size
### A Priori Power Analysis
The recommended method. Before data collection, specify:
- Desired power (typically 0.80 or 0.90)
- Significance level (typically α = 0.05)
- Expected effect size (from literature, pilot study, or theory)
- Type of test (t-test, ANOVA, correlation, etc.)
### Required Sample Sizes (Reference Values)
For an independent samples t-test (α = 0.05, two-tailed, power = 0.80):
| Expected effect (d) | n per group | Total |
|---|---|---|
| 0.2 (small) | 394 | 788 |
| 0.5 (medium) | 64 | 128 |
| 0.8 (large) | 26 | 52 |
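These t-test values can be recovered by searching for the smallest n whose power reaches 0.80. A sketch using SciPy's noncentral t distribution; `required_n` is an illustrative helper, not a library function:

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

def required_n(d, alpha=0.05, target_power=0.80):
    """Smallest per-group n whose power reaches the target."""
    n = 2
    while ttest_power(d, n, alpha) < target_power:
        n += 1
    return n

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: n = {required_n(d)} per group")
```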
For a one-way ANOVA (3 groups, α = 0.05, power = 0.80):
| Expected effect (f) | n per group | Total |
|---|---|---|
| 0.10 (small) | 322 | 966 |
| 0.25 (medium) | 53 | 159 |
| 0.40 (large) | 22 | 66 |
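The ANOVA values follow the same logic with the noncentral F distribution, where the noncentrality parameter is λ = f²·N. A sketch (assuming SciPy); the tabulated per-group n values land close to a power of 0.80:

```python
from scipy import stats

def anova_power(f_effect, k, n_per_group, alpha=0.05):
    """Power of a one-way ANOVA with k equal groups (noncentral F)."""
    N = k * n_per_group
    df1, df2 = k - 1, N - k
    lam = f_effect**2 * N                      # noncentrality parameter
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)

print(round(anova_power(0.25, 3, 53), 3))   # medium effect, near 0.80
```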
For a Pearson correlation (α = 0.05, two-tailed, power = 0.80):
| Expected effect (r) | n |
|---|---|
| 0.10 (small) | 782 |
| 0.30 (medium) | 85 |
| 0.50 (large) | 28 |
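The correlation values can be approximated with the Fisher z transformation. This sketch (assuming SciPy) matches the table for a medium effect; for small and large effects the approximation can differ from exact calculations by a participant or two:

```python
import math
from scipy import stats

def n_for_correlation(r, alpha=0.05, target_power=0.80):
    """Approximate required n via the Fisher z transformation."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(target_power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

print(n_for_correlation(0.30))  # 85
```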
### Software for Power Analyses
- G*Power – Free, widely used, suitable for many tests
- R – Packages such as `pwr` and `powerAnalysis`
- SPSS – Built-in power analysis functions
- Online calculators – For simple calculations
## Too Small a Sample: The Risks
- Low power: Real effects are missed (Type II error)
- Overestimation of effect sizes: Significant results with small n systematically overestimate the true effect
- Unstable results: Replication becomes unlikely
- Non-normality issues: Parametric tests are less robust
**The problem with small samples**
A study with n = 10 per group finds a significant effect of d = 1.2. Sounds impressive, but:
- With such small n, only a very large effect can reach significance
- The true effect size is likely considerably smaller
- This phenomenon is known as the "Winner's Curse" or "regression to the mean"
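A small simulation makes the winner's curse visible: with a true effect of d = 0.5 and n = 10 per group, the studies that happen to reach significance report a much larger average effect. A sketch, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d, n, n_sims = 0.5, 10, 5000
significant_d = []

for _ in range(n_sims):
    a = rng.normal(0.0, 1.0, n)          # control group
    b = rng.normal(true_d, 1.0, n)       # treatment group, true d = 0.5
    if stats.ttest_ind(b, a).pvalue < 0.05:
        pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
        significant_d.append((b.mean() - a.mean()) / pooled_sd)

print(f"true d = {true_d}")
print(f"mean d among significant results = {np.mean(significant_d):.2f}")
```

With n = 10, only samples whose observed d is roughly 0.9 or larger can reach significance, so the significant subset necessarily overestimates the true effect.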
## Too Large a Sample: Is That Possible?
Yes, there are downsides to excessively large samples:
- Trivial effects become significant: With n = 10,000, nearly any small difference becomes significant
- Cost and effort: Wasted resources when power is already sufficient at a smaller sample
- Ethical considerations: In clinical trials, more participants than necessary are assigned to the control group
The solution is to always report effect sizes and evaluate practical significance.
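As an illustration (reusing the noncentral-t power calculation, assuming SciPy): a tiny effect of d = 0.05, one twentieth of a standard deviation, is detected almost reliably at n = 10,000 per group, so significance alone says nothing about practical relevance:

```python
import math
from scipy import stats

d, n = 0.05, 10_000          # practically trivial effect, very large sample
df = 2 * n - 2
ncp = d * math.sqrt(n / 2)
t_crit = stats.t.ppf(0.975, df)
power = (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)
print(f"Power to detect d = {d}: {power:.2f}")   # well above 0.90
```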
## Rules of Thumb (Use with Caution)
| Analysis | Minimum (rough guideline) |
|---|---|
| t-test | n ≥ 20 per group |
| ANOVA | n ≥ 15 per group |
| Correlation | n ≥ 30 |
| Regression | n ≥ 10–20 per predictor |
| Chi-square | Expected frequencies ≥ 5 per cell |
Important: These rules of thumb are not a substitute for a formal power analysis. They can be misleading when the expected effect size is small.
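The pitfall is easy to quantify. With the rule-of-thumb n = 20 per group, a t-test has very little power for a small effect (a sketch reusing the noncentral-t calculation, assuming SciPy):

```python
import math
from scipy import stats

def ttest_power(d, n_per_group, alpha=0.05):
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

print(f"d = 0.2, n = 20: power = {ttest_power(0.2, 20):.2f}")  # far below 0.80
print(f"d = 0.5, n = 20: power = {ttest_power(0.5, 20):.2f}")  # still underpowered
```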
## Unequal Group Sizes
Equal group sizes are ideal but not always possible. The implications:
- Power: Maximum power with equal group sizes
- Robustness: Unequal groups + variance heterogeneity = problematic
- Rule of thumb: Ratios up to 1:1.5 are usually unproblematic
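With a fixed total N, power peaks at a 50:50 split: the noncentrality parameter depends on n₁n₂ / (n₁ + n₂), which is maximized when the groups are equal. A sketch (assuming SciPy):

```python
import math
from scipy import stats

def ttest_power_unequal(d, n1, n2, alpha=0.05):
    """Two-sided t-test power with (possibly) unequal group sizes."""
    df = n1 + n2 - 2
    ncp = d * math.sqrt(n1 * n2 / (n1 + n2))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, ncp)) + stats.nct.cdf(-t_crit, df, ncp)

for n1, n2 in [(50, 50), (40, 60), (25, 75)]:  # total N = 100 in each case
    print(f"{n1}:{n2}  power = {ttest_power_unequal(0.5, n1, n2):.3f}")
```

Note that a mild 40:60 imbalance costs little power, while a 25:75 split costs much more, consistent with the 1:1.5 rule of thumb above.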
## Common Misconceptions
"n = 30 is always enough." This rule of thumb refers to the central limit theorem (normality of means). Whether n = 30 provides enough power depends on the effect size. For small effects, n = 30 is far from sufficient.
"More participants are always better." Not necessarily. Beyond a certain n, the gain in power is minimal. Resources may be better spent on improved study design.
"Sample size can be determined after the study." Sample size should be planned a priori. Adding data until significance is reached is methodologically questionable (optional stopping / p-hacking).
## Further Reading
- Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.