Sample Size

How sample size affects statistical results and how to plan it

Sample size (n) is one of the most important factors for the quality and informativeness of a study. It affects statistical power, the precision of estimates, and the generalizability of results.

Why Does Sample Size Matter?#

1. Statistical Power#

Larger samples have more power: a higher probability of detecting effects that truly exist.

Impact of sample size on power

t-test, expected effect d = 0.5, α = 0.05 (two-tailed):

| n per group | Power |
|---|---|
| 10 | 0.18 |
| 20 | 0.34 |
| 50 | 0.70 |
| 64 | 0.80 |
| 100 | 0.94 |

The recommended power of 0.80 is only reached at n = 64 per group.
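The power values above can be reproduced to within about 0.02 with a short normal-approximation sketch (the function names here are illustrative; the exact table values come from the noncentral t distribution, which this approximation slightly overstates for small n):

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power_two_sample(d, n, z_crit=1.959964):
    # Approximate power of a two-sided independent-samples t-test
    # with effect size d and n per group (normal approximation,
    # ignoring the negligible opposite-tail term)
    return norm_cdf(d * sqrt(n / 2.0) - z_crit)

for n in (10, 20, 50, 64, 100):
    print(n, round(power_two_sample(0.5, n), 2))
```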

2. Precision of Estimates#

The standard error of the mean decreases with increasing n:

SE = \frac{s}{\sqrt{n}}

Doubling the sample size shrinks the standard error by a factor of \sqrt{2} \approx 1.41, and confidence intervals become correspondingly narrower.
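A two-line check of this relationship (the function name is illustrative):

```python
from math import sqrt

def standard_error(s, n):
    # SE of the mean: sample standard deviation over sqrt(n)
    return s / sqrt(n)

# Doubling n from 50 to 100 shrinks the SE by exactly sqrt(2)
ratio = standard_error(15.0, 50) / standard_error(15.0, 100)
print(ratio)  # ~1.414
```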

3. Robustness#

Larger samples make tests more robust against violations of the normality assumption (central limit theorem).

How to Determine the Right Sample Size#

A Priori Power Analysis#

The recommended method. Before data collection, specify:

  1. Desired power (typically 0.80 or 0.90)
  2. Significance level (typically α = 0.05)
  3. Expected effect size (from literature, pilot study, or theory)
  4. Type of test (t-test, ANOVA, correlation, etc.)
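Steps 1–4 can be turned into a small solver that searches for the smallest adequate n. This is a sketch under the same normal approximation as above (`required_n` is our name, not a library function); it lands within one or two participants of the exact t-based answer:

```python
from math import erf, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def required_n(d, power=0.80, z_crit=1.959964):
    # Smallest per-group n whose approximate power reaches the target
    # (two-sided independent-samples t-test, normal approximation)
    n = 2
    while norm_cdf(d * sqrt(n / 2.0) - z_crit) < power:
        n += 1
    return n

print(required_n(0.5))  # 63 (the exact t-based answer is 64)
```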

Required Sample Sizes (Reference Values)#

For an independent samples t-test (α = 0.05, two-tailed, power = 0.80):

| Expected effect (d) | n per group | Total |
|---|---|---|
| 0.2 (small) | 394 | 788 |
| 0.5 (medium) | 64 | 128 |
| 0.8 (large) | 26 | 52 |

For a one-way ANOVA (3 groups, α = 0.05, power = 0.80):

| Expected effect (f) | n per group | Total |
|---|---|---|
| 0.10 (small) | 322 | 966 |
| 0.25 (medium) | 53 | 159 |
| 0.40 (large) | 22 | 66 |

For a Pearson correlation (α = 0.05, two-tailed, power = 0.80):

| Expected effect (r) | n |
|---|---|
| 0.10 (small) | 782 |
| 0.30 (medium) | 85 |
| 0.50 (large) | 28 |
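For the correlation case there is a handy closed-form approximation based on Fisher's z-transformation (the helper name is ours). It reproduces the medium-effect entry exactly; the small and large entries differ by a couple of participants because the approximation degrades toward extreme r:

```python
from math import atanh, ceil

def n_for_correlation(r, z_alpha=1.959964, z_beta=0.841621):
    # Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3
    # (defaults: alpha = 0.05 two-tailed, power = 0.80)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

print(n_for_correlation(0.30))  # 85, matching the table
```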

Software for Power Analyses#

  • G*Power – free, widely used, suitable for many tests
  • R – packages such as pwr and powerAnalysis
  • SPSS – built-in power analysis functions
  • Online calculators – for simple calculations

Too Small a Sample: The Risks#

  1. Low power: Real effects are missed (Type II error)
  2. Overestimation of effect sizes: Significant results with small n systematically overestimate the true effect
  3. Unstable results: Replication becomes unlikely
  4. Non-normality issues: Parametric tests are less robust

The problem with small samples

A study with n = 10 per group finds a significant effect of d = 1.2. Sounds impressive, but:

  • With such small n, only a very large effect can reach significance
  • The true effect size is likely considerably smaller
  • This phenomenon is known as the "winner's curse"; replication studies typically show regression toward the mean
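This overestimation is easy to demonstrate by simulation (a sketch; exact figures vary with the seed). With a true effect of d = 0.5 and n = 10 per group, the studies that happen to reach significance report, on average, roughly double the true effect, because only inflated sample effects can clear the significance bar:

```python
import random
from math import sqrt

random.seed(1)
TRUE_D, N, T_CRIT = 0.5, 10, 2.101   # critical t for df = 18, alpha = 0.05

sig_effects = []
for _ in range(5000):
    a = [random.gauss(0.0, 1.0) for _ in range(N)]
    b = [random.gauss(TRUE_D, 1.0) for _ in range(N)]
    ma, mb = sum(a) / N, sum(b) / N
    var_a = sum((x - ma) ** 2 for x in a) / (N - 1)
    var_b = sum((x - mb) ** 2 for x in b) / (N - 1)
    s_pooled = sqrt((var_a + var_b) / 2.0)
    t = (mb - ma) / (s_pooled * sqrt(2.0 / N))
    if abs(t) > T_CRIT:                          # a "significant" study
        sig_effects.append(abs(mb - ma) / s_pooled)

power = len(sig_effects) / 5000
mean_sig_d = sum(sig_effects) / len(sig_effects)
print(power)        # low, as the power table above suggests
print(mean_sig_d)   # well above the true d = 0.5
```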

Too Large a Sample: Is That Possible?#

Yes, there are downsides to excessively large samples:

  1. Trivial effects become significant: With n = 10,000, nearly any small difference becomes significant
  2. Cost and effort: Wasted resources when power is already sufficient at a smaller sample
  3. Ethical considerations: In clinical trials, more participants are assigned to the control group than necessary

The solution is to always report effect sizes and evaluate practical significance.
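Point 1 can be illustrated with a quick z-test on summary statistics (the numbers here are invented for illustration): a difference of half a point on a scale with standard deviation 10, i.e. d = 0.05, is highly significant at n = 10,000 per group despite being practically negligible:

```python
from math import erf, sqrt

def two_sided_p(z):
    # Two-sided p-value from a z statistic
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))

n, sd, diff = 10_000, 10.0, 0.5        # d = diff / sd = 0.05, a trivial effect
se_diff = sd * sqrt(2.0 / n)           # standard error of the mean difference
z = diff / se_diff
print(round(z, 2), two_sided_p(z))     # z ~ 3.54, p < 0.001
```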

Rules of Thumb (Use with Caution)#

| Analysis | Minimum (rough guideline) |
|---|---|
| t-test | n ≥ 20 per group |
| ANOVA | n ≥ 15 per group |
| Correlation | n ≥ 30 |
| Regression | n ≥ 10–20 per predictor |
| Chi-square | Expected frequencies ≥ 5 per cell |

Important: These rules of thumb are not a substitute for a formal power analysis. They can be misleading when the expected effect size is small.

Unequal Group Sizes#

Equal group sizes are ideal but not always possible. The implications:

  • Power: Maximum power with equal group sizes
  • Robustness: Unequal groups + variance heterogeneity = problematic
  • Rule of thumb: Ratios up to 1:1.5 are usually unproblematic
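The power cost of unequal groups can be gauged with the harmonic mean of the two group sizes, which acts as the effective per-group n in the standard error of the mean difference (a sketch; `effective_n` is our name):

```python
def effective_n(n1, n2):
    # Harmonic mean of the group sizes: the effective per-group n,
    # since SE_diff = s * sqrt(1/n1 + 1/n2) = s * sqrt(2 / n_harmonic)
    return 2.0 * n1 * n2 / (n1 + n2)

print(effective_n(50, 50))  # 50.0: balanced design, no loss
print(effective_n(40, 60))  # 48.0: a 1:1.5 ratio costs little
print(effective_n(20, 80))  # 32.0: a 1:4 ratio wastes many participants
```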

Common Misconceptions#

"n = 30 is always enough." This rule of thumb refers to the central limit theorem (normality of means). Whether n = 30 provides enough power depends on the effect size. For small effects, n = 30 is far from sufficient.

"More participants are always better." Not necessarily. Beyond a certain n, the gain in power is minimal. Resources may be better spent on improved study design.

"Sample size can be determined after the study." Sample size should be planned a priori. Adding data until significance is reached is methodologically questionable (optional stopping / p-hacking).

Further Reading

  • Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.