P-Values

What p-values really mean and how to interpret them correctly

The p-value is one of the most commonly used — and most commonly misunderstood — concepts in statistics. A correct understanding is essential for interpreting any statistical test.

Definition#

The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.

Formally:

p = P(\text{Data} \geq \text{observed result} \mid H_0 \text{ is true})

A small p-value means: The observed result would be unlikely under the null hypothesis. This provides evidence against the null hypothesis.
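As a sketch of this definition, the two-sided p-value for an observed z statistic is the probability mass in both tails beyond the observed value. SciPy is assumed available here, and the statistic z = 2.17 is a made-up value for illustration:

```python
# Hypothetical example: two-sided p-value for an observed z statistic.
# The value z = 2.17 is made up for illustration.
from scipy.stats import norm

z = 2.17                           # observed test statistic (hypothetical)
p_two_sided = 2 * norm.sf(abs(z))  # probability of a result at least this extreme
print(round(p_two_sided, 2))       # 0.03
```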

The Significance Level α#

The significance level (alpha, α) is a threshold set in advance. In most disciplines:

\alpha = 0.05

The decision rule is:

  • p < α → Result is statistically significant → Null hypothesis is rejected
  • p ≥ α → Result is not significant → Null hypothesis cannot be rejected

Example: t-test with p = 0.03

A t-test comparing two groups yields p = 0.03.

Correct interpretation: Assuming there is no difference between the groups (H₀), we would obtain a result this extreme or more extreme in only 3% of cases. Since 0.03 < 0.05, the result is considered statistically significant.

Incorrect interpretation: "There is a 97% probability that the effect is real." — This is not correct!
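A minimal sketch of such a two-sample t-test in Python. The data are simulated here, so the exact p-value is illustrative; NumPy and SciPy are assumed available:

```python
import numpy as np
from scipy import stats

# Simulated scores for two groups (means, spread, and sample size are made up).
rng = np.random.default_rng(42)
group_a = rng.normal(loc=52.0, scale=10.0, size=30)
group_b = rng.normal(loc=46.0, scale=10.0, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("significant" if p_value < alpha else "not significant")
```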

Different Alpha Levels#

| Level     | Label                   | Usage                   |
|-----------|-------------------------|-------------------------|
| α = 0.10  | Marginally significant  | Exploratory studies     |
| α = 0.05  | Significant             | Standard in most fields |
| α = 0.01  | Highly significant      | Stricter criteria       |
| α = 0.001 | Very highly significant | Very conservative tests |

One-Tailed vs. Two-Tailed Tests#

  • Two-tailed test: Tests whether a difference exists in either direction. Recommended by default.
  • One-tailed test: Tests only one direction (e.g., "Group A is better than Group B"). When the observed effect lies in the hypothesized direction, the p-value is half as large:

p_{\text{one-tailed}} = \frac{p_{\text{two-tailed}}}{2}

Important: One-tailed tests should only be used when the direction of the effect was specified before data collection.
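The halving relationship can be checked directly. The data below are simulated, and SciPy's `alternative` parameter of `ttest_ind` (available in recent versions) is assumed; the halving only holds because the observed effect points in the hypothesized direction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(1.0, 1.0, 50)  # group hypothesized (in advance!) to be larger
b = rng.normal(0.0, 1.0, 50)

p_two = stats.ttest_ind(a, b, alternative="two-sided").pvalue
p_one = stats.ttest_ind(a, b, alternative="greater").pvalue
print(p_one, p_two / 2)  # equal when the effect is in the stated direction
```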

Multiple Testing#

When multiple tests are performed simultaneously, the probability of at least one false positive increases:

P(\text{at least one error}) = 1 - (1 - \alpha)^m

With 20 tests at α = 0.05, the probability of at least one error is already about 64%.
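The formula can be evaluated directly in pure Python (the function name here is ours):

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) among m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

print(round(familywise_error_rate(0.05, 1), 2))   # 0.05: a single test
print(round(familywise_error_rate(0.05, 20), 2))  # 0.64: twenty tests
```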

Corrections:

  • Bonferroni: \alpha_{\text{adjusted}} = \frac{\alpha}{m} — Simple but conservative
  • Holm-Bonferroni: Stepwise correction, less conservative
  • Benjamini-Hochberg: Controls the False Discovery Rate (FDR)
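The first two corrections are simple enough to sketch in pure Python. The function names and example p-values below are ours; in practice, `statsmodels.stats.multitest.multipletests` implements all three methods:

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i whenever p_i < alpha / m (simple but conservative)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: sorted p-values are compared to alpha/(m-k)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] < alpha / (m - k):
            reject[i] = True
        else:
            break  # all remaining (larger) p-values fail as well
    return reject

ps = [0.001, 0.015, 0.03, 0.2]
print(bonferroni(ps))  # [True, False, False, False]
print(holm(ps))        # [True, True, False, False] (less conservative)
```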

P-Value and Effect Size#

A significant p-value says nothing about the practical significance of an effect.

Example: Large sample, small effect

With n = 10,000 per group, a t-test finds a significant difference (p < 0.001) of 0.5 points on a 100-point scale. Statistically significant — but practically irrelevant.

This is why effect size should always be reported in addition to the p-value.
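One common effect-size measure for a two-group comparison is Cohen's d. A minimal sketch, where the standard deviation of 15 is our assumption for the 100-point-scale example above:

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# A 0.5-point difference on a 100-point scale, assuming sd = 15 in both groups:
d = cohens_d(60.5, 60.0, 15.0, 15.0, 10_000, 10_000)
print(round(d, 3))  # 0.033: a negligible effect despite the tiny p-value
```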

Common Misconceptions#

"The p-value is the probability that the null hypothesis is true." Wrong. The p-value says nothing about the probability of the hypothesis. It gives the probability of the data under the assumption of the null hypothesis.

"p = 0.05 means there is a 95% chance the effect is real." Wrong. The p-value is not a probability for the hypothesis, but for the data.

"A non-significant result proves that no effect exists." Wrong. A p > 0.05 only means that the evidence is not sufficient to reject the null hypothesis. The effect might still exist (lack of power).

"p = 0.049 and p = 0.051 are fundamentally different." Wrong. The difference is minimal. The threshold at 0.05 is a convention, not a law of nature. Interpretation should not hinge on a single cutoff.

"The smaller the p-value, the larger the effect." Wrong. The p-value depends on the effect size and the sample size. A tiny effect can be highly significant with a huge sample.

Further Reading#

  • Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd.
  • Wasserstein, R. L. & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.