PickMyTest

Normal Distribution

The Gaussian distribution and its central role in statistics

The normal distribution (also called the Gaussian distribution or bell curve) is one of the most important distributions in statistics. Many statistical tests assume that data are normally distributed, and numerous natural phenomena approximately follow this distribution.

What Is the Normal Distribution?#

A normal distribution is symmetric around its mean and is completely described by two parameters, the mean and the variance:

$$X \sim N(\mu, \sigma^2)$$

where:

  • $\mu$ is the mean (expected value)
  • $\sigma^2$ is the variance
  • $\sigma$ is the standard deviation (the square root of the variance)

The probability density function is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
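To make the density formula concrete, here is a minimal Python sketch that evaluates it directly. The function name `normal_pdf` and the example values are illustrative choices, not part of the original text.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma^2) at x, written term by term from the formula."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

peak = normal_pdf(0.0)    # density at the mean of the standard normal
left = normal_pdf(-1.0)   # one standard deviation below the mean
right = normal_pdf(1.0)   # one standard deviation above the mean
print(peak, left, right)  # left and right agree because of the symmetry
```

The density is highest at the mean and takes the same value at equal distances on either side of it.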

The 68-95-99.7 Rule#

In a normal distribution:

  • 68.3% of values fall within $\mu \pm 1\sigma$
  • 95.4% of values fall within $\mu \pm 2\sigma$
  • 99.7% of values fall within $\mu \pm 3\sigma$
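These percentages can be checked numerically. For a normal variable, the probability of falling within k standard deviations of the mean equals erf(k / √2), where erf is the error function. The sketch below uses only the Python standard library; the helper name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k: float) -> float:
    """P(|X - mu| < k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2.0))

coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {100 * p:.1f}%")
```

The result does not depend on the particular values of μ and σ, which is why the rule applies to every normal distribution.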

Example: IQ Distribution

IQ scores are scaled to be approximately normally distributed with μ = 100 and σ = 15.

  • 68% of people have an IQ between 85 and 115
  • 95% have an IQ between 70 and 130
  • 99.7% have an IQ between 55 and 145
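The IQ figures above can be reproduced with `statistics.NormalDist` from the Python standard library. This is a small sketch under the idealized assumption that IQ follows N(100, 15²) exactly; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Share of people between 85 and 115 (one standard deviation around the mean)
share_85_115 = iq.cdf(115) - iq.cdf(85)

# Share of people with an IQ above 130 (more than two standard deviations above)
share_above_130 = 1.0 - iq.cdf(130)

print(f"85 to 115: {share_85_115:.3f}")
print(f"above 130: {share_above_130:.4f}")
```

Because the 95% band is symmetric, roughly half of the remaining 5% lies above 130 and half below 70.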

The Standard Normal Distribution#

Any normal distribution can be converted to the standard normal distribution via z-transformation:

$$z = \frac{x - \mu}{\sigma}$$

The standard normal distribution has $\mu = 0$ and $\sigma = 1$.
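As a brief illustration of the z-transformation, the sketch below standardizes a single value and looks up its position in the standard normal distribution. The helper `z_score` and the IQ-style numbers (μ = 100, σ = 15) are illustrative assumptions.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (or below) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(130, mu=100, sigma=15)   # 30 points above the mean, i.e. two sigmas
share_below = NormalDist().cdf(z)    # standard normal: mu = 0, sigma = 1
x_back = 100 + z * 15                # inverting the transformation recovers x

print(z, round(share_below, 4), x_back)
```

Once a value is expressed as a z-score, a single table or function for the standard normal distribution is enough to answer probability questions about any normal distribution.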

Why Is It So Important?#

Central Limit Theorem#

The central limit theorem states: For independent observations from a population with finite variance, the distribution of the sample mean is approximately normal for a sufficiently large sample size, regardless of the shape of the population distribution.

$$\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right)$$

This means: Even if individual measurements are not normally distributed, the mean of many such measurements is approximately normally distributed. A common rule of thumb is n ≥ 30, although strongly skewed or heavy-tailed populations may need larger samples.
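A small simulation can make the theorem tangible. The sketch below draws many samples of size n = 30 from an exponential distribution (clearly non-normal, with mean 1 and standard deviation 1) and checks that the sample means cluster around μ with a spread close to σ/√n. The choice of distribution, the seed, and the function name `sample_means` are assumptions made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

def sample_means(n: int, reps: int = 5000, rate: float = 1.0) -> list[float]:
    """Draw `reps` exponential samples of size n and return the mean of each."""
    return [
        statistics.fmean([random.expovariate(rate) for _ in range(n)])
        for _ in range(reps)
    ]

n = 30
means = sample_means(n)
observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)
expected_sd = 1.0 / n ** 0.5  # sigma / sqrt(n), with sigma = 1 for rate = 1

print(f"mean of sample means: {observed_mean:.3f} (population mean is 1)")
print(f"sd of sample means:   {observed_sd:.3f} (sigma/sqrt(n) = {expected_sd:.3f})")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve, even though the individual exponential values are strongly right-skewed.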

Assumption of Many Tests#

Parametric tests such as t-tests, ANOVA, and Pearson correlation assume normality. When this assumption is violated, non-parametric alternatives are available.

Testing for Normality#

Graphical Methods#

  • Histogram — Shows the shape of the distribution
  • Q-Q plot (Quantile-Quantile plot) — Points should fall along a straight line
  • Box plot — Identifies asymmetries and outliers
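To show what a Q-Q plot actually compares, the following sketch computes the point pairs without drawing them: theoretical standard normal quantiles against the standardized, sorted observations. The plotting position (i - 0.5)/n is one common convention among several, and the function name `qq_points` and the simulated data are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, stdev

def qq_points(data: list[float]) -> list[tuple[float, float]]:
    """Return (theoretical quantile, standardized observed value) pairs."""
    xs = sorted(data)
    n = len(xs)
    mu, sigma = fmean(xs), stdev(xs)
    std_normal = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n                      # plotting position
        theoretical = std_normal.inv_cdf(p)    # expected z-value at that rank
        observed = (x - mu) / sigma            # actual z-value of the observation
        points.append((theoretical, observed))
    return points

rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(200)]  # simulated, roughly normal data
points = qq_points(sample)
print(points[:3])
```

For approximately normal data, the two coordinates of each pair are close, so the points lie near the diagonal. Systematic curvature indicates skewness or heavy tails.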

Statistical Tests#

| Test | Suitable for | Notes |
|---|---|---|
| Shapiro-Wilk | n < 50 | Most recommended for small samples |
| Kolmogorov-Smirnov | n ≥ 50 | Less sensitive than Shapiro-Wilk |
| Anderson-Darling | All n | Emphasizes tails of the distribution |
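To illustrate the idea behind one of these tests, the sketch below computes the Kolmogorov-Smirnov statistic D by hand: the largest gap between the empirical distribution function and a normal distribution fitted to the sample. It deliberately stops at the statistic and does not compute a p-value; when μ and σ are estimated from the same data, the standard Kolmogorov-Smirnov p-values are too lenient and a correction (such as Lilliefors) is needed. In practice one would typically rely on a statistics package rather than this hand-rolled version. The function name and the simulated samples are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean, stdev

def ks_statistic_normal(data: list[float]) -> float:
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

rng = random.Random(7)
normal_sample = [rng.gauss(0, 1) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

d_normal = ks_statistic_normal(normal_sample)
d_skewed = ks_statistic_normal(skewed_sample)
print(f"D (normal sample): {d_normal:.3f}")
print(f"D (skewed sample): {d_skewed:.3f}")
```

A larger D means the sample departs more strongly from the fitted normal curve, which is what the formal test turns into a p-value.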

Descriptive Measures#

  • Skewness: Should be close to 0. Values between -1 and +1 are considered acceptable.
  • Kurtosis: Should be close to 3 (equivalently, excess kurtosis close to 0). Excess kurtosis values between -2 and +2 are considered acceptable.
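Both measures can be computed from the central moments of the sample. The sketch below uses the simple moment-based (population) formulas; statistical packages often apply small-sample bias corrections, so their output can differ slightly. The function name and the example data are illustrative assumptions.

```python
from statistics import fmean

def skewness_and_excess_kurtosis(data: list[float]) -> tuple[float, float]:
    """Moment-based skewness and excess kurtosis (no bias correction)."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment (variance)
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0       # subtract 3 so a normal gives 0
    return skew, excess_kurtosis

# A perfectly symmetric toy sample: skewness is 0, tails are lighter than normal
skew, excess = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
print(skew, excess)
```

A positive skewness indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution.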

What to Do When Normality Is Violated#

  1. Use non-parametric tests: e.g., the Mann-Whitney U test instead of the independent-samples t-test
  2. Transform the data: log, square root, or Box-Cox transformation
  3. Increase sample size: by the central limit theorem, tests on means become more robust with large samples
  4. Use bootstrapping: a resampling method that estimates confidence intervals without assuming a particular distribution
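As an illustration of the bootstrapping option, here is a minimal sketch of a percentile bootstrap confidence interval for the mean. It resamples the observed data with replacement many times and reads the interval off the sorted resampled means. The data values are made up for the example, and the function name, number of resamples, and seed are assumptions; other bootstrap variants (such as BCa) exist and may be preferable in practice.

```python
import random
from statistics import fmean

def bootstrap_ci_mean(data, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n values with replacement and record the mean, `reps` times
    means = sorted(fmean(rng.choices(data, k=n)) for _ in range(reps))
    alpha = 1.0 - level
    lo_idx = max(0, int((alpha / 2) * reps))                  # approx. lower percentile
    hi_idx = min(reps - 1, int((1 - alpha / 2) * reps) - 1)   # approx. upper percentile
    return means[lo_idx], means[hi_idx]

# Hypothetical right-skewed sample (illustrative values only)
data = [0.8, 1.1, 1.3, 1.4, 1.7, 2.0, 2.2, 2.9, 3.5, 6.4]
lo, hi = bootstrap_ci_mean(data)
print(f"sample mean: {fmean(data):.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

With skewed data the interval is usually not symmetric around the sample mean, which is one reason the bootstrap is attractive when normality is doubtful.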

Common Misconceptions#

"The raw data themselves must be normally distributed." For many tests, it is sufficient that the residuals (regression) or the paired differences (paired t-test) are normally distributed. The assumption does not always concern the raw data.

"The Shapiro-Wilk test is not significant, so the data are normally distributed." A non-significant result only means that we cannot reject the null hypothesis (normality). With small samples, the test has low power and can easily miss deviations.

"Normality must be perfect." Parametric tests are often robust to mild violations of the normality assumption, especially with larger samples (n > 30).
