Chi-Square Goodness of Fit Test

The Chi-Square Goodness of Fit test checks whether the observed frequency distribution of a categorical variable matches a theoretically expected distribution. It answers the question: Does the observed distribution deviate significantly from the expected one?

When to Use

Use the Chi-Square Goodness of Fit test when:

You want to test whether a frequency distribution matches an expected distribution
You are examining a single categorical variable with multiple categories
You want to test whether categories are uniformly distributed or follow a specific distribution
The sample size is sufficiently large (all expected frequencies ≥ 5)

Assumptions

Observations are independent of one another
The variable is categorical (nominal or ordinal)
All expected frequencies are ≥ 5 (if violated: merge categories or use an exact test)
Each observation belongs to exactly one category
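
The expected-frequency assumption can be checked before running the test. The following is a minimal sketch; the helper name `small_expected_cells`, the sample size, and the proportions are hypothetical and chosen only for illustration.

```python
def small_expected_cells(expected, minimum=5.0):
    """Return the indices of categories whose expected frequency is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Hypothetical example: 40 observations and four categories
# with assumed expected proportions.
n = 40
proportions = [0.50, 0.30, 0.15, 0.05]
expected = [n * p for p in proportions]  # roughly 20, 12, 6, 2

flagged = small_expected_cells(expected)
print(flagged)  # the last category (index 3) falls below 5
```

A flagged category is a candidate for merging with a neighboring category, or a reason to switch to an exact test.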

Formula

The test statistic is calculated as:

$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$

where:

$O_i$ is the observed frequency of category $i$
$E_i$ is the expected frequency of category $i$
$k$ is the number of categories

The degrees of freedom are:

$df = k - 1$

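Under the null hypothesis of a uniform distribution, every category has the same expected frequency, where $n$ is the total number of observations:

$E_i = \frac{n}{k}$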
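
The formulas above translate directly into a few lines of code. This is a sketch in plain Python; the function name `chi_square_statistic` and the small data set are assumptions made for illustration, not part of the original text.

```python
def chi_square_statistic(observed, expected=None):
    """Return (chi2, df) for a goodness-of-fit test.

    If `expected` is omitted, a uniform distribution is assumed,
    so every category gets E_i = n / k.
    """
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k
    if len(expected) != k:
        raise ValueError("observed and expected must have the same length")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1


# Hypothetical counts for three categories (n = 60, so E_i = 20 each).
stat, df = chi_square_statistic([30, 20, 10])
print(stat, df)  # (100 + 0 + 100) / 20 = 10.0 with df = 2
```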

Example

Practical Example: Fairness of a Die

A player suspects that a die is not fair. They roll the die 120 times and record the results:

Face	1	2	3	4	5	6
Observed (O)	25	17	15	23	24	16
Expected (E)	20	20	20	20	20	20

For a fair die, each face is expected equally often: $E_i = 120/6 = 20$.

$\chi^2 = \frac{(25-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(24-20)^2}{20} + \frac{(16-20)^2}{20} = 5.0$

With $df = 5$, this gives $p = 0.416$, which is not significant at the conventional level of $\alpha = 0.05$. The data provide no evidence that the die is unfair.
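
The die example can be reproduced with the standard library alone. The sketch below uses the closed-form upper-tail probability of the chi-square distribution for integer degrees of freedom; the helper name `chi2_sf` is an assumption introduced here. Where SciPy is available, `scipy.stats.chisquare` should report the same statistic and p-value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-like terms.
        terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite correction sum.
    terms = sum(
        half ** (j - 0.5) / math.gamma(j + 0.5)
        for j in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


observed = [25, 17, 15, 23, 24, 16]          # counts for faces 1..6
n = sum(observed)                            # 120 rolls
expected = [n / len(observed)] * len(observed)  # 20 per face for a fair die

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p = chi2_sf(chi2, df)

print(f"chi2 = {chi2:.1f}, df = {df}, p = {p:.3f}")
# prints: chi2 = 5.0, df = 5, p = 0.416
```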

Effect Size

Cohen's w serves as a measure of effect size. It can be computed from the test statistic and the total sample size $n$:

$w = \sqrt{\frac{\chi^2}{n}}$

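Equivalently, it can be computed from the observed and expected proportions, $p_{\text{observed},i} = O_i/n$ and $p_{\text{expected},i} = E_i/n$:

$w = \sqrt{\sum_{i=1}^{k} \frac{(p_{\text{observed},i} - p_{\text{expected},i})^2}{p_{\text{expected},i}}}$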

Effect Size	Cohen's w
Small	0.10
Medium	0.30
Large	0.50
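
As an illustration, both forms of Cohen's w can be computed for the die data from the example. This is a sketch; the variable names are chosen here for readability and are not part of the original text.

```python
import math

observed = [25, 17, 15, 23, 24, 16]  # die example: counts for faces 1..6
n = sum(observed)
k = len(observed)
expected = [n / k] * k

# Form 1: from the chi-square statistic and the sample size.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
w_from_chi2 = math.sqrt(chi2 / n)

# Form 2: from observed and expected proportions.
p_obs = [o / n for o in observed]
p_exp = [e / n for e in expected]
w_from_props = math.sqrt(
    sum((po - pe) ** 2 / pe for po, pe in zip(p_obs, p_exp))
)

print(round(w_from_chi2, 3), round(w_from_props, 3))  # 0.204 0.204
```

Against the benchmarks in the table, a value of about 0.20 lies between a small and a medium effect.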

Note: The Chi-Square Goodness of Fit test is sensitive to sample size. With very large samples, even small deviations of no practical importance can become statistically significant. The effect size should therefore always be reported alongside the test result.
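
The sample-size sensitivity can be made concrete with a small sketch. It scales the die counts from the example by hypothetical factors of 10 and 100, keeping the observed proportions identical; the helper name `chi2_and_w` is an assumption made for illustration.

```python
import math


def chi2_and_w(observed):
    """Return (chi2, w) against a uniform expected distribution."""
    n = sum(observed)
    e = n / len(observed)
    chi2 = sum((o - e) ** 2 / e for o in observed)
    return chi2, math.sqrt(chi2 / n)


base = [25, 17, 15, 23, 24, 16]  # die example, n = 120

results = []
for factor in (1, 10, 100):
    scaled = [o * factor for o in base]  # same proportions, larger n
    chi2, w = chi2_and_w(scaled)
    results.append((sum(scaled), chi2, w))
    print(f"n = {sum(scaled):>5}, chi2 = {chi2:6.1f}, w = {w:.3f}")
```

The statistic grows in proportion to $n$ (5, 50, 500) while w stays at about 0.204. With $df = 5$, the 5% critical value is roughly 11.07, so the same proportional deviations that were not significant at $n = 120$ become significant at $n = 1200$.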