Chi-Square Test of Independence

The chi-square test of independence ($\chi^2$ test) assesses whether two categorical variables are statistically independent of each other. It compares the observed frequencies in a contingency table with the frequencies that would be expected if the variables were independent.

When to Use

Use the chi-square test when:

You want to examine the association between two categorical variables
The data are presented as frequencies in a contingency table
The expected frequencies in all cells are at least 5 (a common rule of thumb)
The sample is large enough for the chi-square approximation to hold

Assumptions

Independence of observations
Categorical (nominal or ordinal) variables
Expected frequencies ≥ 5 in all cells of the contingency table
Random sampling

Formula

The test statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency in cell $(i, j)$, and $r$ and $c$ are the numbers of rows and columns. The expected frequencies under independence are calculated as:

$$E_{ij} = \frac{n_{i \cdot} \cdot n_{\cdot j}}{N}$$

where $n_{i \cdot}$ is the row total, $n_{\cdot j}$ is the column total, and $N$ is the grand total.
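The two formulas above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names `expected_frequencies` and `chi_square_statistic` and the small 2x2 table are assumptions made up for this example.

```python
def expected_frequencies(observed):
    """Expected counts under independence: E_ij = (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O_ij - E_ij)^2 / E_ij over all cells of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table of counts, made up purely for illustration.
table = [[10, 20], [30, 40]]
expected = expected_frequencies(table)
statistic = chi_square_statistic(table)

print(expected)
print(round(statistic, 4))
```

Both functions accept any rectangular table given as a list of rows, so the same sketch covers tables larger than 2x2.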

Example

Practical Example: Smoking and Gender

A researcher examines whether there is an association between gender and smoking behavior. A total of 200 people are surveyed:

	Smoker	Non-smoker	Total
Male	45	55	100
Female	30	70	100
Total	75	125	200

The chi-square test examines whether the distribution of smoking behavior is independent of gender. Under independence, the expected frequency for "Male/Smoker" is $\frac{100 \cdot 75}{200} = 37.5$, compared with an observed frequency of 45.
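To carry the example through to a test decision, the following sketch computes the statistic for the smoking table without a continuity correction. Two things here go beyond the text above and should be read as assumptions of the sketch: the degrees of freedom of the test, $(r - 1)(c - 1)$, and the p-value shortcut via `math.erfc`, which is valid only for one degree of freedom.

```python
import math

# Observed counts from the table above
# (rows: male, female; columns: smoker, non-smoker).
observed = [[45, 55], [30, 70]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Rule-of-thumb check: every expected count should be at least 5.
assumption_ok = min(min(row) for row in expected) >= 5

chi2 = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

# Degrees of freedom of the test: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 only, the upper-tail probability of the chi-square
# distribution reduces to erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

For this table the statistic is 4.8 with one degree of freedom, which exceeds the 5% critical value of 3.841 and corresponds to a p-value of roughly 0.028. At the 5% level the data would therefore suggest an association between gender and smoking behavior. Statistical packages may apply a continuity correction to 2x2 tables by default, which gives a somewhat smaller statistic.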

Effect Size

Cramer's V is a common measure of effect size for the chi-square test:

$$V = \sqrt{\frac{\chi^2}{N \cdot (\min(r, c) - 1)}}$$

where $r$ is the number of rows and $c$ is the number of columns.

Effect Size	Cramer's V (df*=1)	Cramer's V (df*=2)
Small	0.10	0.07
Medium	0.30	0.21
Large	0.50	0.35

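* df = min(r, c) - 1, the smaller table dimension minus one; this is not the test's degrees of freedom, $(r - 1)(c - 1)$.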
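As a quick illustration of the effect size formula, the sketch below applies it to the smoking example. The helper name `cramers_v` is hypothetical, and the value $\chi^2 = 4.8$ is the uncorrected statistic obtained by applying the test formula to the 2x2 smoking table with $N = 200$.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V = sqrt(chi2 / (N * (min(r, c) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# chi2 = 4.8 is the uncorrected statistic for the 2x2 smoking table (N = 200).
v = cramers_v(chi2=4.8, n=200, n_rows=2, n_cols=2)
print(round(v, 3))
```

The result is about 0.15, which falls between the "small" and "medium" thresholds in the df* = 1 column of the table above.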