PickMyTest

Chi-Square Test of Independence

Tests the independence of two categorical variables

The chi-square test of independence ($\chi^2$ test) assesses whether two categorical variables are statistically independent of each other. It compares the observed frequencies in a contingency table with the frequencies expected if the variables were independent.

When to Use

Use the chi-square test when:

  • You want to examine the association between two categorical variables
  • The data are presented as frequencies (counts) in a contingency table
  • The expected frequencies in all cells are at least 5
  • The sample is large enough for the chi-square approximation to hold

Assumptions

  • Independence of observations
  • Categorical (nominal or ordinal) variables
  • Expected frequencies ≥ 5 in all cells of the contingency table
  • Random sampling
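
The expected-frequency assumption can be checked before running the test. The following is a minimal sketch in plain Python, assuming the table is given as a list of rows of counts and using the usual expected count under independence (row total times column total, divided by the grand total). The function names and the two small tables are hypothetical and serve only as illustration.

```python
def expected_frequencies(observed):
    """Return the table of expected counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def min_expected_ok(observed, threshold=5.0):
    """True if every expected count reaches the threshold (default 5)."""
    expected = expected_frequencies(observed)
    return min(min(row) for row in expected) >= threshold


# Hypothetical counts, chosen only to illustrate the check.
large_table = [[20, 30], [25, 25]]  # smallest expected count is 22.5
small_table = [[2, 8], [3, 7]]      # smallest expected count is 2.5

print(min_expected_ok(large_table))  # True
print(min_expected_ok(small_table))  # False
```

The check looks at expected counts, not observed ones: a cell with a small observed count can still satisfy the assumption if its row and column totals are large.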

Formula

The test statistic is calculated as:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency in cell $(i, j)$. The expected frequencies are calculated as:

$$E_{ij} = \frac{n_{i \cdot} \cdot n_{\cdot j}}{N}$$

where $n_{i \cdot}$ is the total of row $i$, $n_{\cdot j}$ is the total of column $j$, and $N$ is the grand total.
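
The two formulas translate almost directly into code. The sketch below is one possible plain-Python rendering, not a reference implementation: the function name `chi_square_statistic` and the 2×3 table of counts are hypothetical. It also returns the degrees of freedom $(r - 1)(c - 1)$, which is the standard value for this test but is not stated in the text above.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]        # n_{i.}
    col_totals = [sum(col) for col in zip(*observed)]  # n_{.j}
    n = sum(row_totals)                                # N

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected frequency
            statistic += (o_ij - e_ij) ** 2 / e_ij

    # Standard degrees of freedom for the test (assumption: not given in the text).
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x3 table, used only for illustration.
table = [[10, 20, 30],
         [20, 20, 20]]

statistic, df = chi_square_statistic(table)
print(round(statistic, 3), df)  # 5.333 2
```

Here every expected frequency is 15, 20, or 25, so only the first and third columns contribute to the sum.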

Example

Practical Example: Smoking and Gender

A researcher surveys 200 people to examine whether there is an association between gender and smoking behavior:

|        | Smoker | Non-smoker | Total |
|--------|--------|------------|-------|
| Male   | 45     | 55         | 100   |
| Female | 30     | 70         | 100   |
| Total  | 75     | 125        | 200   |

The chi-square test examines whether the distribution of smoking behavior is independent of gender. The expected frequency for "Male/Smoker" is $\frac{100 \cdot 75}{200} = 37.5$.
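
Carrying the calculation through for all four cells gives the test statistic for this table. The worked derivation below uses only the counts from the table above; the indices 1 and 2 stand for Male/Female (rows) and Smoker/Non-smoker (columns).

```latex
\[
\begin{aligned}
E_{11} &= \frac{100 \cdot 75}{200} = 37.5, & E_{12} &= \frac{100 \cdot 125}{200} = 62.5, \\
E_{21} &= \frac{100 \cdot 75}{200} = 37.5, & E_{22} &= \frac{100 \cdot 125}{200} = 62.5, \\[4pt]
\chi^2 &= \frac{(45 - 37.5)^2}{37.5} + \frac{(55 - 62.5)^2}{62.5}
        + \frac{(30 - 37.5)^2}{37.5} + \frac{(70 - 62.5)^2}{62.5} \\
       &= 1.5 + 0.9 + 1.5 + 0.9 = 4.8
\end{aligned}
\]
```

A 2×2 table has $(2 - 1)(2 - 1) = 1$ degree of freedom. Assuming a significance level of 0.05 (a choice the example does not specify), the critical value is 3.841, so $\chi^2 = 4.8$ would lead to rejecting the hypothesis of independence at that level.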

Effect Size

Cramer's V serves as a measure of effect size for the chi-square test:

$$V = \sqrt{\frac{\chi^2}{N \cdot (\min(r, c) - 1)}}$$

where $r$ is the number of rows and $c$ is the number of columns.

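| Effect Size | Cramer's V (df* = 1) | Cramer's V (df* = 2) |
|-------------|----------------------|----------------------|
| Small       | 0.10                 | 0.07                 |
| Medium      | 0.30                 | 0.21                 |
| Large       | 0.50                 | 0.35                 |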

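df* = min(r, c) - 1, where r and c are the numbers of rows and columns. This differs from the test's own degrees of freedom, (r - 1)(c - 1), except for 2×2 tables.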
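
As an illustration, Cramer's V can be computed for the smoking and gender table from the example. This is a minimal plain-Python sketch; the function name `cramers_v` is hypothetical, and the chi-square statistic is recomputed inside the function so that the block stands on its own.

```python
import math


def cramers_v(observed):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Pearson chi-square statistic: sum of (O - E)^2 / E over all cells.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n
            chi2 += (o_ij - e_ij) ** 2 / e_ij

    df_star = min(len(row_totals), len(col_totals)) - 1  # min(r, c) - 1
    return math.sqrt(chi2 / (n * df_star))


# Counts from the smoking and gender example (rows: Male, Female).
smoking = [[45, 55],
           [30, 70]]

v = cramers_v(smoking)
print(round(v, 3))  # 0.155
```

With df* = 1, a value of about 0.155 lies between the "small" (0.10) and "medium" (0.30) benchmarks in the table above.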

Further Reading

  • Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175.
  • Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed.). Wiley.
  • Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.