Chi-Square Test of Independence
The chi-square test of independence ($\chi^2$ test) assesses whether two categorical variables are statistically independent. It compares the observed frequencies in a contingency table with the frequencies expected under independence.
When to Use
Use the chi-square test when:
- You want to examine the association between two categorical variables
- The data are presented as frequencies in a contingency table
- The expected frequencies in all cells are at least 5
- The sample size is sufficiently large
Assumptions
- Independence of observations
- Categorical (nominal or ordinal) variables
- Expected frequencies ≥ 5 in all cells of the contingency table
- Random sampling
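The expected-frequency assumption can be checked before running the test. The following is a minimal sketch in plain Python, assuming the table is given as a list of rows with no empty row or column. The expected count of a cell is its row total times its column total divided by the grand total. The function names and both tables are hypothetical and exist only for illustration.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_expected_count_rule(observed, minimum=5):
    """True if every expected cell count is at least `minimum`."""
    expected = expected_frequencies(observed)
    return all(e >= minimum for row in expected for e in row)


# Hypothetical tables, invented for illustration only.
large_table = [[20, 30], [25, 25]]
sparse_table = [[2, 8], [3, 40]]

print(meets_expected_count_rule(large_table))   # True
print(meets_expected_count_rule(sparse_table))  # False
```

In the sparse table the first cell has an expected count below 1, so the rule of thumb is violated even though the grand total is above 50.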
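Formula
The test statistic is calculated as:

$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ is the observed frequency and $E_{ij}$ is the expected frequency in cell $(i, j)$. The expected frequencies are calculated as:

$$E_{ij} = \frac{R_i \cdot C_j}{N}$$

where $R_i$ is the row total, $C_j$ is the column total, and $N$ is the grand total.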
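As a rough illustration, the statistic can be computed directly from a table of counts: derive each expected frequency from the row, column, and grand totals, then sum (observed − expected)² / expected over all cells. The sketch below uses plain Python and assumes no row or column total is zero. The function name and the 2×3 table are hypothetical.

```python
def chi_square_statistic(observed):
    """Return (statistic, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return statistic, expected


# Hypothetical 2x3 table, invented for illustration only.
table = [[10, 20, 30], [20, 20, 20]]
statistic, expected = chi_square_statistic(table)
print(expected)             # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(statistic, 3))  # 5.333
```

The statistic is then compared against a chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom, which is 2 for this table.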
Example
Practical Example: Smoking and Gender
A researcher examines whether there is an association between gender and smoking behavior. 200 people are surveyed:
| | Smoker | Non-smoker | Total |
|---|---|---|---|
| Male | 45 | 55 | 100 |
| Female | 30 | 70 | 100 |
| Total | 75 | 125 | 200 |
The chi-square test examines whether the distribution of smoking behavior is independent of gender. The expected frequency for "Male/Smoker" would be $\frac{100 \times 75}{200} = 37.5$.
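To see where the example leads, the sketch below carries the calculation through for the smoking table using only the Python standard library. It applies no continuity correction, so tools that apply Yates' correction to 2×2 tables by default (for example `scipy.stats.chi2_contingency`) will report a somewhat different result. The p-value shortcut via `math.erfc` holds only for 1 degree of freedom. All variable names are illustrative.

```python
import math

# Observed counts from the table above: rows = Male, Female; columns = Smoker, Non-smoker.
observed = [[45, 55], [30, 70]]

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [75, 125]
n = sum(row_totals)                                # 200

expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# For a 2x2 table the test has (2 - 1) * (2 - 1) = 1 degree of freedom.
# With 1 degree of freedom the chi-square upper-tail probability reduces to
# erfc(sqrt(x / 2)); this shortcut is NOT valid for larger tables.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(expected)           # [[37.5, 62.5], [37.5, 62.5]]
print(round(chi2, 2))     # 4.8
print(round(p_value, 3))  # about 0.028
```

At the conventional 5% level, this uncorrected result would lead to rejecting independence between gender and smoking behavior in the example data.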
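Effect Size
Cramér's V as a measure of effect size:

$$V = \sqrt{\frac{\chi^2}{N \cdot (\min(r, c) - 1)}}$$

where $r$ is the number of rows, $c$ is the number of columns, and $N$ is the grand total (the sample size).
| Effect Size | Cramér's V (df*=1) | Cramér's V (df*=2) |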
|---|---|---|
| Small | 0.10 | 0.07 |
| Medium | 0.30 | 0.21 |
| Large | 0.50 | 0.35 |
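df* = min(r, c) - 1, where r and c are the numbers of rows and columns. This differs from the degrees of freedom of the test itself, (r - 1)(c - 1).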
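A small sketch of the effect-size calculation, applied to the smoking example. It assumes the uncorrected statistic for that table, which works out to 4.8 when (observed − expected)² / expected is summed over the four cells. The function name and its parameters are hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V from a chi-square statistic, sample size, and table shape."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Smoking example: uncorrected chi-square of 4.8 on N = 200, 2 rows, 2 columns.
v = cramers_v(chi2=4.8, n=200, n_rows=2, n_cols=2)
print(round(v, 3))  # 0.155
```

With df* = 1, a value of about 0.155 lies between the small (0.10) and medium (0.30) thresholds in the table above.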
Further Reading
- Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, 50(302), 157–175.
- Agresti, A. (2007). An Introduction to Categorical Data Analysis (2nd ed.). Wiley.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.