# Cohen's Kappa
Cohen's Kappa (κ) is a statistical measure that quantifies the agreement between exactly two raters who classify the same set of subjects into categories. Unlike simple percent agreement, Kappa accounts for the proportion of agreement expected by chance alone. This makes it a far more realistic indicator of true inter-rater agreement.
## When to Use
- You have exactly two raters who evaluate independently
- Ratings are on a categorical scale (e.g., healthy/sick, Type A/B/C)
- You want to know whether agreement exceeds chance level
- Both raters evaluate the same set of subjects (patients, images, texts, etc.)
- You need a single, easy-to-interpret measure of agreement
## Assumptions
- Exactly 2 raters
- Both raters evaluate the same subjects
- Categorical scale (nominal or ordinal)
- Independent ratings (no mutual influence)
## Formula
Cohen's Kappa is calculated from the observed agreement $p_o$ and the agreement expected by chance $p_e$:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

Here, $p_o$ is the proportion of cases where both raters agree, and $p_e$ is the proportion of agreement expected under random assignment. $p_e$ is derived from the marginal distributions of the contingency table:

$$p_e = \sum_{k} p_{1,k} \, p_{2,k}$$

where $p_{1,k}$ and $p_{2,k}$ are the relative frequencies of category $k$ for Rater 1 and Rater 2, respectively.
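The formula can be sketched in a few lines of Python. The helper name `cohens_kappa` is chosen here for illustration; it computes $p_o$ and $p_e$ directly from two lists of categorical labels:

```python
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters' labels of the same subjects (illustrative helper)."""
    assert len(ratings1) == len(ratings2), "both raters must rate the same subjects"
    n = len(ratings1)
    # Observed agreement p_o: fraction of subjects both raters label identically.
    p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Marginal category counts for each rater.
    f1, f2 = Counter(ratings1), Counter(ratings2)
    # Chance agreement p_e: sum over categories of the product of marginal frequencies.
    p_e = sum((f1[c] / n) * (f2[c] / n) for c in f1.keys() | f2.keys())
    return (p_o - p_e) / (1 - p_e)
```

For real analyses, `sklearn.metrics.cohen_kappa_score` provides the same statistic with additional options such as weighting for ordinal scales.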
## Example
Practical Example: Diagnosis by Two Doctors
Two doctors independently examine 100 patients and classify each as healthy or sick. The results:
| | Doctor 2: healthy | Doctor 2: sick | Total |
|---|---|---|---|
| Doctor 1: healthy | 40 | 10 | 50 |
| Doctor 1: sick | 5 | 45 | 50 |
| Total | 45 | 55 | 100 |
Observed agreement:

$$p_o = \frac{40 + 45}{100} = 0.85$$

Expected agreement:

$$p_e = \frac{50}{100} \cdot \frac{45}{100} + \frac{50}{100} \cdot \frac{55}{100} = 0.225 + 0.275 = 0.50$$

Kappa:

$$\kappa = \frac{0.85 - 0.50}{1 - 0.50} = 0.70$$

With $\kappa = 0.70$, this indicates substantial agreement.
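The same result can be reproduced directly from the contingency counts. The sketch below (the `kappa_from_table` helper is a name chosen for illustration) takes a square table with rows for Rater 1 and columns for Rater 2:

```python
def kappa_from_table(table):
    """Cohen's kappa from a square contingency table (rows: rater 1, cols: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Diagonal cells are the cases where both raters chose the same category.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal frequencies per category for each rater.
    row_marg = [sum(row) / n for row in table]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)

doctors = [[40, 10],
           [5, 45]]
print(round(kappa_from_table(doctors), 4))  # prints 0.7
```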
## Effect Size
Cohen's Kappa is itself an effect size measure. The most widely used interpretation comes from Landis and Koch (1977):
| Kappa Value | Interpretation |
|---|---|
| < 0.00 | Poor (worse than chance) |
| 0.00 – 0.20 | Slight |
| 0.21 – 0.40 | Fair |
| 0.41 – 0.60 | Moderate |
| 0.61 – 0.80 | Substantial |
| 0.81 – 1.00 | Almost perfect |
A $\kappa$ of 1 means perfect agreement, a $\kappa$ of 0 corresponds to pure chance agreement, and negative values indicate systematic disagreement.
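The Landis and Koch bands translate directly into a small lookup. A sketch in Python (`interpret_kappa` is a hypothetical helper named here for illustration):

```python
def interpret_kappa(kappa):
    """Verbal label for a kappa value, per Landis & Koch (1977)."""
    if kappa < 0.0:
        return "poor (worse than chance)"
    # Upper bound of each band, in ascending order.
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

print(interpret_kappa(0.70))  # prints substantial
```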
## Further Reading
- Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
- Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
- McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica, 22(3), 276–282.