
Cohen's Kappa

Cohen's Kappa measures inter-rater agreement between two raters for categorical data, correcting for chance agreement.

Cohen's Kappa#

Cohen's Kappa (κ) is a statistical measure that quantifies the agreement between exactly two raters who classify the same set of subjects into categories. Unlike simple percent agreement, Kappa accounts for the proportion of agreement expected by chance alone. This makes it a far more realistic indicator of true inter-rater agreement.

When to Use#

  • You have exactly two raters who evaluate independently
  • Ratings are on a categorical scale (e.g., healthy/sick, Type A/B/C)
  • You want to know whether agreement exceeds chance level
  • Both raters evaluate the same set of subjects (patients, images, texts, etc.)
  • You need a single, easy-to-interpret measure of agreement

Assumptions#

  • Exactly 2 raters
  • Both raters evaluate the same subjects
  • Categorical scale (nominal or ordinal)
  • Independent ratings (no mutual influence)

Formula#

Cohen's Kappa is calculated from the observed agreement p_o and the agreement expected by chance p_e:

κ = (p_o − p_e) / (1 − p_e)

Here, p_o is the proportion of cases where both raters agree, and p_e is the proportion of agreement expected under random assignment. p_e is derived from the marginal distributions of the contingency table:

p_e = Σ_k p_{k1} · p_{k2}

where p_{k1} and p_{k2} are the relative frequencies of category k for Rater 1 and Rater 2, respectively.
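The two formulas above translate directly into code. The following is a minimal sketch in plain Python; `cohens_kappa` is an illustrative helper name, not part of any particular library:

```python
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters' categorical ratings of the same subjects."""
    assert len(ratings1) == len(ratings2), "both raters must rate the same subjects"
    n = len(ratings1)
    # Observed agreement p_o: proportion of subjects on which the raters agree.
    p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Chance agreement p_e: sum over categories of the product of the
    # raters' marginal relative frequencies.
    freq1, freq2 = Counter(ratings1), Counter(ratings2)
    p_e = sum((freq1[k] / n) * (freq2[k] / n) for k in freq1.keys() | freq2.keys())
    return (p_o - p_e) / (1 - p_e)
```

Libraries such as scikit-learn also ship a ready-made implementation (`sklearn.metrics.cohen_kappa_score`) that should give the same result for two label sequences.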

Example#

Practical Example: Diagnosis by Two Doctors

Two doctors independently examine 100 patients and classify each as healthy or sick. The results:

                     Doctor 2: healthy   Doctor 2: sick   Total
Doctor 1: healthy           40                 10           50
Doctor 1: sick               5                 45           50
Total                       45                 55          100

Observed agreement: p_o = (40 + 45) / 100 = 0.85

Expected agreement: p_e = (50/100 × 45/100) + (50/100 × 55/100) = 0.225 + 0.275 = 0.50

Kappa: κ = (0.85 − 0.50) / (1 − 0.50) = 0.70

With κ = 0.70, this indicates substantial agreement.
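The hand calculation can be reproduced straight from the contingency table. This sketch assumes a square table with rows for Rater 1 and columns for Rater 2; `kappa_from_table` is an illustrative name:

```python
def kappa_from_table(table):
    """Cohen's kappa from a square contingency table (rows: rater 1, cols: rater 2)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # p_o: diagonal cells are the cases where both raters chose the same category.
    p_o = sum(table[i][i] for i in range(k)) / n
    # p_e: products of the row and column marginal proportions.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum((row_tot[i] / n) * (col_tot[i] / n) for i in range(k))
    return (p_o - p_e) / (1 - p_e)

# The doctors' table from the example above:
table = [[40, 10],
         [5, 45]]
kappa = kappa_from_table(table)  # ≈ 0.70, matching the hand calculation
```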

Effect Size#

Cohen's Kappa is itself an effect size measure. The most widely used interpretation comes from Landis and Koch (1977):

Kappa Value     Interpretation
< 0.00          Poor (worse than chance)
0.00 – 0.20     Slight
0.21 – 0.40     Fair
0.41 – 0.60     Moderate
0.61 – 0.80     Substantial
0.81 – 1.00     Almost perfect
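The Landis and Koch benchmarks above can be encoded as a small lookup function; `landis_koch_label` is an illustrative name:

```python
def landis_koch_label(kappa):
    """Verbal interpretation of a kappa value per Landis & Koch (1977)."""
    if kappa < 0.0:
        return "Poor"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"
```

For the worked example above, `landis_koch_label(0.70)` returns "Substantial".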

A κ of 1 means perfect agreement, a κ of 0 corresponds to pure chance agreement, and negative values indicate systematic disagreement.

Further Reading

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.
  • Landis, J. R. & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.
  • McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica, 22(3), 276–282.