Mann-Whitney U Test

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test for comparing two independent groups. It tests whether the distributions of the two groups differ systematically and is the nonparametric alternative to the independent samples t-test.

When to Use

Use the Mann-Whitney U test when:

You want to compare two independent groups
The dependent variable is at least ordinally scaled
The normality assumption of the t-test is violated
The sample is small and the distribution shape is unclear

The test is based on ranks rather than raw values and therefore does not require any specific distributional form.

Assumptions

Independence of observations (both between and within groups)
At least ordinal scale of the dependent variable
Similar distribution shape in both groups (for interpretation as a median comparison)
The variable is continuous enough that ties are rare

Note: Strictly speaking, the Mann-Whitney U test evaluates whether a randomly chosen value from Group 1 is equally likely to be greater or smaller than a randomly chosen value from Group 2. It can only be interpreted as a median comparison when the distribution shapes are similar.
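The probability reading in the note can be made concrete by comparing every value of one group with every value of the other. The sketch below is only an illustration: the function name `prob_superiority` and the two small samples are invented here and do not come from the text. It estimates P(X > Y) + 0.5 · P(X = Y), which equals 0.5 when neither group tends to produce larger values.

```python
def prob_superiority(group1, group2):
    """Estimate P(X > Y) + 0.5 * P(X = Y) by comparing every pair."""
    wins = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                wins += 1.0      # value from group 1 is larger
            elif x == y:
                wins += 0.5      # ties count as half a win
    return wins / (len(group1) * len(group2))


# Hypothetical illustrative values, not data from the text.
group1 = [3, 5, 6, 8]
group2 = [1, 2, 4, 7]

p = prob_superiority(group1, group2)
print(p)  # 12 of the 16 pairs favour group 1, so this prints 0.75
```

Swapping the arguments gives the complementary value, so the two estimates always sum to 1.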

Formula

The U statistic is calculated for both groups. First, all values are combined and ranked. Then:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

where:

$n_1$ and $n_2$ are the sample sizes of the two groups
$R_1$ and $R_2$ are the rank sums of the respective groups

It always holds that: $U_1 + U_2 = n_1 \cdot n_2$

The test statistic is $U = \min(U_1, U_2)$.
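The ranking and the two formulas above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names `midranks` and `u_statistics` and the sample values are hypothetical, and tied values are given the mean of the ranks they share (the usual midrank convention, which the text does not spell out).

```python
def midranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current block of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def u_statistics(group1, group2):
    """Compute U_1, U_2 and U = min(U_1, U_2) from the joint ranking."""
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    r1 = sum(ranks[:n1])   # rank sum R_1
    r2 = sum(ranks[n1:])   # rank sum R_2
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


# Hypothetical illustrative values, not data from the text.
group1 = [3, 5, 6, 8]
group2 = [1, 2, 4, 7]

u1, u2, u = u_statistics(group1, group2)
print(u1, u2, u)  # rank sums are 22 and 14, so this prints 4.0 12.0 4.0
```

The check $U_1 + U_2 = n_1 \cdot n_2$ holds here as well: 4 + 12 = 16.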

For large samples ($n_1, n_2 > 20$), a z-approximation can be used:

$$z = \frac{U - \frac{n_1 n_2}{2}}{\sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
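As a rough sketch, the z-approximation and a two-sided p-value from the standard normal distribution can be computed with the standard library alone. The function name `z_approximation` and the input U = 250 are hypothetical and chosen only for illustration; the sketch applies neither a continuity correction nor a tie correction, both of which statistical packages commonly add.

```python
import math


def z_approximation(u, n1, n2):
    """Normal approximation for U (no tie or continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided p-value: 2 * (1 - Phi(|z|)) written via the complementary
    # error function.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Hypothetical input: U = 250 for groups of size 25 and 30.
z, p = z_approximation(250, 25, 30)
print(round(z, 3), round(p, 3))
```

When U equals its expected value $n_1 n_2 / 2$, the function returns z = 0 and a p-value of 1.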

Example

Practical Example: Patient Satisfaction

A hospital wants to compare patient satisfaction between two wards. Satisfaction is measured on a Likert scale from 1 to 5, so the data are ordinal and cannot be assumed to be normally distributed.

Ward A (n=25): Patient satisfaction scores
Ward B (n=30): Patient satisfaction scores

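Since the data are ordinally scaled and the normality assumption is not met, the Mann-Whitney U test is used instead of the t-test. All 55 values are combined into a joint ranking, and the rank sums of the two groups are compared. Because a five-point scale produces many tied values, each tied value receives the mean of the ranks it shares, and the normal approximation should use a tie-corrected variance.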
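The text gives no actual scores for the two wards, so the sketch below uses invented score counts purely to show the mechanics on Likert-type data. Every number in `ward_a` and `ward_b` is hypothetical. With heavily tied data the variance of U is usually adjusted to $\frac{n_1 n_2}{12}\left[(N + 1) - \frac{\sum (t^3 - t)}{N(N - 1)}\right]$, where $N = n_1 + n_2$ and $t$ is the size of each group of tied values; this correction is a standard addition and is not part of the formulas above.

```python
import math
from collections import Counter

# Hypothetical counts of scores 1..5 per ward (invented for illustration).
ward_a = [1] * 2 + [2] * 4 + [3] * 7 + [4] * 8 + [5] * 4   # n = 25
ward_b = [1] * 6 + [2] * 9 + [3] * 8 + [4] * 5 + [5] * 2   # n = 30

n1, n2 = len(ward_a), len(ward_b)
n = n1 + n2
pooled = ward_a + ward_b

# Midranks: a block of t tied values occupying ranks start..start+t-1
# receives the mean of those ranks.
counts = Counter(pooled)
rank_of = {}
start = 1
for score in sorted(counts):
    t = counts[score]
    rank_of[score] = start + (t - 1) / 2
    start += t

r1 = sum(rank_of[x] for x in ward_a)
r2 = sum(rank_of[x] for x in ward_b)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

# Standard deviation of U without and with the tie correction.
tie_term = sum(t**3 - t for t in counts.values())
sd_plain = math.sqrt(n1 * n2 * (n + 1) / 12)
sd_ties = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))

z = (u - n1 * n2 / 2) / sd_ties
print(u1, u2, round(sd_plain, 2), round(sd_ties, 2), round(z, 2))
```

The tie correction always shrinks the standard deviation, so ignoring ties makes the z-value slightly too conservative.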

Effect Size

The rank-biserial correlation coefficient $r_{rb}$ serves as a measure of effect size:

$$r_{rb} = 1 - \frac{2U}{n_1 n_2}$$

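Alternatively, $r$ can be calculated from the z-statistic:

$$r = \frac{z}{\sqrt{N}}$$

where $N = n_1 + n_2$ is the total sample size. Because $U = \min(U_1, U_2)$ never exceeds $n_1 n_2 / 2$, this $z$ is never positive; the magnitude $|r|$ is what gets compared with the benchmarks: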

Effect Size	|r|
Small	0.1
Medium	0.3
Large	0.5
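Both effect size formulas and the benchmark table can be sketched as small helpers. The function names, the "negligible" label for values below 0.1, and the choice to treat each benchmark as a lower bound are assumptions made for this illustration; the inputs are hypothetical.

```python
import math


def rank_biserial(u, n1, n2):
    """Rank-biserial correlation r_rb = 1 - 2U / (n1 * n2)."""
    return 1 - 2 * u / (n1 * n2)


def r_from_z(z, n_total):
    """Effect size r = z / sqrt(N), with N the total sample size."""
    return z / math.sqrt(n_total)


def label(r):
    """Map |r| to the benchmark labels (thresholds read as lower bounds)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label, not in the benchmark table


# Hypothetical inputs: U = 250, z = -2.11, groups of size 25 and 30.
r_rb = rank_biserial(250, 25, 30)
r = r_from_z(-2.11, 25 + 30)
print(round(r_rb, 3), label(r_rb))
print(round(r, 3), label(r))
```

The two measures are on different scales and need not fall into the same benchmark category, so it helps to state which one is being reported.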

Tip: When results are significant, the effect size should always be reported alongside the p-value. The Mann-Whitney U test often has higher statistical power than the t-test when the normality assumption is violated, particularly for heavy-tailed distributions or data with outliers.