Logistic Regression
Logistic regression is a method for modeling the probability of a binary outcome (e.g., yes/no, sick/healthy) as a function of one or more independent variables. Unlike linear regression, the dependent variable is categorical (dichotomous).
When to Use
Use logistic regression when you want to:
- Predict a binary dependent variable (e.g., 0/1, yes/no)
- Identify the influencing factors on an event
- Estimate the probability of an event occurring
- Include predictors of mixed types (metric, ordinal, or categorical)
Assumptions
- Dependent variable is binary coded (0/1)
- Independence of observations
- No strong multicollinearity among predictors (rule of thumb: VIF < 10)
- Linear relationship between predictors and the logit of the dependent variable
- Sufficiently large sample size (rule of thumb: at least 10 events per predictor)
- No influential outliers
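The sample-size rule of thumb above can be checked before fitting a model. The following is a minimal sketch, assuming that "events" means the rarer of the two outcome classes (a common convention, but an assumption here); the function name `events_per_predictor` and the sample counts are hypothetical.

```python
def events_per_predictor(y, n_predictors):
    """Return the number of events per predictor for a 0/1 coded outcome.

    Assumption: "events" are counted as the rarer of the two outcome
    classes, which is a common but not universal convention.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    events = sum(1 for value in y if value == 1)
    non_events = len(y) - events
    return min(events, non_events) / n_predictors


# Hypothetical sample: 200 observations, 45 events, 3 predictors
y = [1] * 45 + [0] * 155
epv = events_per_predictor(y, n_predictors=3)
meets_rule_of_thumb = epv >= 10
print(epv, meets_rule_of_thumb)
```

With these made-up counts there are 15 events per predictor, so the rule of thumb would be satisfied.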
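Formula
The logistic regression model uses the logit function:
$$
\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k
$$
where $p$ is the probability of the event. Solved for $p$:
$$
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}
$$
The odds ratio for a predictor $X_j$:
$$
\mathrm{OR}_j = e^{\beta_j}
$$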
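The relationship between the logit, the predicted probability, and the odds ratio can be sketched in a few lines of Python. The function names and the coefficient values below are illustrative assumptions, not results from a fitted model.

```python
import math


def logit(p):
    """Log-odds of a probability p with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(intercept, coefs, x):
    """Inverse logit of the linear predictor intercept + sum(coef * x)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(coef):
    """Odds ratio for a one-unit increase in the corresponding predictor."""
    return math.exp(coef)


# Hypothetical coefficients and predictor values, chosen only for illustration
intercept = -1.0
coefs = [0.5, -0.25]
x = [2.0, 4.0]

p = predict_probability(intercept, coefs, x)  # linear predictor is -1.0 here
print(round(p, 4), round(logit(p), 4), [round(odds_ratio(b), 4) for b in coefs])
```

Applying `logit` to the predicted probability returns the linear predictor, which shows that the two formulas are inverses of each other.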
Example
Practical Example: Customer Churn
A telecommunications company wants to predict whether a customer will cancel their contract (1) or stay (0). The predictors used are:
- X₁: Contract duration (in months)
- X₂: Monthly costs (in EUR)
- X₃: Number of complaints
Result: The odds ratio for complaints is OR = 1.85. That is, each additional complaint multiplies the odds of cancellation by 1.85 (an 85% increase in the odds, not in the probability), holding all other variables constant.
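Because an odds ratio acts on the odds rather than on the probability, it can help to convert it back into probabilities. The sketch below uses the OR of 1.85 from the example; the baseline cancellation probability of 20% and the function name are hypothetical and serve only as an illustration.

```python
def probability_after_odds_ratio(baseline_probability, odds_ratio, steps=1):
    """Apply an odds ratio `steps` times to a baseline probability."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio ** steps
    return new_odds / (1.0 + new_odds)


OR_COMPLAINTS = 1.85  # odds ratio reported in the example
baseline = 0.20       # hypothetical cancellation probability with no further complaint

p_one_more = probability_after_odds_ratio(baseline, OR_COMPLAINTS, steps=1)
p_two_more = probability_after_odds_ratio(baseline, OR_COMPLAINTS, steps=2)
print(round(p_one_more, 3), round(p_two_more, 3))
```

Under this assumed baseline, one additional complaint raises the probability from 0.20 to roughly 0.32, which is less than an 85% relative increase in probability even though the odds rise by 85%.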
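Effect Size
Various pseudo-R² measures are used for logistic regression:
Nagelkerke's R²:
$$
R^2_N = \frac{1 - \left(L_0 / L_1\right)^{2/n}}{1 - L_0^{2/n}}
$$
where $L_0$ is the likelihood of the null model, $L_1$ is the likelihood of the full model, and $n$ is the sample size.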
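In practice, software usually reports log-likelihoods rather than likelihoods, so Nagelkerke's R² is more conveniently computed on the log scale. The following is a minimal sketch; the function name and the log-likelihood values are hypothetical and not taken from a fitted model.

```python
import math


def nagelkerke_r2(ll_null, ll_full, n):
    """Nagelkerke's R² from the log-likelihoods of the null and full models.

    Cox & Snell's R² is 1 - (L0 / L1)^(2/n); Nagelkerke divides it by its
    maximum attainable value, 1 - L0^(2/n), so the result can reach 1.
    """
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_full))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a sample of n = 200 observations
r2 = nagelkerke_r2(ll_null=-100.0, ll_full=-80.0, n=200)
print(round(r2, 3))
```

Working with log-likelihoods avoids numerical underflow, since the raw likelihoods are extremely small numbers for realistic sample sizes.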
| Effect Size | Nagelkerke's R² |
|---|---|
| Small | 0.02 |
| Medium | 0.13 |
| Large | 0.26 |
Additionally, odds ratios ($\mathrm{OR} = e^{\beta}$) are an important measure of the practical significance of individual predictors. Classification accuracy and the area under the ROC curve (AUC) assess how well the model predicts and discriminates between the two outcomes.
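Classification accuracy and AUC can both be computed directly from the observed outcomes and the predicted probabilities. The sketch below uses the pairwise definition of AUC (the share of event/non-event pairs in which the event receives the higher predicted probability, with ties counted as one half); the data, the 0.5 cutoff, and the function names are illustrative assumptions.

```python
def accuracy(y_true, p_pred, threshold=0.5):
    """Share of observations classified correctly at the given cutoff."""
    hits = sum(
        1 for y, p in zip(y_true, p_pred) if (p >= threshold) == (y == 1)
    )
    return hits / len(y_true)


def auc(y_true, p_pred):
    """AUC as the probability that an event is ranked above a non-event."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p_event in pos:
        for p_non_event in neg:
            if p_event > p_non_event:
                wins += 1.0
            elif p_event == p_non_event:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and predicted probabilities
y_true = [0, 0, 1, 1, 0, 1]
p_pred = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

acc = accuracy(y_true, p_pred)
auc_value = auc(y_true, p_pred)
print(round(acc, 3), round(auc_value, 3))
```

The pairwise loop is quadratic in the sample size, so it is meant for illustration; accuracy also depends on the chosen cutoff, whereas AUC does not.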
Further Reading
- Hosmer, D. W., Lemeshow, S. & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
- Field, A. (2018). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE.