Multiple Linear Regression

Multiple linear regression models the relationship between a continuous dependent variable (criterion) and several independent variables (predictors). It lets you examine the influence of several predictors simultaneously and predict the criterion from them.

When to Use

Use multiple regression when the dependent variable is metric (continuous) and you want to:

Examine the influence of multiple predictors on the dependent variable
Make predictions based on multiple independent variables
Determine the relative contribution of individual predictors

Assumptions

Linearity: Linear relationship between predictors and criterion
Normal distribution of residuals (Q-Q plot, Shapiro-Wilk test)
Homoscedasticity: Constant variance of residuals (Breusch-Pagan test)
No multicollinearity: Predictors are not too highly correlated (rule of thumb: VIF < 10)
Independence of residuals (Durbin-Watson test)
No influential outliers (Cook's distance)
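Several of these checks are simple enough to compute directly. The sketch below assumes NumPy is available and computes variance inflation factors, the Durbin-Watson statistic, and Cook's distances. The helper names (`ols_fit`, `vif`, `durbin_watson`, `cooks_distance`) and the simulated data are hypothetical, not part of any library. For the Shapiro-Wilk and Breusch-Pagan tests, established routines such as `scipy.stats.shapiro` and `statsmodels.stats.diagnostic.het_breuschpagan` are a better choice than a hand-rolled version.

```python
import numpy as np

def _design(X, n):
    """Prepend a column of ones (intercept) to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(n), X])

def ols_fit(X, y):
    """Return OLS coefficients (intercept first) and residuals."""
    y = np.asarray(y, dtype=float)
    Xd = _design(X, len(y))
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta, y - Xd @ beta

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        xj = X[:, j]
        _, resid = ols_fit(np.delete(X, j, axis=1), xj)
        r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out.append(1 / (1 - r2_j))
    return np.array(out)

def durbin_watson(resid):
    """Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def cooks_distance(X, y):
    """D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2, with p = number of coefficients."""
    y = np.asarray(y, dtype=float)
    Xd = _design(X, len(y))
    n, p = Xd.shape
    _, resid = ols_fit(X, y)
    h = np.diag(Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T))  # leverages (hat-matrix diagonal)
    mse = resid @ resid / (n - p)
    return resid ** 2 / (p * mse) * h / (1 - h) ** 2

# Simulated data with two independent predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)
_, resid = ols_fit(X, y)
print(vif(X))                       # both close to 1 for independent predictors
print(durbin_watson(resid))         # close to 2 for independent residuals
print(cooks_distance(X, y).max())   # largest Cook's distance in the sample
```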

Formula

The regression model is:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$

where:

$Y$ is the dependent variable
$\beta_0$ is the intercept
$\beta_1, \beta_2, \dots, \beta_k$ are the regression coefficients of the predictors
$\varepsilon$ is the error term (assumed normally distributed with mean 0 and constant variance $\sigma^2$)

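The coefficients are estimated using Ordinary Least Squares (OLS), which minimizes the sum of squared residuals:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{Y}$$

For $n$ observations, $\mathbf{X}$ is the $n \times (k+1)$ design matrix (a column of ones for the intercept followed by one column per predictor) and $\mathbf{Y}$ is the vector of observed values of the dependent variable. The inverse exists only if no predictor is an exact linear combination of the others (no perfect multicollinearity).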
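To make the estimator concrete, here is a minimal sketch assuming NumPy (the source names no software). It builds the design matrix with a leading column of ones and solves the normal equations $(\mathbf{X}^T\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$. The function name `ols_normal_equations` and the toy data are hypothetical.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Estimate beta-hat = (X^T X)^{-1} X^T y, with an intercept column prepended."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])  # design matrix: column of ones for beta_0
    # Solving the linear system avoids forming the inverse explicitly
    return np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

# Noise-free toy data generated from Y = 1 + 2*X1 - 3*X2,
# so OLS should recover the coefficients (up to rounding)
X = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 2]], dtype=float)
y = 1 + 2 * X[:, 0] - 3 * X[:, 1]
beta_hat = ols_normal_equations(X, y)
print(np.round(beta_hat, 6))  # approximately 1, 2, -3
```

Solving the system rather than inverting $\mathbf{X}^T\mathbf{X}$ is a common numerical choice; for ill-conditioned data, the SVD-based `np.linalg.lstsq` is more robust still.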

Example

Practical Example: Salary Prediction

A personnel consultant wants to predict the salary of employees. The predictors used are:

X₁: Work experience (in years)
X₂: Education level (years of education)
X₃: Weekly working hours

The estimated model is: Salary (EUR) = 15,000 + 2,500 * Experience + 1,800 * Education + 300 * Hours

Interpretation: For each additional year of work experience, salary increases by an average of 2,500 EUR, holding all other variables constant (ceteris paribus).
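The fitted equation can be used directly for prediction. The sketch below plugs in values for a hypothetical employee (10 years of experience, 16 years of education, 40 hours per week; these inputs are invented for illustration) and shows the ceteris paribus reading of the experience coefficient. The function name `predict_salary` is also hypothetical.

```python
def predict_salary(experience, education, hours):
    """Predicted salary in EUR from the example model (coefficients taken from the text)."""
    return 15_000 + 2_500 * experience + 1_800 * education + 300 * hours

# Hypothetical employee: 10 years experience, 16 years education, 40 hours per week
base = predict_salary(10, 16, 40)
print(base)  # 80800

# Ceteris paribus: add one year of experience, keep education and hours fixed
print(predict_salary(11, 16, 40) - base)  # 2500
```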

Effect Size

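The coefficient of determination R² and the adjusted R² serve as measures of effect size:

$$R^2 = 1 - \frac{SS_{\text{Residuals}}}{SS_{\text{Total}}}$$

$$R^2_{\text{adj}} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where $n$ is the number of observations and $k$ the number of predictors. Because R² never decreases when a predictor is added, the adjusted R² penalizes additional predictors and is better suited for comparing models with different numbers of predictors.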
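Both quantities follow directly from the formulas. A minimal sketch, assuming NumPy; `n` is the number of observations and `k` the number of predictors excluding the intercept. The function names and the observed and fitted values in the usage lines are made up for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SS_Residuals / SS_Total."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalizes R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Made-up observed and fitted values
y = [3.0, 5.0, 7.0, 10.0]
y_hat = [3.2, 4.8, 7.1, 9.9]
r2 = r_squared(y, y_hat)
print(round(r2, 4))

# The same R^2 = 0.5 with more predictors yields a lower adjusted R^2
print(adjusted_r_squared(0.5, n=50, k=3), adjusted_r_squared(0.5, n=50, k=10))
```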

Effect Size	R² (Cohen)
Small	0.02
Medium	0.13
Large	0.26

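Additionally, Cohen's f² expresses the effect size on a different scale; Cohen's conventional benchmarks f² = 0.02, 0.15, and 0.35 correspond to the R² values in the table above:

$$f^2 = \frac{R^2}{1 - R^2}$$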
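Because f² is a monotone transformation of R², the two scales convert into each other: solving the formula for R² gives $R^2 = f^2 / (1 + f^2)$. The plain-Python sketch below (function names are illustrative) converts Cohen's f² benchmarks of 0.02, 0.15, and 0.35 into R² and reproduces the values in the table.

```python
def cohens_f2(r2):
    """Cohen's f^2 = R^2 / (1 - R^2)."""
    return r2 / (1 - r2)

def r2_from_f2(f2):
    """Inverse transformation: R^2 = f^2 / (1 + f^2)."""
    return f2 / (1 + f2)

# Cohen's conventional f^2 benchmarks for multiple regression
for label, f2 in [("small", 0.02), ("medium", 0.15), ("large", 0.35)]:
    print(label, f2, round(r2_from_f2(f2), 2))
# small 0.02 0.02
# medium 0.15 0.13
# large 0.35 0.26
```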

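Important: A high R² does not by itself imply a causal relationship. Standardized coefficients (beta weights), obtained by z-standardizing the criterion and all predictors before fitting, put the predictors on a common scale and thus allow a comparison of their relative importance.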
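Beta weights can be obtained either by fitting the model on z-standardized variables or by rescaling the raw slopes, $\beta_j^{*} = b_j \, s_{X_j} / s_Y$. The sketch below, assuming NumPy, follows the first route; the function name and the simulated data are hypothetical. Because standardized variables are centered, the intercept is zero and no column of ones is needed.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Beta weights: OLS slopes after z-standardizing every predictor and the criterion."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # Centered data: the fitted intercept would be 0, so it is omitted
    beta_z, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta_z

# Simulated predictors on very different scales
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3)) * [1.0, 10.0, 0.1]
y = 5 + 2 * X[:, 0] + 0.3 * X[:, 1] + 20 * X[:, 2] + rng.normal(size=100)
print(np.round(standardized_coefficients(X, y), 2))
```

With these simulated scales the third predictor has the largest raw slope, yet the second predictor is expected to receive the largest beta weight, which is exactly the distortion that standardization removes.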