Covariates
A covariate is a variable that is related to the dependent variable and whose influence you want to statistically control. By accounting for covariates, you can obtain more accurate estimates of the effect under investigation.
Core Idea
When you want to examine the effect of a treatment, there are often other variables that also influence the outcome. By including these variables as covariates in your analysis, you can remove their influence and isolate the "pure" treatment effect.
Example: Why covariates matter
A researcher compares two teaching methods on learning outcomes. However, the students also differ in their prior knowledge.
Without covariate: The difference between methods could be biased by differences in prior knowledge.
With covariate (prior knowledge): The effect of the teaching method is adjusted for the influence of prior knowledge, so the estimate comes closer to the "true" effect of the method.
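The adjustment in this example can be sketched in Python with simulated data. The group sizes, the slope of 0.8, and a true method effect of 2 points are hypothetical assumptions chosen for illustration, as are the helper names. The adjusted difference subtracts the pooled within-group slope times the prior-knowledge gap; for one covariate and two groups this equals the group coefficient from a regression of the outcome on a group dummy plus the covariate.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pooled_within_slope(groups):
    """Pooled within-group slope of y on covariate x; groups is a list of (xs, ys)."""
    sxy = sxx = 0.0
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n, prior_mean, method_effect):
    """Hypothetical data: outcome = 10 + method effect + 0.8 * prior knowledge + noise."""
    xs = [random.gauss(prior_mean, 5) for _ in range(n)]
    ys = [10 + method_effect + 0.8 * x + random.gauss(0, 2) for x in xs]
    return xs, ys

random.seed(1)
TRUE_EFFECT = 2.0                          # hypothetical advantage of method B
xa, ya = simulate(1000, 50, 0.0)           # method A, lower prior knowledge
xb, yb = simulate(1000, 60, TRUE_EFFECT)   # method B, higher prior knowledge

naive = mean(yb) - mean(ya)                # mixes method effect and prior-knowledge gap
b_w = pooled_within_slope([(xa, ya), (xb, yb)])
adjusted = naive - b_w * (mean(xb) - mean(xa))

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

With these settings the naive difference is inflated by roughly 0.8 × 10 points of prior-knowledge advantage, while the adjusted difference lands near the simulated effect of 2.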
Types of Covariates
Confounders
A variable that is related to both the independent and the dependent variable. If it is not controlled, the estimated effect of the IV on the DV can be biased (confounded).
Classic confounder
Investigation: Relationship between ice cream consumption and drowning deaths.
- More ice cream → More drownings (positive correlation)
- Confounder: Temperature influences both. In summer, people eat more ice cream AND go swimming more often.
When you control for temperature, the apparent relationship disappears.
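"Controlling for temperature" can be made concrete with a partial correlation, which removes the linear influence of a third variable from both variables before correlating them. The sketch below uses simulated, hypothetical data in which temperature drives both ice cream sales and drownings, while the two have no direct link; the coefficients and the helper names are assumptions for illustration.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear influence of z from both."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

random.seed(42)
# Hypothetical data: temperature affects both variables; they do not affect each other.
temperature = [random.uniform(5, 35) for _ in range(2000)]
ice_cream = [2.0 * t + random.gauss(0, 8) for t in temperature]
drownings = [0.3 * t + random.gauss(0, 2) for t in temperature]

raw_r = corr(ice_cream, drownings)
partial_r = partial_corr(ice_cream, drownings, temperature)
print(f"raw correlation:     {raw_r:.2f}")
print(f"partial correlation: {partial_r:.2f}")
```

The raw correlation is clearly positive, while the partial correlation is close to zero. Note that this only removes a linear dependence on temperature.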
Control Variables
Variables included in the analysis to reduce error variance and increase power. They do not necessarily need to be confounders.
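The power gain comes from a smaller error variance. The sketch below simulates a hypothetical randomized two-group experiment, so the covariate is not a confounder, and compares the ANOVA error mean square with the ANCOVA error mean square. In the ANCOVA version, the pooled within-group slope removes SP²/SS_x from the within-group sum of squares at the cost of one degree of freedom. The helper name and simulation settings are assumptions for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def error_variances(groups):
    """Return (ANOVA MS_error, ANCOVA MS_error) for a list of (xs, ys) groups."""
    k = len(groups)
    n_total = sum(len(ys) for _, ys in groups)
    ss_y = sp_xy = ss_x = 0.0
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        ss_y += sum((y - my) ** 2 for y in ys)                  # within-group SS of y
        sp_xy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        ss_x += sum((x - mx) ** 2 for x in xs)
    ms_anova = ss_y / (n_total - k)
    # The covariate explains sp_xy**2 / ss_x of the error SS and costs one df.
    ms_ancova = (ss_y - sp_xy ** 2 / ss_x) / (n_total - k - 1)
    return ms_anova, ms_ancova

random.seed(7)
groups = []
for effect in (0.0, 1.0):  # hypothetical control and treatment groups
    xs = [random.gauss(0, 1) for _ in range(200)]
    ys = [effect + 1.5 * x + random.gauss(0, 1) for x in xs]
    groups.append((xs, ys))

ms_anova, ms_ancova = error_variances(groups)
print(f"error variance without covariate: {ms_anova:.2f}")  # about 3.25 in expectation
print(f"error variance with covariate:    {ms_ancova:.2f}")  # about 1 in expectation
```

A smaller error variance yields a smaller standard error for the group difference, which is where the increase in power comes from.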
Mediators and Moderators
- Mediator: Explains the mechanism of the effect (IV → Mediator → DV)
- Moderator: Changes the strength of the effect (the effect depends on the moderator)
These are treated differently from classical covariates.
ANCOVA: Analysis of Covariance
ANCOVA (Analysis of Covariance) combines ANOVA and regression. It compares group means after removing the influence of one or more covariates.
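The model:
$$Y_{ij} = \mu + \tau_i + \beta\,(X_{ij} - \bar{X}) + \varepsilon_{ij}$$
where:
- $\mu$ is the grand mean of the DV
- $\tau_i$ is the effect of group $i$
- $\beta$ is the regression coefficient of the covariate
- $X_{ij}$ is the value of the covariate for person $j$ in group $i$, and $\bar{X}$ is its grand mean
- $\varepsilon_{ij}$ is the error term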
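For a single covariate, the least-squares estimates in the ANCOVA model have a closed form (standard results, stated here for reference). With $\bar{Y}_i$ and $\bar{X}_i$ the group means and $\bar{X}$ the grand covariate mean, the slope is pooled from within-group deviations, and each group mean is then adjusted to the value it would have at $\bar{X}$:

$$
\hat{\beta} = b_w = \frac{\sum_{i}\sum_{j} (X_{ij} - \bar{X}_i)(Y_{ij} - \bar{Y}_i)}{\sum_{i}\sum_{j} (X_{ij} - \bar{X}_i)^2},
\qquad
\bar{Y}_i^{\text{adj}} = \bar{Y}_i - b_w\,(\bar{X}_i - \bar{X})
$$

The adjusted means $\bar{Y}_i^{\text{adj}}$ estimate $\mu + \tau_i$, so group comparisons in ANCOVA are comparisons of these adjusted means.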
ANCOVA Assumptions
- All ANOVA assumptions (normality, homogeneity of variance, independence)
- Linear relationship between covariate and DV
- Homogeneity of regression slopes: The relationship between covariate and DV must be the same across all groups
- Covariate independent of IV: Group membership should not have influenced the covariate
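Homogeneity of regression slopes can be checked by comparing a model with one common slope against a model with a separate slope per group. A minimal sketch, assuming a single covariate and hypothetical data; the helper name is illustrative. The statistic is compared against an F distribution with (k − 1, N − 2k) degrees of freedom, for example with a statistics package.

```python
def mean(v):
    return sum(v) / len(v)

def slopes_homogeneity_f(groups):
    """F statistic: separate within-group slopes vs one common slope.

    groups: list of (xs, ys). Returns (F, df1, df2).
    """
    k = len(groups)
    n_total = sum(len(ys) for _, ys in groups)
    sse_separate = 0.0             # residual SS when each group has its own slope
    ss_y = sp_xy = ss_x = 0.0      # pooled within-group sums for the common slope
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        sse_separate += syy - sxy ** 2 / sxx
        ss_y, sp_xy, ss_x = ss_y + syy, sp_xy + sxy, ss_x + sxx
    sse_common = ss_y - sp_xy ** 2 / ss_x
    df1, df2 = k - 1, n_total - 2 * k
    f_stat = ((sse_common - sse_separate) / df1) / (sse_separate / df2)
    return f_stat, df1, df2

# Hypothetical groups: (covariate values, DV values)
a = ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])   # slope about 2
b = ([1, 2, 3, 4, 5], [3.0, 5.1, 6.9, 9.2, 11.1])   # slope about 2
c = ([1, 2, 3, 4, 5], [1.0, 1.4, 2.1, 2.4, 3.1])    # slope about 0.5

f_ab, df1, df2 = slopes_homogeneity_f([a, b])
f_ac, _, _ = slopes_homogeneity_f([a, c])
print(f"A vs B: F({df1}, {df2}) = {f_ab:.2f}")  # small: slopes are similar
print(f"A vs C: F({df1}, {df2}) = {f_ac:.2f}")  # large: slopes differ
```

A large F suggests the slopes differ, in which case the single adjusted group difference of a standard ANCOVA is not a meaningful summary.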
When to Use ANCOVA
- In experimental designs to increase power by reducing error variance
- In quasi-experimental designs to control for pre-existing differences between groups
Example: ANCOVA in practice
Comparing two therapies for depression:
- DV: Depression score after 8 weeks
- IV: Therapy type (A vs. B)
- Covariate: Depression score before therapy (baseline)
ANCOVA compares the groups after removing the influence of baseline values. This allows a fair comparison even if the groups had different starting values.
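The therapy comparison can be worked through on a small set of hypothetical scores (higher means more depressed). This sketch computes the pooled within-group slope and the adjusted means at the grand baseline mean; the numbers and the helper name are assumptions for illustration, not results from any study.

```python
def mean(v):
    return sum(v) / len(v)

def adjusted_means(groups):
    """ANCOVA-adjusted group means for one covariate.

    groups: dict name -> (baseline_list, post_list).
    Returns (pooled within-group slope, {name: adjusted mean}).
    """
    sxy = sxx = 0.0
    all_x = []
    for xs, ys in groups.values():
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
        all_x.extend(xs)
    b_w = sxy / sxx
    grand_x = mean(all_x)  # grand baseline mean across all participants
    adjusted = {name: mean(ys) - b_w * (mean(xs) - grand_x)
                for name, (xs, ys) in groups.items()}
    return b_w, adjusted

therapy = {
    "A": ([20, 24, 28, 32], [12, 14, 17, 19]),  # (baseline, week 8)
    "B": ([26, 30, 34, 38], [13, 15, 18, 20]),
}
b_w, adj = adjusted_means(therapy)
print(f"pooled slope: {b_w:.2f}")  # 0.60
for name, (xs, ys) in therapy.items():
    print(f"therapy {name}: raw mean {mean(ys):.2f}, adjusted mean {adj[name]:.2f}")
# therapy A: raw mean 15.50, adjusted mean 17.30
# therapy B: raw mean 16.50, adjusted mean 14.70
```

In these made-up data the raw week-8 means favor therapy A, but group B started 6 points higher at baseline; after adjustment to the common baseline of 29, therapy B shows the lower adjusted depression score.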
Covariates in Regression
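In multiple regression, covariates are included as additional predictors:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$$
The coefficient $\beta_1$ represents the effect of $X_1$, controlling for all other variables ($X_2, \dots, X_k$).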
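A minimal sketch of what "controlling for" means in this regression: the same outcome is regressed on $X_1$ alone and on $X_1$ plus a correlated covariate $X_2$. The noise-free data (true $\beta_1 = 2$) and the hand-rolled normal-equations solver are assumptions for illustration; in practice a statistics library would fit the model.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y.

    X: list of rows (include a leading 1 for the intercept).
    Uses Gaussian elimination with partial pivoting; suited to small problems.
    """
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        coef[i] = (A[i][p] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

x1 = [1, 2, 3, 4, 5, 6]   # hypothetical predictor of interest
x2 = [2, 1, 4, 3, 6, 5]   # hypothetical covariate, correlated with x1
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]  # true model: beta1 = 2, beta2 = 3

b_simple = ols([[1, u] for u in x1], y)
b_multi = ols([[1, u, v] for u, v in zip(x1, x2)], y)
print(f"x1 alone:       beta1 = {b_simple[1]:.2f}")  # 4.49: absorbs part of x2's effect
print(f"x1 + covariate: beta1 = {b_multi[1]:.2f}")   # 2.00
```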
Practical Guidelines
Which Covariates to Include?
- Theoretically justified: The covariate should be conceptually related to the DV
- Measured before the intervention: The covariate should not have been influenced by the IV
- Reliably measured: Unreliable covariates worsen the correction
How Many Covariates?
- Not too many: A common rule of thumb in regression is at least 10–20 observations per predictor
- Each covariate consumes degrees of freedom
- Too many covariates can lead to overfitting
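The rule of thumb and the degrees-of-freedom cost are simple arithmetic. This sketch assumes a regression with an intercept and a hypothetical sample of 120 observations; the helper names are illustrative.

```python
def predictor_budget(n_obs, per_predictor=(10, 20)):
    """(strict, lenient) maximum number of predictors under the 10 to 20 rule of thumb."""
    lenient, strict = per_predictor
    return n_obs // strict, n_obs // lenient

def residual_df(n_obs, n_predictors):
    """Residual degrees of freedom of a regression with an intercept."""
    return n_obs - n_predictors - 1

n = 120  # hypothetical sample size
low, high = predictor_budget(n)
print(f"n = {n}: about {low} to {high} predictors")  # about 6 to 12 predictors
for p in (1, 6, 12):
    print(f"{p:2d} predictors -> residual df = {residual_df(n, p)}")
```

Each added covariate lowers the residual degrees of freedom by one, whether or not it explains any variance in the DV.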
When Not to Use Covariates?
- When the covariate was measured after the IV was applied (it may then be a mediator rather than a covariate)
- When the covariate is confounded with the IV (e.g., covariate differs systematically between groups in a quasi-experiment)
- When the assumption of homogeneity of regression slopes is violated
Common Misconceptions
"Covariates can solve any confounding problem." No. Only measured variables can be controlled. Unmeasured confounders remain uncontrolled. This is why randomization is the gold standard.
"You should include as many covariates as possible." No. Every unnecessary covariate consumes degrees of freedom and can destabilize results. Only include theoretically justified covariates.
"ANCOVA can fully equalize baseline differences in quasi-experiments." Not completely. ANCOVA reduces bias but cannot correct for unmeasured confounders or non-linear relationships. Results should be interpreted cautiously.
"The covariate must be normally distributed." No. In ANCOVA, the normality assumption applies to the residuals, that is, to the dependent variable conditional on group and covariate. The covariate itself can have any distribution.