Categorising continuous risk factors: issues and implications

2013 Québec City

Abdallah DY¹

¹University College Dublin, Ireland

Background:

Grouping continuous variables into two or more categories to simplify analysis and interpretation is a widely used approach in medical and epidemiological research. Categorisation, however, has been found to reduce power and efficiency, to affect internal validity, and to produce biased results.

Objectives:

To highlight the bias produced by categorising continuous variables and its impact on evidence synthesis.

Methods:

We review the literature on the effects of categorisation and discuss the alternative methods proposed for modelling exposure-outcome relationships. Using examples from the literature, we compare modelling body mass index (BMI) as a continuous variable with modelling it in categories.

Results:

Categorisation of continuous variables is statistically unnecessary and can result in loss of efficiency. Dichotomisation has been found to hinder meta-analyses of observational studies and has several drawbacks, including loss of power, residual confounding, and assumptions of linearity. Several alternative modelling methods have been proposed, such as spline regression, smoothing techniques, and generalised additive models. These methods have higher power, but require an adequate sample size and sufficient data across the range of exposure. For meta-analyses of dose-response relationships, a number of techniques have been developed for the synthesis of summary regression slopes. We review the complexities, limitations, and challenges posed by these methods. In relation to BMI, some research suggests that the choice of modelling technique considerably influences estimates of the relationship between BMI and mortality, with categorisation yielding biased results. Other researchers, however, have found that BMI categorisation produces estimates that fit the data well.
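
The loss of power from dichotomisation can be illustrated with a small simulation. The sketch below is not taken from the reviewed literature: the sample size, effect size, number of replicates, and the median split are all assumptions chosen for illustration, and the function names are hypothetical. It generates a linear exposure-outcome relationship, then tests the association once with the exposure kept continuous and once after splitting it at the sample median.

```python
import numpy as np


def correlation_test_rejects(x, y, crit=1.96):
    """Two-sided test of zero Pearson correlation between x and y.

    Uses the normal critical value 1.96 as an approximation to the
    t critical value (an assumption that is reasonable for n of about 100).
    For a binary x this is equivalent to a pooled two-sample t-test.
    """
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    return abs(t) > crit


def simulate_power(n=100, beta=0.25, n_sims=2000, seed=0):
    """Estimate power with a continuous vs. a median-dichotomised exposure.

    All default values are illustrative assumptions, not published figures.
    """
    rng = np.random.default_rng(seed)
    hits_continuous = 0
    hits_dichotomised = 0
    for _ in range(n_sims):
        x = rng.normal(size=n)                 # continuous exposure
        y = beta * x + rng.normal(size=n)      # outcome, linear in exposure
        x_binary = (x > np.median(x)).astype(float)  # median split
        hits_continuous += int(correlation_test_rejects(x, y))
        hits_dichotomised += int(correlation_test_rejects(x_binary, y))
    return hits_continuous / n_sims, hits_dichotomised / n_sims


power_continuous, power_dichotomised = simulate_power()
print(f"Estimated power, continuous exposure:   {power_continuous:.2f}")
print(f"Estimated power, dichotomised exposure: {power_dichotomised:.2f}")
```

Under these assumptions the continuous analysis should reject the null hypothesis more often than the dichotomised one, because the median split discards the within-group variation in exposure. The size of the gap depends on the chosen settings and on the true shape of the relationship.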

Conclusions:

Grouping of continuous variables can produce inconsistent and biased risk estimates. When modelling epidemiological and medical data, researchers should be aware of the trade-off between statistically sound modelling and ease of interpretation.