Change score or follow-up score? An empirical evaluation of the impact of choice of mean difference estimates

2015 Vienna

Fu R¹, Holmer H²

¹Pacific Northwest Evidence-based Practice Center, Department of Public Health and Preventive Medicine, Oregon Health & Science University, USA

²Department of Public Health and Preventive Medicine, Oregon Health & Science University, USA

Background: In randomized controlled clinical trials, continuous outcomes are typically measured at both baseline and follow-up, and the mean difference between groups can be estimated from the change score from baseline, from the follow-up score, or from an analysis of covariance (ANCOVA) model. When there is baseline imbalance, the ANCOVA estimate is the least biased, but it is often not reported. The impact of using the change score versus the follow-up score has not been well studied.
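For reference, the three estimates can be written side by side. The notation here is an assumption made for illustration and is not taken from the study: $\bar{X}$ and $\bar{Y}$ denote mean baseline and follow-up scores, subscripts $T$ and $C$ denote the treatment and control arms, and $\hat{\beta}$ is the pooled within-group slope of follow-up on baseline.

```latex
\begin{align*}
\hat{\Delta}_{\text{follow-up}} &= \bar{Y}_T - \bar{Y}_C \\
\hat{\Delta}_{\text{change}}    &= (\bar{Y}_T - \bar{X}_T) - (\bar{Y}_C - \bar{X}_C)
                                 = (\bar{Y}_T - \bar{Y}_C) - (\bar{X}_T - \bar{X}_C) \\
\hat{\Delta}_{\text{ANCOVA}}    &= (\bar{Y}_T - \bar{Y}_C) - \hat{\beta}\,(\bar{X}_T - \bar{X}_C)
\end{align*}
```

Written this way, the follow-up and change-score estimates correspond to fixing $\hat{\beta}$ at 0 and 1 respectively, so the three agree when the baseline means are balanced and diverge in proportion to the baseline difference $\bar{X}_T - \bar{X}_C$.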
Objectives: Funded by the Agency for Healthcare Research and Quality, this study aimed to assess empirically the impact of using the change score versus the follow-up score on the conclusions of meta-analyses (MAs).
Methods: We included a total of 63 MAs (156 trials) from six comparative effectiveness reviews. We evaluated differences in baseline scores at the MA level and compared combined mean differences based on the change score with those based on the follow-up score. A discrepancy in conclusions was defined as one estimate (e.g. change score) showing a statistically significant difference while the other (e.g. follow-up score) did not. We also evaluated whether the impact varied qualitatively across alternative random-effects estimators.
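The discrepancy check described above can be illustrated with a minimal Python sketch. It is not the study's code: it assumes per-trial mean differences and their variances are already available for both the change score and the follow-up score, pools each set with the DerSimonian-Laird estimator, and uses a normal-approximation 95% confidence interval to judge significance. The function names `dl_pool`, `is_significant`, and `discrepant`, and the numbers in the usage lines, are hypothetical.

```python
import math


def dl_pool(effects, variances):
    """Pool trial-level mean differences with the DerSimonian-Laird
    random-effects estimator; return (pooled, ci_low, ci_high)."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q and the method-of-moments between-trial variance tau^2
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


def is_significant(ci_low, ci_high):
    """Significant when the 95% CI excludes zero."""
    return ci_low > 0 or ci_high < 0


def discrepant(change, followup):
    """Each argument is an (effects, variances) pair for one MA.
    True when exactly one of the two pooled estimates is significant."""
    _, lo_c, hi_c = dl_pool(*change)
    _, lo_f, hi_f = dl_pool(*followup)
    return is_significant(lo_c, hi_c) != is_significant(lo_f, hi_f)


# Hypothetical two-trial MA (illustrative numbers, not study data)
change_scores = ([1.0, 1.0], [0.25, 0.25])
followup_scores = ([0.5, 0.5], [0.25, 0.25])
print(discrepant(change_scores, followup_scores))
```

The profile likelihood, restricted maximum likelihood, and Knapp-Hartung approaches named in this abstract would replace the estimate of the between-trial variance or the confidence interval construction inside `dl_pool`; the comparison in `discrepant` would stay the same.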
Results: Based on the DerSimonian-Laird (DL) method, using the change score versus the follow-up score led to discrepant conclusions in five of the 63 MAs (7.9%); based on the profile likelihood (PL) method, nine (14.3%) showed a discrepancy. The change score was more likely to show a significant difference in effects between interventions (four of five using the DL method, and seven of nine using the PL method). The impact of the choice of score under the restricted maximum likelihood method was similar to that under the PL method. The Knapp-Hartung method produced the largest number of discrepant MAs (10). A significant difference in baseline scores did not necessarily lead to a discrepancy in conclusions.
Conclusions: Using the change score versus the follow-up score could lead to important discrepancies in conclusions. Sensitivity analyses using both change scores and final values should be conducted to check the robustness of results to the choice of mean difference estimates.