Abstract
Background: Non-randomised evaluations of healthcare interventions are considered less reliable than randomised comparisons, but the degree to which they may be biased is unknown. This hinders the interpretation of systematic reviews of non-randomised studies. We have empirically estimated distributions of bias associated with particular designs by constructing randomised and non-randomised comparisons using data from a large multi-centre trial.
Methods: Study designs were created from the International Stroke Trial. 100 participants randomly allocated to aspirin and 100 participants allocated to no treatment were selected from each of 14 regions where the trial recruited. To emulate historically and concurrently controlled studies, treated participants were additionally compared with (a) 100 controls from the same region recruited into the trial earlier, and (b) 100 concurrent controls from a different region. Each design was created for each region, and the process was repeated 1000 times. The distribution of the results for each non-randomised design was compared to the distribution of randomised comparisons.
Results: Aspirin has a small beneficial effect on death or dependency at six months (OR=0.91), only reliably detectable in trials of 20,000 participants. Only 7% of randomised comparisons of sample size 200 showed statistically significant benefit, and 2% showed significant harm. In contrast, 16% and 29% of concurrent and historical comparisons showed statistically significant benefits, but 4% and 22% inappropriately indicated significant harm. Averaged across 14,000 studies, concurrent comparisons were unbiased, but historical comparisons inflated the treatment effect by 33%.
Conclusions: Even among participants recruited according to the same protocol, non-randomised comparisons can be seriously biased. Whilst some designs introduced systematic bias, all introduced unpredictable biases which lead to both over- and under-estimates of treatment efficacy in individual studies. We would predict that studies included in systematic reviews of non-randomised evaluations of healthcare interventions would often be conflicting, and that in many situations the overall results of these reviews may be misleading.
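The resampling procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' code: the pool sizes, the regional and historical risk shifts, every function name, and the Wald test on the log odds ratio are assumptions made for this example. Only the 14 regions, the samples of 100 per arm, the three designs, and the true OR of 0.91 come from the abstract.

```python
import math
import random


def odds_ratio_test(events_t, n_t, events_c, n_c):
    """Return (odds ratio, z statistic) for a 2x2 table.

    A 0.5 continuity correction is added to every cell (an assumption of
    this sketch) so that empty cells do not cause a division by zero.
    """
    a, b = events_t + 0.5, n_t - events_t + 0.5
    c, d = events_c + 0.5, n_c - events_c + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), log_or / se


def classify(z, crit=1.96):
    """Outcome 1 = death or dependency, so OR < 1 (negative z) is benefit."""
    if z < -crit:
        return "benefit"
    if z > crit:
        return "harm"
    return "ns"


def make_pools(n_regions=14, pool_size=500, true_or=0.91, seed=0):
    """Build hypothetical per-region outcome pools (1 = bad outcome)."""
    rng = random.Random(seed)
    pools = []
    for _ in range(n_regions):
        p_ctrl = rng.uniform(0.5, 0.7)               # assumed regional case mix
        odds_t = true_or * p_ctrl / (1 - p_ctrl)
        p_trt = odds_t / (1 + odds_t)
        p_early = p_ctrl + rng.uniform(-0.1, 0.1)    # assumed drift over time

        def draw(p):
            return [1 if rng.random() < p else 0 for _ in range(pool_size)]

        pools.append({"treated": draw(p_trt),
                      "control": draw(p_ctrl),
                      "early_control": draw(p_early)})
    return pools


def run(pools, n_reps=1000, n=100, seed=1):
    """Tally significant benefit/harm for each design over all regions."""
    rng = random.Random(seed)
    designs = ("randomised", "historical", "concurrent")
    tally = {d: {"benefit": 0, "harm": 0, "ns": 0} for d in designs}
    for _ in range(n_reps):
        for r, pool in enumerate(pools):
            treated = sum(rng.sample(pool["treated"], n))
            other = rng.choice([i for i in range(len(pools)) if i != r])
            controls = {
                "randomised": pool["control"],           # same region, same period
                "historical": pool["early_control"],     # same region, earlier
                "concurrent": pools[other]["control"],   # different region
            }
            for design, ctrl_pool in controls.items():
                ctrl = sum(rng.sample(ctrl_pool, n))
                _, z = odds_ratio_test(treated, n, ctrl, n)
                tally[design][classify(z)] += 1
    return tally


if __name__ == "__main__":
    for design, counts in run(make_pools(), n_reps=100).items():
        print(design, counts)
```

Comparing the three tallies mirrors the comparison of distributions reported in the Results. The proportions produced here depend entirely on the assumed risk shifts and should not be read as a reproduction of the published figures.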