Abstract
Background: Results from apparently conclusive meta-analyses may be false. A limited number of events from a few small trials and the presence of random error may be under-recognised sources of spurious findings. Sample size requirements for a reliable and conclusive meta-analysis should be no less rigorous than those of a single optimally powered randomised clinical trial. If a meta-analysis is conducted before a sufficient sample size is reached, it should be evaluated in a manner that accounts for the increased risk that the result is a chance finding.
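As a rough illustration of that sample size requirement, the sketch below computes a required information size for a binary outcome with the standard two-arm trial formula; the function name and all input values (control event proportion, relative risk reduction, alpha, power) are illustrative assumptions rather than figures from the analysis.

```python
# Minimal sketch, not the authors' code: required information size for a
# meta-analysis of a binary outcome, computed with the same formula used to
# power a single two-arm randomised trial. All inputs below are assumed
# values chosen for illustration.
from scipy.stats import norm

def required_information_size(p_control, rrr, alpha=0.05, power=0.80):
    """Total participants needed to detect a relative risk reduction `rrr`
    given a control-group event proportion `p_control`."""
    p_experimental = p_control * (1 - rrr)
    p_mean = (p_control + p_experimental) / 2
    delta = p_control - p_experimental
    z_alpha = norm.ppf(1 - alpha / 2)      # two-sided significance level
    z_beta = norm.ppf(power)               # desired power
    n_per_arm = 2 * (z_alpha + z_beta) ** 2 * p_mean * (1 - p_mean) / delta ** 2
    return 2 * n_per_arm                   # total across both arms

# Example: 10% control event rate, 20% relative risk reduction -> ~6,400 participants
print(round(required_information_size(0.10, 0.20)))
```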
Methods: We analysed 33 meta-analyses that had a sufficient sample size to detect a realistic treatment effect. We monitored the results of each meta-analysis by generating an interim cumulative meta-analysis after every included trial and evaluating statistical significance at each step with both two-sided O'Brien-Fleming monitoring boundaries and the conventional criterion (p<0·05). We examined the proportion of false positive results and of clinically important overestimates of treatment effects produced by the two approaches.
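A minimal sketch of this monitoring procedure, assuming hypothetical trial data, an assumed required information size, and a crude z_(alpha/2)/sqrt(information fraction) approximation to the O'Brien-Fleming boundary (exact boundaries would come from group-sequential software); it is not the authors' code.

```python
# Minimal sketch, not the authors' code: cumulative DerSimonian-Laird
# random-effects meta-analysis of log odds ratios, with each interim result
# checked against the conventional criterion (|z| > 1.96) and against a crude
# O'Brien-Fleming-type boundary, z_(alpha/2) / sqrt(information fraction).
# The trial data and required information size below are assumed for illustration.
import numpy as np
from scipy.stats import norm

def dersimonian_laird(y, v):
    """Random-effects pooled log odds ratio and its standard error."""
    if len(y) < 2:                       # single trial: nothing to pool
        return y[0], np.sqrt(v[0])
    w = 1 / v
    y_fixed = np.sum(w * y) / np.sum(w)  # fixed-effect mean for the Q statistic
    q = np.sum(w * (y - y_fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)   # DL between-trial variance
    w_star = 1 / (v + tau2)
    return np.sum(w_star * y) / np.sum(w_star), np.sqrt(1 / np.sum(w_star))

# Hypothetical trials: (log odds ratio, variance, number of participants)
trials = [(-0.30, 0.10, 200), (-0.45, 0.08, 260),
          (-0.20, 0.05, 400), (-0.35, 0.04, 520)]
required_n = 2000                        # assumed required information size
z_conv = norm.ppf(0.975)                 # conventional two-sided 5% criterion

for k in range(1, len(trials) + 1):
    y = np.array([t[0] for t in trials[:k]])
    v = np.array([t[1] for t in trials[:k]])
    pooled, se = dersimonian_laird(y, v)
    z = pooled / se
    frac = min(1.0, sum(t[2] for t in trials[:k]) / required_n)
    z_obf = z_conv / np.sqrt(frac)       # approximate O'Brien-Fleming boundary
    print(f"after trial {k}: z = {z:.2f}, "
          f"conventional significant = {abs(z) > z_conv}, "
          f"O'Brien-Fleming boundary = {z_obf:.2f}, crossed = {abs(z) > z_obf}")
```

In this shape, an interim z-value can exceed 1·96 early on yet still fall well inside the more demanding boundary, which is the mechanism by which monitoring boundaries suppress false positive interim results.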
Results: Using the random-effects model, 21 of 33 final cumulative meta-analyses were significant at p<0·05. False positive interim results were observed in 3 of the 12 non-significant meta-analyses (25%, 95% CI 0·5–49·5%); the monitoring boundaries eliminated all false positives. Clinically important overestimates were observed in 8 of the 21 significant meta-analyses under the conventional criterion (36·8%, 95% CI 15·2–58·4%) and in none of the 21 under the monitoring boundaries.
Discussion: Evaluating statistical significance against sequential monitoring boundaries when a meta-analysis falls short of the required sample size may reduce the risk of false positive results and inflated estimates of treatment effects.