Uncertainty of heterogeneity in meta-analyses

Article type
Authors
Patsopoulos NA, Ioannidis JPA, Evangelou E
Abstract
Background: An important aim of systematic reviews and meta-analyses is to understand the extent to which different studies on the same or different topics give similar or dissimilar results. While the reasons for clinical, methodological, and biological heterogeneity may be topic-specific and require a multifaceted evaluation each time, statistical heterogeneity can be examined with the same methods in all meta-analyses. Inferences about the clinical importance and generalizability of results are often considerably affected by the presence or absence of statistical heterogeneity and by its extent.
Objectives: To evaluate empirically the extent of uncertainty in I2 estimates.
Methods: We considered meta-analyses in the Cochrane Database of Systematic Reviews (Issue 4, 2005) with 4 or more synthesized studies and binary outcomes; in total, we analysed 1,011 eligible meta-analyses. The second dataset was a previously described database of 50 meta-analyses of gene-disease associations that had found a nominally statistically significant effect (p < 0.05) for proposed genetic risk factors. For each meta-analysis we calculated I2 and its 95% confidence interval. Finally, we evaluated 11 systematic reviews that included randomized trials and were published in BMJ between July 1, 2005 and January 1, 2006.
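As an illustration of the calculation named in the Methods, the sketch below derives I2 and a 95% confidence interval from Cochran's Q and the number of studies. The abstract does not state which interval method was used; the code assumes the test-based interval for H described by Higgins and Thompson (2002), with H truncated at 1 when Q does not exceed its degrees of freedom. The function name `i2_with_ci` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def i2_with_ci(q, k, level=0.95):
    """Return (I2, lower, upper) as fractions between 0 and 1.

    q is Cochran's Q statistic and k is the number of studies.
    Assumption: test-based interval for H (Higgins and Thompson, 2002),
    with H set to 1 whenever Q <= k - 1. Not necessarily the authors' method.
    """
    if k < 3:
        raise ValueError("this sketch needs at least 3 studies")
    if q < 0:
        raise ValueError("Q cannot be negative")

    df = k - 1
    h = max(1.0, math.sqrt(q / df))

    # Standard error of ln(H): one formula when Q > k, another otherwise.
    if q > k:
        se_ln_h = 0.5 * (math.log(q) - math.log(df)) / (
            math.sqrt(2 * q) - math.sqrt(2 * k - 3)
        )
    else:
        se_ln_h = math.sqrt((1 / (2 * (k - 2))) * (1 - 1 / (3 * (k - 2) ** 2)))

    z = NormalDist().inv_cdf(0.5 + level / 2)

    def to_i2(h_value):
        # I2 = (H^2 - 1) / H^2, with negative values set to 0.
        return max(0.0, 1 - 1 / h_value ** 2)

    lower = to_i2(math.exp(math.log(h) - z * se_ln_h))
    upper = to_i2(math.exp(math.log(h) + z * se_ln_h))
    return to_i2(h), lower, upper


# Hypothetical example: 7 studies whose Q is below its degrees of freedom.
i2, lower, upper = i2_with_ci(q=3.0, k=7)
print(f"I2 = {i2:.0%}, 95% CI {lower:.0%} to {upper:.0%}")
```

Under these assumptions, a meta-analysis of 7 studies with I2 = 0% has an upper limit of about 71%, which shows how wide the interval can be when few studies are synthesized.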
Results: The median (IQR) number of studies was 7 (5-11) in the Cochrane and 20 (13-26) in the genetic meta-analyses, and the median (IQR) total sample size was 1112 (512-2691) and 4660 (2823-8761), respectively. The median (IQR) I2 was 21.1% (0%-49.7%) and 37.6% (4.6%-59.5%), respectively, in the two databases. Of the meta-analyses with I2 ≤ 25% (little heterogeneity), 83% (448/539) of the Cochrane and 73% (16/22) of the genetic risk factor meta-analyses had upper 95% confidence limits that crossed into the range of large heterogeneity (I2 ≥ 50%). Of the meta-analyses with I2 ≥ 50% (large heterogeneity), 67% (168/249) of the Cochrane and 52% (11/21) of the genetic risk factor meta-analyses had lower 95% confidence limits that crossed into the range of little heterogeneity (I2 ≤ 25%). Among the meta-analyses with I2 = 0% in the two large datasets (n=373 Cochrane and n=12 genetic), the upper 95% confidence limit of I2 was always larger than 33%. For 81% of the meta-analyses with I2 = 0%, the 95% confidence interval extended to I2 = 50% or higher [81% (303/373) and 83% (10/12) in the two datasets]. Of the 11 reviews published in BMJ, 8 systematic reviews performed quantitative syntheses; one did not test for between-study heterogeneity at all. I2 was measured for at least one data synthesis in 6 of them, and all of them performed statistical significance testing for heterogeneity, apparently using the Q statistic. A total of eight statements interpreting heterogeneity were made in the text of these reviews, and for 7 of them sufficient information was provided for us to calculate the 95% confidence interval of I2. The lower 95% confidence limit went as low as 0% (rounded to integer percentage) in all cases but one. The upper 95% confidence limit always exceeded the 50% threshold, and in 4/7 cases it also exceeded the 75% threshold.
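The classification used in the Results can be sketched as follows, using the 25% and 50% thresholds quoted above. The function names and the example intervals are hypothetical; the counts in `reported` are copied from the Results and are only used to recompute the stated percentages.

```python
def band(i2_pct):
    """Label an I2 point estimate (in percent) with the thresholds from the Results."""
    if i2_pct <= 25:
        return "little"
    if i2_pct >= 50:
        return "large"
    return "intermediate"


def crosses_opposite_band(i2_pct, lower_pct, upper_pct):
    """True if the 95% interval reaches the opposite heterogeneity range."""
    label = band(i2_pct)
    if label == "little":
        return upper_pct >= 50  # upper limit reaches large heterogeneity
    if label == "large":
        return lower_pct <= 25  # lower limit reaches little heterogeneity
    return False


# Counts as reported in the Results: (numerator, denominator).
reported = {
    "Cochrane, I2 <= 25%, upper limit >= 50%": (448, 539),
    "Genetic, I2 <= 25%, upper limit >= 50%": (16, 22),
    "Cochrane, I2 >= 50%, lower limit <= 25%": (168, 249),
    "Genetic, I2 >= 50%, lower limit <= 25%": (11, 21),
    "Cochrane, I2 = 0%, upper limit >= 50%": (303, 373),
    "Genetic, I2 = 0%, upper limit >= 50%": (10, 12),
}

percentages = {name: round(100 * a / b) for name, (a, b) in reported.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct}%")

# Hypothetical interval: I2 = 0% with limits 0% and 71%.
print(crosses_opposite_band(0, 0, 71))
```

The recomputed percentages agree with the rounded figures given in the text (83%, 73%, 67%, 52%, 81%, and 83%).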
Conclusions: Under the current research circumstances, in most meta-analyses, the presence of considerable between-study heterogeneity cannot be excluded with confidence. This is an important lesson about the potentially ubiquitous presence of some between-study heterogeneity. Claims for homogeneity may sometimes be stronger than the evidence allows and may lead to spurious certainty about the comparability of study results and the generalizability of treatment effects.