Measuring and reporting of statistical heterogeneity in reviews of diagnostic accuracy studies

2013 Québec City

Ochodo EA¹, Leeflang MMG¹, van Enst WA², Hooft L³, de Groot JA⁴, Bossuyt PM¹, Moons KGM⁴, Reitsma JB⁴

¹Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

²Dutch Cochrane Centre, Academic Medical Center, The Netherlands

³Dutch Cochrane Centre, Academic Medical Center, The Netherlands;

⁴Julius Center, University Medical Center, Utrecht, The Netherlands

Background: In the majority of diagnostic reviews there is more variability in accuracy measures than would be expected by chance alone. Reviewers use a variety of approaches to examine, measure, report and interpret their results in such circumstances, so more guidance is urgently needed.

Objectives: To describe the methods currently used in diagnostic reviews to visualize, quantify, and report statistical heterogeneity in accuracy results between primary studies and to explore how the results of this examination influence subsequent analysis decisions and formulation of conclusions.

Methods: Systematic reviews of diagnostic tests published in MEDLINE-indexed journals between May and September 2012 were identified through a systematic search. Using a standardized form, information was extracted on the clinical context and on the methods applied in the main meta-analysis of each review.

Results: Fifty-three meta-analyses met the inclusion criteria. These meta-analyses contained a median of 14 primary studies (IQR 9.5–20.5). Statistical tests or measures of heterogeneity were used in only 72% of the meta-analyses; the most common were I² (29 meta-analyses), followed by the χ² test (26) and τ² (5). Heterogeneity was represented visually in all but 5 meta-analyses: 40 plotted sensitivity and specificity in ROC space and 34 presented forest plots. Data on how the investigation of statistical heterogeneity influenced subsequent analysis decisions (i.e. whether to investigate sources of heterogeneity) and the formulation of conclusions will be available before the colloquium.
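To make the reported measures concrete, the sketch below shows one common way to compute Cochran's Q (the statistic behind the χ² heterogeneity test), I², and the DerSimonian-Laird estimate of τ². It is a minimal illustration under stated assumptions, not the method used by the surveyed reviews: it analyses logit sensitivity alone (a univariate simplification; many diagnostic meta-analyses use bivariate models instead), applies a 0.5 continuity correction to every cell, and uses hypothetical counts. The function name `heterogeneity_logit_sensitivity` is illustrative only. Q would normally be compared with a χ² distribution on k − 1 degrees of freedom; the p-value is omitted here to keep the code dependency-free.

```python
import math


def heterogeneity_logit_sensitivity(counts):
    """Hypothetical helper: heterogeneity of logit sensitivity across studies.

    counts: list of (TP, FN) pairs, one per primary study.
    Returns (Q, df, I^2 in percent, DerSimonian-Laird tau^2).
    """
    y, w = [], []
    for tp, fn in counts:
        # Assumption: 0.5 continuity correction avoids infinite logits when FN == 0
        tp_c, fn_c = tp + 0.5, fn + 0.5
        y.append(math.log(tp_c / fn_c))            # logit(sensitivity) = log(TP / FN)
        w.append(1.0 / (1.0 / tp_c + 1.0 / fn_c))  # inverse of approximate within-study variance
    sw = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # fixed-effect (inverse-variance) mean
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(counts) - 1
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    return q, df, i2, tau2


if __name__ == "__main__":
    # Hypothetical (TP, FN) counts for four studies, not data from the survey
    q, df, i2, tau2 = heterogeneity_logit_sensitivity([(45, 5), (30, 20), (48, 2), (25, 25)])
    print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.1f}%, tau^2 = {tau2:.3f}")
```

The same function could be applied to specificity by passing (TN, FP) pairs. I² and τ² answer different questions: I² is a relative proportion that grows as studies become larger, whereas τ² is on the scale of the outcome and expresses the absolute spread of true logit sensitivities.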

Conclusions: The exploration of statistical heterogeneity in diagnostic accuracy meta-analyses is increasing, although it is not yet universal. However, there is a lack of consistency in which heterogeneity tests are used, how these tests are interpreted, and how their results influence subsequent analysis decisions and conclusions. In a diagnostic meta-analysis, summary (mean) values are difficult to interpret and translate to clinical practice, and confidence intervals and confidence ellipses do not accurately reflect the amount of between-study variation; identifying the sources of that variability therefore becomes important.
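The point that confidence intervals do not reflect between-study variation can be illustrated with a univariate random-effects sketch. A confidence interval describes only the uncertainty in the summary mean, while a prediction interval adds τ² and so describes where the true accuracy of a new study might fall. This is a hedged illustration, not the analysis in this abstract: it assumes logit-scale estimates with known within-study variances, uses the DerSimonian-Laird τ², and uses a normal quantile where a t quantile with k − 2 degrees of freedom is often preferred. The function names and the input values are hypothetical; bivariate models produce the analogous confidence and prediction ellipses in ROC space.

```python
import math


def random_effects_summary(y, v):
    """Hypothetical helper: random-effects mean, 95% CI and approximate 95% PI.

    y: study estimates on the logit scale; v: their within-study variances.
    """
    k = len(y)
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # DerSimonian-Laird between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))  # standard error of the summary mean only
    z = 1.959964  # assumption: normal quantile instead of t with k - 2 df
    ci = (mu - z * se, mu + z * se)
    half_pi = z * math.sqrt(tau2 + se ** 2)  # adds between-study variance
    pi = (mu - half_pi, mu + half_pi)
    return mu, ci, pi


def expit(x):
    """Back-transform a logit to a proportion (e.g. sensitivity)."""
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Hypothetical logit sensitivities and variances for four studies
    mu, ci, pi = random_effects_summary([2.113, 0.397, 2.965, 0.0], [0.204, 0.082, 0.421, 0.078])
    print(f"summary sensitivity {expit(mu):.2f}")
    print(f"95% CI {expit(ci[0]):.2f} to {expit(ci[1]):.2f}")
    print(f"95% PI {expit(pi[0]):.2f} to {expit(pi[1]):.2f}")
```

When τ² is zero the two intervals coincide; the more heterogeneous the studies, the further the prediction interval extends beyond the confidence interval, which is why a narrow confidence interval around a summary sensitivity can understate how much accuracy varies between settings.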