Challenges in the assessment of heterogeneity in systematic reviews of diagnostic test accuracy studies

2010 Keystone

van-der-Windt D¹, Reitsma J, Jellema P, de-Vet H²

¹Arthritis Research UK National Primary Care Centre, Keele University, Keele, UK

²Epidemiology & Biostatistics, EMGO+Institute, Amsterdam, Netherlands

Background: The results of diagnostic accuracy studies (sensitivity and specificity) often show wide variation. Estimates of I² and the Q test, commonly used in meta-analysis of randomized trials, can be inaccurate when the number of studies is small. They may also be unhelpful for diagnostic meta-analysis, where heterogeneity of sensitivity and specificity should be assessed simultaneously, taking into account the correlation that might exist between the two.

Objectives: To discuss the difficulties surrounding the assessment of heterogeneity in diagnostic meta-analysis.

Methods: We recently carried out a series of systematic reviews on the diagnostic performance of symptoms, signs, and laboratory tests in the identification of colorectal disease. Quality assessment using the QUADAS tool and data extraction were performed independently by two reviewers. We presented pooled estimates of sensitivity and specificity using the bivariate random effects approach, but refrained from pooling when there was considerable clinical or statistical heterogeneity.

Results: Heterogeneity was partly explained by differences in study design (e.g. cohort or nested case-control design), sources of bias (e.g. verification bias), and prevalence of disease, but wide unexplained heterogeneity in results remained. Different approaches for assessing statistical heterogeneity were applied, exploring their influence on the decision to present pooled estimates. These approaches included: observation of forest plots; use of statistical tests or I²; use of a priori defined cut-offs for maximal variation in point estimates or for optimal values of diagnostic performance; selection of other measures of performance; analysis of between-study variation (τ²); and observation of the prediction ellipse.

Discussion: Results of the analyses will be presented during the conference using illustrative examples. Recommendations will be given regarding approaches that may facilitate the assessment of (statistical) heterogeneity in diagnostic systematic reviews, and may help decisions regarding the presentation of pooled estimates of results.
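
As an illustration of the univariate statistics mentioned in the Background, the following minimal sketch computes Cochran's Q and I² for logit-transformed sensitivities. It is not the analysis used in the reviews: the function name `q_and_i2`, the 0.5 continuity correction, and the study counts are assumptions made for this example only. Because it treats sensitivity on its own, it ignores any correlation with specificity, which is the limitation the abstract raises.

```python
import math


def q_and_i2(events, totals):
    """Cochran's Q, its degrees of freedom, and I² (%) for logit proportions.

    events: true positives per study; totals: diseased patients per study.
    Uses inverse-variance (fixed-effect) weights on the logit scale.
    """
    logits, weights = [], []
    for e, n in zip(events, totals):
        # Assumed 0.5 continuity correction keeps the logit finite for 0 or n events.
        a = e + 0.5
        b = n - e + 0.5
        logits.append(math.log(a / b))
        weights.append(1.0 / (1.0 / a + 1.0 / b))

    pooled = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logits))
    df = len(logits) - 1
    # I² is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical counts, not data from the reviews described above.
true_positives = [45, 30, 60, 12]
diseased = [50, 60, 65, 40]

q, df, i2 = q_and_i2(true_positives, diseased)
print(f"Q = {q:.2f} on {df} df, I² = {i2:.1f}%")
```

With only four studies, as here, Q has little power and I² is imprecise, so a value from a sketch like this should be read alongside the forest plot and the bivariate model rather than on its own.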