Can generalizability be considered in systematic reviews of diagnostic test accuracy?

2011 Madrid

Scheibler F¹, Janssen I¹, Schröer-Günther M¹, Sauerland S¹

¹Institute for Quality and Efficiency in Health Care, Germany

Background: Most diagnostic tests consist of variable components, and very often the technical quality of the diagnostic device or the experience of medical staff plays an important role in test accuracy. Therefore, these factors should be considered carefully when results of studies of diagnostic test accuracy (DTA) are synthesized in systematic reviews (SRs). To our knowledge, no systematic and validated method for extracting and assessing variations in the generalizability of diagnostic tests has been published so far (compare section 9.2.2 in the Cochrane Handbook on DTA).

Objectives: To develop a multi-item checklist on the generalizability of DTA studies.

Methods: Generalizability aspects were retrieved from the Cochrane Handbook, from methodological papers on quality assessment of diagnostic studies, and from the methods sections of SRs identified in an overview of SRs on positron emission tomography. An extraction sheet consisting of 7 items and a summary assessment item was developed and tested in 7 SRs with 160 included primary DTA studies.

Results: Owing to its compact format and clear-cut items, the instrument proved feasible for application within a systematic review. Limitations to be considered are: i) generalizability differs between health care settings; ii) generalizability cannot be proven, but it is possible to find evidence for non-generalizability; iii) in several aspects there seems to be a trade-off between generalizability and internal validity (e.g. blinding); and iv) as only a few studies were categorized as non-generalizable in our feasibility study, sensitivity analyses did not show differences in diagnostic accuracy depending on generalizability.

Conclusions: Aspects of generalizability should be assessed in a systematic and transparent way, separately from aspects of bias. We present the current state of our instrument and would very much appreciate comments and suggestions for its improvement.