Treatment of multiple test readers in diagnostic accuracy systematic reviews of imaging studies

2016 Seoul

McGrath T¹, McInnes M², Langer F³, Hong J¹, Korevaar D⁴, Bossuyt P⁴

¹Faculty of Medicine, University of Ottawa, Canada

²Department of Radiology, University of Ottawa, Canada

³Faculty of Medicine, Federal University of Santa Maria, Brazil

⁴Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, The Netherlands

Background: Studies of the diagnostic accuracy of imaging tests often involve multiple readers of the index test, either to assess inter-observer variability or to examine the impact of reader experience on test accuracy. Multiple readers pose particular challenges for diagnostic accuracy systematic reviews of imaging studies, and no guidance currently exists on how to handle them in such reviews.
Objectives: To evaluate the handling of multiple readers in diagnostic accuracy systematic reviews of imaging studies.
Methods: MEDLINE was searched for systematic reviews published in imaging journals between Jan 2005 and May 2015 that performed meta-analysis of diagnostic accuracy data. Handling of multiple readers was extracted and classified. We determined the incidence and reporting of multiple reader data in primary diagnostic accuracy studies from a random 10% subset of included reviews.
Results: Of 296 included reviews, 28 (9.5%) specified how multiple readers were handled: 7/28 averaged the results from multiple readers within a primary study, 2/28 included only the best reader, 14/28 treated each reader as a separate data set, 1/28 randomly selected a reader, and 4/28 used another strategy. A sample of 27 of the 268 reviews that did not report methods for handling multiple readers yielded 442 primary studies. Of these primary studies, 270/442 (61%) had multiple readers: 164/442 (37%) reported consensus reading, 87/442 (20%) reported inter-observer variability statistics, and 9/442 (2%) reported independent data sets for each reader. Of the 27 sampled reviews, 26 (96%) contained at least one primary study with multiple readers, and 8 (30%) contained at least one primary study with independent data sets for multiple readers.
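To make the four named strategies concrete, the following is a minimal Python sketch applied to one hypothetical primary study with two readers. All names (`ReaderTable`, `average_readers`, and so on) and all counts are illustrative assumptions, not data or code from the reviews. The abstract does not say how reviews averaged results or defined the "best" reader; the sketch assumes a simple mean of per-reader sensitivity and specificity, and the highest Youden index, respectively.

```python
import random
from dataclasses import dataclass


@dataclass
class ReaderTable:
    """2x2 counts for one reader in one primary study (hypothetical data)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def average_readers(readers):
    """Strategy 1 (assumed form): mean sensitivity and specificity across readers."""
    n = len(readers)
    return (sum(r.sensitivity for r in readers) / n,
            sum(r.specificity for r in readers) / n)


def best_reader(readers):
    """Strategy 2 (assumed criterion): keep the reader with the highest Youden index."""
    return max(readers, key=lambda r: r.sensitivity + r.specificity - 1)


def separate_datasets(readers):
    """Strategy 3: every reader enters the meta-analysis as its own data set."""
    return list(readers)


def random_reader(readers, seed=0):
    """Strategy 4: keep one reader chosen at random (seeded for reproducibility)."""
    return random.Random(seed).choice(readers)


# Two hypothetical readers interpreting the same 100 patients.
readers = [
    ReaderTable(tp=40, fp=5, fn=10, tn=45),  # sensitivity 0.80, specificity 0.90
    ReaderTable(tp=45, fp=5, fn=5, tn=45),   # sensitivity 0.90, specificity 0.90
]

mean_sens, mean_spec = average_readers(readers)
best = best_reader(readers)
per_reader = separate_datasets(readers)
picked = random_reader(readers)
```

In this sketch the separate-data-set strategy contributes two 2x2 tables built from the same 100 patients, so the patients are counted twice, whereas the other three strategies contribute a single estimate per study. Which strategy a review used therefore changes what enters the meta-analysis.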
Conclusions: Reporting how multiple readers from primary studies were treated in systematic reviews of imaging is uncommon. When reported, strategies vary widely; this is likely related to the lack of guidance and the lack of an optimal statistical method. Until such methods are developed, authors are encouraged to report the method used to analyze multiple readers so that the potential bias introduced by their chosen strategy is apparent.