Study designs for comparative diagnostic test accuracy: a methodological review and classification.

Authors
Yang B1, Olsen M1, Vali Y1, Langendam M1, Takwoingi Y2, Hyde C3, Bossuyt P1, Leeflang M1
1Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam UMC, University of Amsterdam
2Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham
3Exeter Test Group and South West CLAHRC, University of Exeter Medical School
Abstract
Background: Systematic reviews of diagnostic test accuracy (DTA) addressing comparative questions include studies comparing the accuracy of two or more index tests (i.e. comparative DTA studies). Well-conducted comparative DTA studies represent the most reliable evidence for determining the relative accuracy of tests. However, the range of available study designs indicates varying risk of bias, and inconsistent labeling of designs complicates study identification and classification.

Objectives: (1) To examine the variability of comparative DTA study designs and to propose a study design classification scheme; and (2) to describe study design labels used by comparative DTA study authors.

Methods: A methodological review of 100 comparative DTA studies published in 2015, 2016 and 2017. These were randomly sampled from comparative DTA studies included in 238 comparative DTA systematic reviews indexed in MEDLINE in 2017. From each study, we extracted six design features (direction of data collection, number of groups sampled, sampling method, allocation of participants, reference standard and verification of disease status) and labels used by authors.

Results: Most studies (n=57) enrolled a single group of participants, with each participant receiving all index tests. We classified the studies into six study design categories based on how participants were allocated to each index test: ‘paired’ (n=78), ‘partially paired, random subset’ (n=0), ‘partially paired, nonrandom subset’ (n=2), ‘unpaired randomized’ (n=1), ‘unpaired nonrandomized’ (n=3) and ‘externally controlled’ (n=1). The allocation method of 15 studies was unclear. Sixty-one studies reported 33 unique study design labels, but only nine labels conveyed that the study involved a test comparison.

Conclusions: Our classification scheme for comparative DTA study designs may help systematic reviewers when assessing risk of bias and interpreting results. In addition, researchers can use the scheme to select optimal designs for future studies. Further work is needed to develop an agreed set of informative labels for comparative DTA studies.

Patient or healthcare consumer involvement: Patients or healthcare consumers were not involved in this study.