Pooling diagnostic publications: watch for outliers!

Authors
Devillé WL, Dzaferagic A, Bezemer PD, Bouter LM
Abstract
Objectives: The Cochrane Collaboration Workgroup on Diagnostic and Screening Tests developed a criteria-list for the evaluation of the quality of diagnostic research. The impact of the response to these criteria on the pooled estimate for the diagnostic power of a test is still unclear. A systematic review on low-back pain was re-analysed using this specific criteria-list.

Methods: The sensitivity analysis was limited to the 17 publications up to 1992 on radiculopathy caused by disc hernia. Responses to the different criteria were assessed by linear regression analysis of the pooled log diagnostic odds ratio (OR).
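
The regression step can be illustrated with a minimal sketch. It assumes an unweighted least-squares fit of the natural-log diagnostic OR on a single yes/no quality criterion; the abstract does not state the weighting scheme or how zero cells were handled, so both choices here are assumptions. The 2x2 counts and the helper names (`log_dor`, `ols_fit`) are hypothetical and are not data or code from the review.

```python
import math

def log_dor(tp, fp, fn, tn, correction=0.5):
    """Natural log of the diagnostic odds ratio for one 2x2 table.

    Assumption: 0.5 is added to every cell when any cell is zero.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (v + correction for v in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))

def ols_fit(x, y):
    """Unweighted least-squares intercept and slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical studies: (TP, FP, FN, TN, criterion met: 1 = yes, 0 = no).
studies = [
    (90, 40, 10, 10, 1),
    (85, 30, 15, 10, 1),
    (95, 20, 5, 20, 0),
    (80, 10, 20, 15, 0),
]
y = [log_dor(tp, fp, fn, tn) for tp, fp, fn, tn, _ in studies]
x = [met for *_, met in studies]

intercept, slope = ols_fit(x, y)
# exp(intercept) is the geometric mean OR of studies not meeting the criterion;
# exp(intercept + slope) is the geometric mean OR of studies meeting it.
print(f"slope (difference in ln OR): {slope:.2f}")
print(f"geometric mean OR, criterion not met: {math.exp(intercept):.2f}")
print(f"geometric mean OR, criterion met: {math.exp(intercept + slope):.2f}")
```

With a binary criterion, the slope is simply the difference between the mean ln OR of the two groups, so exponentiating it gives the ratio of their geometric mean ORs.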

Results: Data on the Lasègue test are presented. Only 8 studies had enough data to reconstruct sensitivity and specificity. The reference standard was surgery. Sensitivities ranged from 0.81 to 0.98; the pooled sensitivity was 0.90 (95% CI 0.89-0.91; fixed effect model). Specificities ranged from 0.10 to 0.52, with a pooled specificity of 0.17 (95% CI 0.15-0.20). The geometric mean (xG) of the diagnostic OR was 4.1 (95% CI 2.05-8.1; range 1.34 to 39.25) and was not associated with the threshold. The xG OR of the studies without verification bias was 2.95, compared to 39.25 for the one study with verification bias (difference in ln OR = -2.59; 95% CI -3.67 to -1.50). The studies that mentioned in-/exclusion criteria also had a prospective study design. They had an xG OR of 2.78, compared to 12.8 for the others (difference in ln OR = -1.53; 95% CI -3.03 to 0.023). Studies with a broad spectrum of diseased patients had an xG OR of 2.31, compared to 7.17 for studies with a narrow spectrum (difference in ln OR = -1.13; 95% CI -2.60 to 0.34). The only study with a control group in which another reference standard was used had an OR of 39.25, an outlier. It was also the only study with clear verification bias. Excluding this study from the analysis left only one nearly significant criterion, broad versus narrow spectrum: xG OR of 2.31 vs 4.07 (difference in ln OR = -0.56; 95% CI -1.18 to 0.49).
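
The bracketed comparison values reported above appear to be differences on the natural-log OR scale, since their confidence limits include negative numbers. As a quick arithmetic check, the sketch below takes the natural log of the ratio of each reported pair of geometric mean ORs. The dictionary labels are shorthand chosen here; the numbers are those given in the Results.

```python
import math

# (xG OR for studies meeting the criterion, xG OR for the comparison group),
# as reported in the Results.
comparisons = {
    "no verification bias": (2.95, 39.25),
    "in-/exclusion criteria mentioned": (2.78, 12.8),
    "broad spectrum": (2.31, 7.17),
    "broad spectrum, outlier excluded": (2.31, 4.07),
}

# Difference in ln OR = ln(xG OR of group A / xG OR of group B).
log_differences = {
    name: math.log(a / b) for name, (a, b) in comparisons.items()
}

for name, diff in log_differences.items():
    print(f"{name}: {diff:.2f}")
```

The results agree with the reported magnitudes 2.59, 1.53, 1.13 and 0.56 to within rounding of the published geometric means, each with a negative sign.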

Discussion: Only a few studies gave data on patients who had no disc hernia at surgery. Study populations consisted of highly referred patients, a very high percentage of whom had disc hernia. In such a population, the low discriminative power of the Lasègue test is not surprising. The power of this sensitivity analysis to detect differences according to the quality criteria is limited. Even so, 4 criteria were significant at p = 0.05. However, the only study with a specific non-diseased group of patients also showed an extremely high OR. Excluding this study left only one nearly significant result. This is a clear indication that one should look carefully for homogeneity and explore reasons for heterogeneity before pooling data for a meta-analysis.
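
One simple way to watch for an outlier before pooling is a leave-one-out check on the geometric mean OR. The sketch below is illustrative only and is not the authors' procedure. The list of ORs is hypothetical: only the extremes 1.34 and 39.25 come from the reported range, and the six values in between are invented for the example.

```python
import math

def geometric_mean(values):
    """Geometric mean, computed as exp of the mean natural log."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def leave_one_out(dors):
    """Geometric mean OR with each study removed in turn."""
    return [geometric_mean(dors[:i] + dors[i + 1:]) for i in range(len(dors))]

# Hypothetical diagnostic ORs for 8 studies (only 1.34 and 39.25 are reported).
dors = [1.34, 2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 39.25]

overall = geometric_mean(dors)
loo = leave_one_out(dors)

# Shift in the pooled ln OR caused by dropping each study.
shifts = [abs(math.log(g / overall)) for g in loo]
most_influential = shifts.index(max(shifts))

print(f"overall geometric mean OR: {overall:.2f}")
print(f"most influential study: index {most_influential}, OR {dors[most_influential]}")
print(f"geometric mean OR without it: {loo[most_influential]:.2f}")
```

A study whose removal shifts the pooled estimate far more than any other deserves a closer look at its design, for example its reference standard and verification procedure, before it is pooled with the rest.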