How do systematic review users and producers interpret the stability of review findings based on GRADE quality of evidence ratings?

2014 Hyderabad

Thaler K¹, Sommer I², Dobrescu AI³, Swinson Evans T⁴, Lohr K⁴, Gartlehner G⁴

¹Austrian Cochrane Branch, Austria

²Danube University Krems, Department for Evidence-based Medicine and Clinical Epidemiology, Austria

³Victor Babes University of Medicine and Pharmacy, Timisoara, Romania

⁴RTI International, Research Triangle Park, NC, USA

Background:
The GRADE (Grading of Recommendations Assessment, Development and Evaluation) approach uses information about study limitations, imprecision, inconsistency, indirectness, and publication bias to determine quality of evidence (QoE) and communicate the confidence that systematic reviewers have in the estimate of effect size. Semantically, key elements of QoE definitions include the concepts of truth, confidence (in effect estimates), modifiers of levels of confidence (e.g. very, moderately, or limited), and deficiencies.

Objectives:
Review authors (producers) and readers (users) may interpret terms intended to convey certainty or stability of results differently. We sought to determine the degree of stability of effects over time that review users and/or producers associate with QoE grades.

Methods:
In an anonymous web-based survey, participants used an interactive graphical sliding scale (0% to 100%) to indicate how certain they were that future results would NOT substantially change the estimated effect, given 'high', 'moderate' or 'low' QoE.

Results:
A total of 208 people provided data: 82 (39%) identified as producers, 49 (24%) as users, and 77 (37%) as both users and producers of systematic reviews (SRs). Overall, SR users and producers assigned similar likelihoods that treatment effects would remain stable (P = 0.29), although responses varied widely within groups. Fig 1 illustrates the ranges of responses and skewed results for high and low QoE in all three groups. For all groups combined, the mean (SD) for the “estimate that high QoE will remain stable as new studies emerge” was 86.0% (8.2). For moderate QoE, the pooled estimate of stability was 61.0% (11.8); for low QoE, it was 34.8% (14.5).

Conclusions:
This study shows that interpretations of GRADE QoE ratings vary; however, the variation is not between users and producers of SRs. The wide range of likelihoods that respondents associated with each rating indicates a need for discussion about the meaning behind the definitions of QoE. Furthermore, future studies could test the predictive validity of the GRADE approach in real-world bodies of evidence.