Increased risks of false-positive or false-negative findings are common in outcomes graded as high certainty of evidence

2018 Edinburgh

Nussbaumer-Streit B¹, Gartlehner G², Wagner G¹, Patel S³, Swinson-Evans T³, Dobrescu AI⁴, Gluud C⁵

¹Cochrane Austria

²Cochrane Austria and RTI International, Research Triangle Park

³RTI International, Research Triangle Park

⁴Genetics Department, Victor Babes University of Medicine and Pharmacy

⁵Copenhagen Trial Unit, Centre for Clinical Intervention Research, Rigshospitalet, Copenhagen University Hospital

Background:
GRADE (Grading of Recommendations Assessment, Development and Evaluation) has become a commonly used tool to convey the certainty of evidence (CoE) in systematic reviews. For decision-makers, such assessments are crucial because they convey the confidence that review authors have in the results. However, previous research has shown that 20% of outcomes graded as high CoE changed substantially as new studies were added. This raises concerns because high CoE, by definition, means that the effect estimate should remain stable when new studies are added to a systematic review. Possible explanations for the limited predictive value of high CoE grades include a lack of adherence to the GRADE guidance, or a conceptual approach to grading CoE that does not adequately take into consideration the risk of false-positive or false-negative conclusions.

Objectives:
We aimed to identify the factors responsible for the limited predictive value of high CoE grades; specifically, whether an increased risk of type I or type II errors could be the reason.

Methods:
We randomly selected 100 Cochrane Reviews with dichotomous outcomes rated as high CoE using GRADE. To detect increased risks of random errors, two investigators independently conducted Trial Sequential Analysis (TSA) employing conventional thresholds for type I (α = 0.05) and type II (β = 0.10) errors. We re-graded, in duplicate, all outcomes with increased risks of random errors and conducted multivariate logistic regression analyses to determine predictors of increased risks.
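To illustrate how the thresholds above enter a TSA, the following minimal sketch computes a required information size for a dichotomous outcome using the conventional two-group sample size formula, optionally inflated for heterogeneity. It is not the TSA software used in this study and it does not compute monitoring boundaries. The function name, the diversity parameter and all event proportions in the example are hypothetical; the abstract does not report the anticipated effects or heterogeneity adjustments that were applied.

```python
from math import ceil
from statistics import NormalDist


def required_information_size(p_control, p_experimental,
                              alpha=0.05, beta=0.10, diversity=0.0):
    """Total number of participants (both groups combined) needed to detect
    the difference between two event proportions.

    alpha is the two-sided type I error, beta the type II error, and
    diversity an assumed heterogeneity adjustment in the range [0, 1).
    All names here are illustrative, not taken from the study.
    """
    if not 0.0 <= diversity < 1.0:
        raise ValueError("diversity must be in [0, 1)")
    delta = p_control - p_experimental          # absolute risk difference
    if delta == 0:
        raise ValueError("the two event proportions must differ")
    p_bar = (p_control + p_experimental) / 2    # average event proportion
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = 4 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2
    return ceil(n / (1 - diversity))            # inflate for heterogeneity


# Hypothetical inputs: same 20% relative risk reduction, common vs rare events.
common_events = required_information_size(0.10, 0.08)
rare_events = required_information_size(0.01, 0.008)
print(common_events, rare_events)
```

Under these assumptions, the rare-event scenario, which has the smaller absolute risk difference, requires far more participants than the common-event scenario. A meta-analysis that has accrued fewer participants than the required information size has not yet reached the specified type I and type II error levels.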

Results:
Overall, 38% (95% confidence interval 28% to 47%) of high CoE outcomes had increased risks of random errors. Outcomes measuring harms were more frequently affected than outcomes assessing benefits (47% versus 12%). Re-grading of outcomes with increased risks of random errors showed that 74% should not have been rated as high CoE based on current guidance. Regression analyses identified a small absolute risk difference (P = 0.009) and a low number of events (P = 0.001) as significant predictors of increased risks of random errors.

Conclusions:
Decision-makers need to be aware that outcomes rated as high CoE often have increased risks of false-positive or false-negative findings.

Patient or healthcare consumer involvement:
Assessments of CoE are important for informed decision-making by healthcare consumers and therefore need to be reliable.