Risk of bias versus quality assessment in systematic reviews: a comparison between ROBIS and AMSTAR

Authors
Minozzi S1, Cinquini M2, Capobussi M3, Gonzalez-Lorenzo M3, Pecoraro V4, Banzi R5
1Cochrane Review Group on Drugs and Alcohol, Department of Epidemiology, Lazio Regional Health Service, Via Cristoforo Colombo, 112, 00147 – Rome
2Laboratory of Clinical Research Methodology, IRCCS-Mario Negri Institute for Pharmacological Research, Via G. La Masa 19, 20156 Milan
3Department of Biomedical Sciences for Health, University of Milan, Via Pascal 36, 20133 Milan
4Department of Laboratory Medicine and Pathological Anatomy, Laboratory of Toxicology. Ospedale Civile S. Agostino Estense, Azienda USL of Modena
5Laboratory of Regulatory Policies IRCCS-Mario Negri Institute for Pharmacological Research, Via G. La Masa 19, 20156 Milan
Abstract
Background: Systematic reviews (SRs) are widely used to support the development of clinical guidelines and other documents driving decisions in healthcare. Suboptimal SRs can be harmful, so a reliable assessment of their validity is essential. The AMSTAR checklist is a widely used tool for this purpose, while the ROBIS tool was recently launched specifically to assess the risk of bias of SRs.

Objectives: To evaluate the inter-rater reliability (IRR) of AMSTAR and ROBIS for individual domains and overall judgment, the concurrent validity, and the time required to apply the tools.

Methods: Five raters with different levels of expertise assessed 31 SRs on pharmacological thromboprophylaxis using AMSTAR and ROBIS. For each question, each domain and the overall risk of bias, we calculated Fleiss' kappa for agreement among multiple raters (for AMSTAR, low risk of bias: eight or more yes-answers; high risk of bias: three or fewer yes-answers). We assessed the concurrent validity of the two tools by comparing domains addressing similar items (Table). We recorded the time to complete each tool as the mean time spent by each reviewer on each review. We classified agreement as: poor (≤0.00), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80), almost perfect (0.81-1.00).
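
As an illustration of the agreement statistics described above, the following is a minimal sketch of the standard Fleiss' kappa computation, together with the agreement categories and the AMSTAR cut-offs stated in the Methods. It is not the authors' analysis code. The function names, the input layout (one list of rater labels per review) and the label "intermediate" for AMSTAR scores between the two cut-offs are assumptions made for this example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the category label each rater
    assigned to subject i (hypothetical input layout, chosen for this sketch).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: mean over subjects of the share of agreeing rater pairs.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map a kappa value to the agreement categories listed in the Methods."""
    if kappa <= 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def amstar_risk(yes_count):
    """Apply the AMSTAR cut-offs from the Methods to a count of yes-answers.

    The label "intermediate" for counts of 4 to 7 is an assumption of this sketch.
    """
    if yes_count >= 8:
        return "low"
    if yes_count <= 3:
        return "high"
    return "intermediate"
```

With these cut-offs, `agreement_label(0.65)` returns `"substantial"` and `agreement_label(0.38)` returns `"fair"`.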

Results: The kappa for the agreement on individual domains ranged from 0.28 to 1 for AMSTAR and from 0.49 to 0.61 for ROBIS; kappa for overall risk of bias was 0.65 for both tools (Figure). We found a fair correlation between AMSTAR and ROBIS in the overall judgment (ρ=0.38), mainly because of discordances in the classification of SRs at intermediate risk of bias. The mean time to complete ROBIS was about twice that of AMSTAR (mean±standard deviation: 12.6±4.6 vs. 5.8±31.9; mean difference: 6.7±3.2). Concurrent validity on single domains will be presented.
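
The abstract does not state which correlation coefficient ρ denotes. Assuming it is Spearman's rank correlation between the ordinal overall judgments of the two tools, a minimal sketch of that computation could look as follows. The function names and the numeric coding of the risk levels are hypothetical, and no study data are reproduced here.

```python
from math import sqrt

# Hypothetical ordinal coding of overall judgments (assumption of this sketch).
RISK_ORDER = {"low": 0, "intermediate": 1, "high": 2}


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        # One variable is constant, so the correlation is undefined.
        return float("nan")
    return cov / sqrt(vx * vy)
```

Judgments would first be coded with `RISK_ORDER`, for example `[RISK_ORDER[j] for j in amstar_judgments]`, where `amstar_judgments` is a hypothetical list holding one overall judgment per review.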

Conclusions: We found similarly substantial IRR for both tools in the judgment of overall risk of bias. ROBIS requires more time to complete. The low correlation between AMSTAR and ROBIS may reflect differences in raters' judgments or genuine differences in what the tools aim to measure (methodological quality vs. risk of bias and appropriateness).