Comparison of AMSTAR 2 with ROBIS in systematic reviews including randomized and non-randomized studies

2018 Edinburgh

Pieper D¹, Puljak L², Gonzalez Lorenzo M³, Minozzi S⁴

¹Witten/Herdecke University

²University of Split School of Medicine

³University of Milan

⁴Cochrane Review Group on Drugs and Alcohol, Department of Epidemiology, Lazio Regional Health Service

Background: While the Risk Of Bias In Systematic reviews (ROBIS) tool can be applied to all types of systematic reviews (SRs), AMSTAR 2 enables a more detailed assessment of SRs that include randomised studies, non-randomised studies (NRS) of healthcare interventions, or both. Prior research indicates that including NRS in SRs of therapeutic interventions is challenging.
Objectives: To report initial experience with AMSTAR 2 and to compare it with ROBIS, in terms of validity, reliability and applicability, when assessing SRs that include both randomised controlled trials (RCTs) and NRS.
Methods: Four raters assessed 30 randomly selected SRs taken from two samples from former projects. One sample consisted of Cochrane Reviews only, while the other included only non-Cochrane reviews. All SRs were assessed in the same order, using AMSTAR 2 first, followed by ROBIS. For each question, each domain and the overall risk of bias, we calculated Fleiss’ kappa (κ) as a measure of inter-rater reliability (IRR) among multiple raters. We recorded the time to complete each tool and report it as the mean time spent by each reviewer on each review. We classified agreement as: poor (≤ 0.00), slight (0.01 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80), almost perfect (0.81 to 1.00).
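As an illustration of the reliability analysis described above, the following is a minimal sketch of Fleiss’ kappa for a fixed number of raters, together with the agreement bands listed in the Methods. It is not the authors' analysis code: the function names `fleiss_kappa` and `classify_agreement`, the rating labels and the example data are all hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    `ratings` is a list of lists: one inner list per review, holding one
    categorical label per rater (hypothetical input format).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number (>= 2) of ratings")

    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]

    # Observed agreement: proportion of agreeing rater pairs, averaged over subjects.
    per_subject = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions.
    total = n_subjects * n_raters
    p_expected = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_expected == 1.0:
        # Only one category was ever used; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def classify_agreement(kappa):
    """Map a kappa value to the agreement bands used in the Methods."""
    if kappa <= 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical overall judgements from four raters on three reviews.
    example = [
        ["low", "low", "low", "high"],
        ["high", "high", "unclear", "high"],
        ["low", "high", "unclear", "low"],
    ]
    kappa = fleiss_kappa(example)
    print(f"kappa = {kappa:.2f} ({classify_agreement(kappa)})")
```

In this sketch, one kappa would be computed per question, per domain and for the overall judgement by passing the corresponding ratings for all assessed reviews.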
Results: At the time of writing, all raters had assessed 12 SRs. IRR for ROBIS domains ranged from 0.09 to 0.38. IRR for overall risk of bias was fair (0.24, 95% CI 0.16 to 0.60). Median IRR for AMSTAR 2 was 0.49. Slight or poor agreement was obtained for items 4 (search strategy), 8 (adequately detailed description of included studies), 14 (explanation and discussion of heterogeneity) and 16 (conflict of interest at review level). The mean time to complete scoring was similar (AMSTAR 2: 19 minutes versus ROBIS: 17 minutes). However, large differences were observed across raters. Results for all 30 SRs and for validity will be presented at the Colloquium.
Conclusions: On average, IRR was much higher for AMSTAR 2 than for ROBIS. Because ROBIS was always applied after AMSTAR 2, when raters were already familiar with the review, we assume that scoring ROBIS takes more time in general. All raters found AMSTAR 2 to be satisfactorily applicable to SRs including RCTs and NRS. Some signalling questions in ROBIS were judged to be very difficult to assess.
Patient or healthcare consumer involvement: Due to the methodological nature of the study, patient or healthcare consumer involvement was not planned.