Reliability and applicability of RoB 2: experience within the pilot review “Cannabis and cannabinoids for people with multiple sclerosis”

Authors
Minozzi S1, Dwan K2, Borrelli F3, Moore T4, Filippini G5
1Cochrane Review Group on Drugs and Alcohol; Department of Epidemiology, Lazio Regional Health Service
2Cochrane Editorial and Methods Department, London
3Department of Pharmacy, School of Medicine and Surgery, University of Naples Federico II, Naples
4Cochrane; Population Health Sciences Department, Bristol Medical School, University of Bristol
5Scientific Director’s Office Fondazione, Istituto di Ricovero e Cura a Carattere Scientifico, Istituto Neurologico Carlo Besta, Milan
Abstract
Background: The revised Cochrane risk-of-bias tool for randomized trials (RoB 2) is being piloted in 13 Cochrane systematic reviews (SRs) and has been recommended for use in new SRs since 2020. In 2019 we assessed the reliability and applicability of RoB 2 on the primary outcome of a random sample of 70 individually randomized parallel-group trials (RCTs). Those trials covered very different topics, which limits the generalizability of our results to the context of a single SR.
Objectives: To measure the inter-rater reliability (IRR) of RoB 2 and assess the difficulties and the time required to implement it within a SR.
Methods: Four raters with medium-high expertise in risk of bias assessment of RCTs independently applied RoB 2 to 18 individually randomized parallel-group trials included in the pilot review “Cannabis and cannabinoids for people with multiple sclerosis”. We first performed a calibration exercise on five RCTs. We then prepared a structured document on how to implement the tool within the SR, that is, how to answer the signalling questions (SQs) given the types of outcomes and the clinical context. Finally, we applied the tool to the remaining studies included in the SR.
We calculated Fleiss’ kappa for multiple raters for the individual domains and for overall risk of bias. We classified agreement as poor (≤0.00), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80) or almost perfect (0.81-1.00). We calculated the IRR separately for the first five studies, assessed during calibration, and for the remaining studies, assessed after calibration. We measured the time to complete RoB 2 as the mean time in minutes spent by each rater on each study. We also measured the time in hours spent on discussion during calibration and on defining the criteria for answering the SQs in our SR.
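As an illustration of the agreement statistic described above, the following is a minimal sketch of Fleiss' kappa for several raters, together with the agreement bands listed in the Methods. It is not the authors' analysis code: the function names, the three judgement labels and the example ratings are assumptions made for illustration, and the ratings are invented, not study data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of studies, each rated once by every rater.

    ratings: list of lists, one inner list per study, one judgement per rater.
    categories: all possible judgement labels.
    """
    n_studies = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who put study i in category j
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement for each study, then averaged over studies
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_studies
    # Chance agreement from the overall proportion of each category
    p_j = [
        sum(row[j] for row in counts) / (n_studies * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        # Every rating falls in a single category: kappa is undefined
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def classify_agreement(kappa):
    """Map a kappa value to the agreement bands used in the Methods."""
    if kappa <= 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical judgements: 5 studies, 4 raters (invented for illustration)
    labels = ["low", "some concerns", "high"]
    example = [
        ["low", "some concerns", "high", "some concerns"],
        ["high", "high", "some concerns", "low"],
        ["some concerns", "some concerns", "high", "high"],
        ["low", "low", "some concerns", "high"],
        ["high", "some concerns", "some concerns", "low"],
    ]
    k = fleiss_kappa(example, labels)
    print(f"kappa = {k:.2f} ({classify_agreement(k)})")
```

Negative values, such as those reported for some domains in the Results, arise when observed agreement is lower than the agreement expected by chance.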
Results: Preliminary results on the first five RCTs are reported. The IRR was poor for overall risk of bias (-0.15) and for domains 2 (-0.15) and 4 (-0.24), slight for domains 3 (0.08) and 5 (0.12), and fair for domain 1 (0.30). The mean time to complete RoB 2 was 168.5 minutes (SD 68.7). The whole calibration exercise, including the preparation of the document, took about 55 hours over a three-month period.
Conclusions: The analysis of the first five RCTs showed poor agreement for the overall risk of bias and highlighted difficulties in the comprehension and applicability of some SQs, particularly in domains 2 (deviations from the intended interventions), 3 (missing outcome data) and 5 (selection of the reported result). Applying RoB 2 and completing the calibration exercise required a significant amount of time. The tool appears to be complex and requires a sound background in clinical epidemiology and statistics, as well as a good knowledge of the subject matter. The results of the assessment on the total sample will be presented. Implications of the use of RoB 2 for the work of the CRG editorial bases will be discussed.

Patient or healthcare consumer involvement: The project focuses on methods for assessing risk of bias, so we could not involve consumers.