Reliability and applicability of the revised Cochrane 'Risk of bias' tool for randomized trials (RoB 2)

2019 Santiago

Minozzi S¹, Cinquini M², Gianola S³, Gonzalez-Lorenzo MG⁴, Banzi R⁵

¹Cochrane Review Group on Drugs and Alcohol, Department of Epidemiology, Lazio Regional Health Service

²Unit of Methodology of Systematic Reviews and Guidelines Development, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri

³Unit of Clinical Epidemiology, IRCCS Galeazzi Orthopedic Institute

⁴Department of Biomedical Sciences, Humanitas University, IBD Center, Humanitas Clinical and Research Center

⁵Center for Health Regulatory Policies, IRCCS-Istituto di Ricerche Farmacologiche Mario Negri

Background: the revised Cochrane 'Risk of bias' tool for randomized trials (RoB 2) has been proposed to assess the risk of bias in randomized controlled trials (RCTs) included in systematic reviews. RoB 2 is structured into a fixed set of five domains of bias and a series of signalling questions about features of the trial that are relevant to risk of bias. Judgment can be 'low' or 'high' risk of bias, or can express 'some concerns'. Cochrane plans to pilot RoB 2 in 2019 and possibly recommend it for use in Cochrane Reviews in 2020. No studies have been conducted so far to assess RoB 2 reliability and applicability.

Objectives: to measure the inter-rater reliability (IRR) of RoB 2 and explore its applicability.

Methods: four raters with medium to high expertise in risk of bias assessment of RCTs independently applied RoB 2 to a random sample of 70 RCTs. The raters were advised to follow the RoB 2 guidance (version of October 2018) and did not perform a calibration exercise before the assessment. Before applying the tool, they agreed on: 1) the nature of the effect of interest for each trial (the effect of assignment to the intervention or of adhering to it); 2) the outcome of interest (usually the study’s primary outcome).

We calculated Fleiss’ kappa (κ) for multiple raters for signalling questions, individual domains and overall risk of bias. We classified agreement as poor (≤ 0.00), slight (0.01 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80) or almost perfect (0.81 to 1.00). The time to complete RoB 2 was calculated as the mean of the time, in minutes, spent by each rater on each study.
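
The agreement statistic and the classification bands described above can be illustrated with a minimal Python sketch. It uses the standard textbook formula for Fleiss' kappa and is not the authors' analysis code, which the abstract does not describe. The function names `fleiss_kappa` and `classify_agreement` and the example ratings are hypothetical, introduced here only for illustration. The sketch assumes every trial is rated by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the category label given by
             each rater to subject i (hypothetical input layout).
    categories: list of all possible category labels.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")

    # counts[i][j] = number of raters who assigned subject i to category j
    counts = []
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every subject must have the same number of ratings")
        tally = Counter(row)
        counts.append([tally[cat] for cat in categories])

    # Observed agreement: mean over subjects of the proportion of agreeing rater pairs
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement: sum of squared overall category proportions
    total = n_subjects * n_raters
    proportions = [
        sum(row[j] for row in counts) / total for j in range(len(categories))
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        # All ratings fall in a single category: kappa is undefined
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def classify_agreement(kappa):
    """Map a kappa value to the agreement bands listed in the Methods."""
    if kappa <= 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Illustrative data only (not from the study): 3 trials, 4 raters each
ROB_LEVELS = ["low", "some concerns", "high"]
example = [
    ["low", "low", "some concerns", "low"],
    ["high", "some concerns", "high", "high"],
    ["some concerns", "some concerns", "low", "high"],
]
kappa = fleiss_kappa(example, ROB_LEVELS)
print(f"kappa = {kappa:.2f} ({classify_agreement(kappa)})")
```

Under these bands, the overall value of 0.27 reported in the Results falls in the "fair" category. The same function could be applied separately to each signalling question and each domain by passing the corresponding judgments.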

Results: we report preliminary results on the first 30 RCTs. The agreement was fair for the overall risk of bias (0.27, standard deviation 0.06), as well as for domains 1, 4, 5; the agreement was slight for domains 2 and 3 (Figure). The mean time to complete RoB 2 was 34 minutes (standard deviation 17). The subgroup analysis of RCTs classified as 'aimed to assess the assignment or adherence to the intervention' revealed similar agreement in the overall 'Risk of bias' judgment but important differences in the agreement for domains 2, 4, and 5 (Table).

Conclusions: this preliminary analysis showed fair agreement for RoB 2 and highlighted some difficulties in the assessment, for instance in defining the nature of the effect of interest and in interpreting the questions in domain 2 (deviations from intended interventions). Applying RoB 2 required a considerable amount of time, likely more than that needed with the previous tool. The tool appears complex to apply and requires a sound background in the methodology of clinical trials and epidemiology. The assessment of the remaining RCTs is ongoing; the results for the total sample will be presented.

Patient or healthcare consumer involvement: the project focuses on methods to assess risk of bias, so we could not involve consumers.