Inter-rater agreement and time to complete the new Cochrane Risk-of-Bias tool (RoB 2.0)

2017 Cape Town [Global Evidence Summit]

Minozzi S¹, Saulle R¹, Mitrova Z¹, Cinquini M²

¹Department of Epidemiology, Cochrane Review Group on Drugs and Alcohol, Lazio Regional Health Service

²IRCCS-Mario Negri Institute for Pharmacological Research

Background: The RoB 2.0 tool, a revised tool to assess risk of bias in randomised controlled trials (RCTs), was piloted during 2016 and officially released at the 2016 Cochrane Colloquium.

Objectives: To assess inter-rater agreement (IRR), the time needed to retrieve study protocols and the time needed to complete the RoB 2.0 tool.

Methods: We used a convenience sample of 20 individually randomised, parallel-group RCTs included in two Cochrane reviews in the drug and alcohol addiction field. Nine studies compared a pharmacological intervention with placebo, and 11 compared a psychosocial intervention with no intervention or usual care.
Two raters, one with medium and one with high expertise in risk-of-bias assessment, took part. For each relevant outcome, we used Cohen's weighted κ to assess the IRR for signaling questions (SQ), individual domain judgments (DJ) and the overall judgment (OJ). We classified agreement as poor (≤0.00), slight (0.01-0.20), fair (0.21-0.40), moderate (0.41-0.60), substantial (0.61-0.80) or almost perfect (0.81-1.00).
Time to complete the tool was calculated as the mean time spent in minutes by each rater for each relevant outcome.
Time to search and acquire the study protocol was calculated as the mean time spent in minutes for each trial.
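
The agreement statistic and the classification bands described above can be sketched in Python. This is an illustrative sketch, not the authors' analysis code: the abstract does not say whether linear or quadratic weights were used, so linear weights are assumed here, and the function names and the ordering of the judgment categories are hypothetical choices made for the example.

```python
def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa for two raters on an ordinal scale.

    ASSUMPTION: linear disagreement weights |i - j| / (k - 1); the
    abstract does not specify the weighting scheme.
    `categories` lists the ordinal levels from lowest to highest.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    # Weighted observed disagreement and disagreement expected by chance.
    d_obs = sum(weight(i, j) * observed[i][j]
                for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))

    if d_exp == 0:
        # Both raters used one and the same category throughout,
        # so kappa is undefined.
        return float("nan")
    return 1 - d_obs / d_exp


def classify_agreement(kappa):
    """Map kappa to the agreement bands listed in the Methods."""
    if kappa <= 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ordering of RoB 2.0 domain judgments, lowest to highest risk.
JUDGMENTS = ["low risk", "some concerns", "high risk"]
```

For example, `classify_agreement(weighted_kappa(a, b, JUDGMENTS))` would label the agreement between two lists of domain judgments `a` and `b`, one entry per assessed outcome.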

Results: Preliminary results of the first 6 outcomes from 4 trials are provided.
Randomisation process: SQ1.1: κ = 0.57, SQ1.2: κ = 0.57, SQ1.3: κ = 0.18, DJ1: κ = 0.08
Deviations from intended interventions: SQ2.1: κ = 0.45, SQ2.2: κ = 0.45, SQ2.3: κ = -0.36, SQ2.4: κ = 0, SQ2.5: κ = 0, DJ2: κ = -0.36
Missing outcome data: SQ3.1: κ = 0.57, SQ3.2: κ = -0.13, SQ3.3: κ = 0.20, DJ3: κ = 1
Measurement of the outcome: SQ4.1: κ = 0.18, SQ4.2: κ = 0.36, DJ4: κ = 0.67
Selection of the reported results: SQ5.1: κ = -1, SQ5.2: κ = -1, DJ5: κ = 0
Overall judgment: κ = 0
Mean time to complete the tool was 34.2 minutes; mean time to search for protocols was 20 minutes.

Conclusions: Preliminary results showed poor to moderate agreement for signaling questions, slight to almost perfect agreement for individual domain judgments and poor agreement for the overall judgment.