Reliability of risk of bias assessments between pairs of reviewers from different centers and reasons for disagreement

Authors
Hartling L1, Hamm M1, Milne A1, Vandermeer B1, Santaguida PL2, Ansari M3, Tsertsvadze A3, Hempel S4, Shekelle P4, Dryden DM1
1University of Alberta, Canada
2McMaster University, Canada
3Ottawa Hospital Research Institute, Canada
4RAND Corporation, USA
Abstract
Background: The Cochrane Risk of Bias (RoB) tool was released in 2008 and is used in systematic reviews to assess the risk of bias of randomized controlled trials (RCTs). Several studies examining the inter-rater reliability of the tool have called for more detailed guidance.

Objectives: (1) assess the reliability of the RoB tool between individual raters and between consensus assessments of reviewer pairs from four centers; (2) examine the impact of study-level factors on reliability; (3) examine reasons for discrepancies.

Methods: Two reviewers independently assessed risk of bias for 154 RCTs. We used subgroup analyses to examine whether study-level factors influenced inter-rater reliability. For a subset of 30 RCTs, two reviewers from each of four Evidence-based Practice Centers assessed risk of bias and reached consensus. Agreement between the consensus assessments was measured using kappa statistics. We also examined reasons for disagreement.
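
As an illustration of the kind of statistic involved, the sketch below computes a multi-rater (Fleiss) kappa and maps it to a descriptive label. It is a minimal sketch under stated assumptions, not the authors' analysis: the abstract does not say which kappa variant was used, the descriptive bands are assumed to follow the commonly cited Landis and Koch scale, and the function names and toy ratings are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of trials, each a list with one label per rater.

    Assumption: every trial is rated by the same number of raters
    (here a "rater" could also be one reviewer pair's consensus rating).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each trial needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)

    # Observed agreement: proportion of agreeing rater pairs, averaged over trials.
    p_i = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of each category.
    p_j = [sum(cnt[cat] for cnt in counts) / (n_subjects * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa):
    """Descriptive label for kappa (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: three trials, four consensus ratings each.
toy = [
    ["low", "low", "unclear", "high"],
    ["low", "low", "low", "low"],
    ["high", "high", "unclear", "high"],
]
k = fleiss_kappa(toy)
print(round(k, 3), landis_koch(k))
```

Because kappa corrects observed agreement for the agreement expected by chance, it can be low even when raters often give the same rating, for instance when one category dominates.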

Results: Inter-rater variability was influenced by study-level factors. For example, for blinding, inter-rater agreement was better for objective than for subjective outcomes. For allocation concealment, agreement was better for trials with parallel designs than for those with other designs. Inter-rater reliability of consensus assessments across the four reviewer pairs was lower than between individual raters. Reliability was moderate for sequence generation, fair for allocation concealment and ‘other sources of bias’, and slight for the remaining domains. Inter-rater variability resulted more often from differences in interpretation than from different information identified in the study reports. As an example, all four pairs found the same description for blinding (open-label, blinded outcome assessor); two reviewer pairs rated the domain as low risk of bias, one pair rated it as unclear, and one rated it as high.

Conclusions: This study provides new information about the Cochrane RoB tool by comparing consensus assessments across pairs of reviewers. In-depth analysis of the reasons for disagreement can inform the development of more detailed guidance on applying the tool.