Abstract
Background: The Cochrane Risk of Bias (RoB) tool was released in 2008 and is used in systematic reviews to assess the risk of bias of randomized controlled trials (RCTs). A number of studies examining the inter-rater reliability of the tool have called for more detailed guidance.
Objectives: (1) assess the reliability of the RoB tool between consensus assessments of reviewer pairs from four centers; (2) examine the impact of study-level factors on reliability; (3) examine reasons for discrepancies.
Methods: Two reviewers independently assessed risk of bias for 154 RCTs. We used subgroup analyses to assess whether study-level factors influenced inter-rater reliability. For a subset of 30 RCTs, two reviewers from each of four Evidence-based Practice Centers assessed risk of bias and reached consensus. Inter-rater agreement between consensus assessments was measured using kappa statistics. We examined reasons for disagreement.
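The kappa statistic mentioned above quantifies agreement between two raters beyond what chance alone would produce. As a minimal sketch (not the study's actual analysis code, and using invented illustrative ratings), unweighted Cohen's kappa for a single risk-of-bias domain can be computed as:

```python
# Hedged sketch: unweighted Cohen's kappa for two raters assigning
# low/unclear/high risk-of-bias judgments to the same set of trials.
# The ratings below are hypothetical, not data from the study.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Return Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: proportion of items rated identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance-expected agreement from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Example: two raters judging 10 hypothetical trials on one domain.
rater_a = ["low", "low", "high", "unclear", "low",
           "high", "low", "unclear", "low", "high"]
rater_b = ["low", "unclear", "high", "unclear", "low",
           "low", "low", "unclear", "low", "high"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # → 0.683
```

Kappa values are commonly interpreted on the Landis and Koch scale (e.g., 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate), which is the terminology the Results below use.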
Results: Inter-rater variability was influenced by study-level factors. For example, for blinding, inter-rater agreement was better for objective than for subjective outcomes. For allocation concealment, agreement was better for trials with parallel designs than for other designs. Inter-rater reliability of consensus assessments across the four reviewer pairs was lower than between individual raters. Reliability was moderate for sequence generation, fair for allocation concealment and ‘other sources of bias’, and slight for the remaining domains. Inter-rater variability resulted more often from differences in interpretation than from different information identified in the study reports. As an example, all four pairs found the same description for blinding (open-label, blinded outcome assessor); two reviewer pairs rated the domain as low, one as unclear, and one as high risk of bias.
Conclusions: This study provides new information about the Cochrane RoB tool by comparing consensus assessments across pairs of reviewers. In-depth analysis of the reasons for disagreement informs the development of more detailed guidance on applying the tool.