Inter-rater reliability of a new instrument for assessing potential for bias in prognosis studies

2011 Madrid

Peterson K¹, Carson S¹, Carney N¹

¹Oregon Evidence-based Practice Center, Oregon Health & Science University, USA

Background: There is no gold standard for assessing the potential for bias in prognosis studies. The best available recommendations come from a 2006 'review of reviews'. However, very little formal testing has been done on applying those recommendations.

Objectives: To assess inter-rater reliability of an instrument for assessing potential for bias in prognosis studies.

Methods: We developed an instrument for assessing the potential for bias in prognosis studies based on the recommendations of Hayden et al. Our instrument consists of 24 items across 6 domains (patient selection methods, prognostic factor measurement, outcome measurement, follow-up, analysis and reporting methods, and measurement of confounders). The potential for bias was rated as low, medium, or high for each domain and for each study overall. Two reviewers independently assessed 37 studies included in a systematic review conducted for the Brain Trauma Foundation’s second edition of Early Indicators of Prognosis for Moderate to Severe Traumatic Brain Injury. Inter-rater reliability was estimated using Cohen’s kappa.
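For readers less familiar with the statistic, the standard textbook definition of Cohen's kappa is sketched below. The notation (p_o, p_e, p_ij, w_ij) is introduced here for illustration and does not appear in the abstract, and the abstract does not state which weighting scheme (for example linear or quadratic) was used for the weighted kappa.

```latex
% Unweighted Cohen's kappa for two raters and K categories.
% p_o: observed proportion of agreement
% p_e: agreement expected by chance from the raters' marginal proportions
\[
  \kappa = \frac{p_o - p_e}{1 - p_e},
  \qquad
  p_o = \sum_{i=1}^{K} p_{ii},
  \qquad
  p_e = \sum_{i=1}^{K} p_{i\cdot}\, p_{\cdot i}
\]

% Weighted kappa with disagreement weights w_{ij} (w_{ii} = 0).
% The choice of w_{ij} is an assumption left open here.
\[
  \kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, p_{ij}}
                      {\sum_{i,j} w_{ij}\, p_{i\cdot}\, p_{\cdot j}}
\]
```

Here p_ij is the proportion of studies rated in category i by the first reviewer and category j by the second, and p_i· and p_·j are the corresponding marginal proportions.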

Results: Inter-rater reliability was moderate for the overall potential for bias (weighted Cohen’s kappa: 0.78; 95% confidence interval (CI) 0.52 to 1.05). Although observed agreement was generally high across the individual domains (range 74% to 93%), kappa values ranged widely. For example, kappa values indicated 'less than chance' reliability for the outcome measurement domain (-0.03; 95% CI -0.31 to 0.26), but 'almost perfect' reliability for the analysis and reporting methods domain (0.83; 95% CI 0.52 to 1.14). Kappa values were lowest for domains having distributions that were skewed toward ratings of low to medium potential for bias.
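The combination of high observed agreement and low kappa in skewed domains can be illustrated with a minimal Python sketch. The ratings below are invented for illustration only and are not the study's data; the function name `cohen_kappa` and the two example datasets are assumptions introduced here, and the sketch uses unweighted kappa with two categories for simplicity.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Return (observed agreement, unweighted Cohen's kappa) for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)


# HYPOTHETICAL data, 50 items each; both datasets have 46/50 agreements.
# Skewed: almost every rating is "low".
skewed_a = ["low"] * 46 + ["low"] * 2 + ["medium"] * 2
skewed_b = ["low"] * 46 + ["medium"] * 2 + ["low"] * 2

# Balanced: ratings split evenly between "low" and "medium".
balanced_a = ["low"] * 23 + ["medium"] * 23 + ["low"] * 2 + ["medium"] * 2
balanced_b = ["low"] * 23 + ["medium"] * 23 + ["medium"] * 2 + ["low"] * 2

agree_skewed, kappa_skewed = cohen_kappa(skewed_a, skewed_b)
agree_balanced, kappa_balanced = cohen_kappa(balanced_a, balanced_b)

print(f"skewed:   agreement={agree_skewed:.2f}, kappa={kappa_skewed:.2f}")
print(f"balanced: agreement={agree_balanced:.2f}, kappa={kappa_balanced:.2f}")
```

In this invented example both datasets show 92% observed agreement, yet kappa is about -0.04 for the skewed ratings and 0.84 for the balanced ones, because chance agreement is already about 92% when nearly all ratings fall in one category.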

Conclusions: Our instrument had moderate inter-rater reliability for assessing overall potential for bias in prognosis studies. Before our instrument can be used with confidence, however, more reliability and validity testing is needed on studies representing broader ranges of validity and clinical topic areas.