Evaluation of risk of bias in non-randomized trial using the ROBINS-I tool: a pilot experience

2018 Edinburgh

Minozzi S¹, Cinquini M², Castellini G³, Gianola S⁴, Gerardi C⁵, Banzi R⁵

¹Department of Epidemiology, Lazio Regional Health Service, Rome

²IRCCS - Istituto di Ricerche Farmacologiche Mario Negri, Milano

³Department of Biomedical Sciences for Health, University of Milan, Milan

⁴IRCCS Galeazzi Orthopedic Institute, Milan

⁵IRCCS - Istituto di Ricerche Farmacologiche, Milano

Background: The number of systematic reviews including non-randomized studies (NRS) is increasing, so evaluating the validity of NRS is critical. The 'Risk Of Bias In Non-randomized Studies of Interventions' (ROBINS-I) tool, published in 2016, is gaining popularity. No studies have so far assessed its reliability.
Objectives: To measure the inter-rater reliability (IRR) of ROBINS-I and explore its applicability.
Methods: Taking a systematic review on influenza vaccination as a case model, we applied ROBINS-I stage 2 (definition of: target trial, confounding, co-morbidities, effect of interest). Five raters with low-to-medium expertise in risk of bias assessment of NRS independently read 14 cohort studies and applied ROBINS-I stage 3 to two outcomes: influenza-like illness (ILI, subjective) and laboratory-confirmed influenza (objective).
We calculated Fleiss' kappa (κ) for multiple raters for signalling questions, individual domains and overall risk of bias, after a round of discussion aimed at clarifying some critical aspects of the tool (e.g. conditional questions). We classified agreement as poor (≤ 0.00), slight (0.01 to 0.20), fair (0.21 to 0.40), moderate (0.41 to 0.60), substantial (0.61 to 0.80) or almost perfect (0.81 to 1.00). We calculated the time to complete the tool as the mean of the time in minutes spent by each rater on each study.
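The agreement statistic described above can be sketched as follows. This is a minimal illustration of the standard Fleiss' kappa formula and of the agreement bands listed in the Methods, not the code used in the study. The function names, the judgement labels and the example ratings are hypothetical, and the sketch assumes every study is rated by the same number of raters.

```python
from collections import Counter
import math


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds the labels given to subject i.
    categories: all possible labels.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to subject i
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement per subject, then averaged over subjects
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Expected agreement by chance, from the overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    if math.isclose(p_e, 1.0):
        return float("nan")  # kappa is undefined when only one category is ever used
    return (p_bar - p_e) / (1 - p_e)


def classify_agreement(kappa):
    """Map kappa to the agreement bands listed in the Methods."""
    if kappa <= 0.00:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical overall judgements from five raters on three studies
categories = ["low", "moderate", "serious", "critical", "no information"]
example = [
    ["serious", "serious", "moderate", "critical", "serious"],
    ["moderate", "serious", "moderate", "moderate", "low"],
    ["critical", "serious", "serious", "moderate", "critical"],
]
kappa = fleiss_kappa(example, categories)
print(round(kappa, 2), classify_agreement(kappa))
```

Values falling between two published bands (for example between 0.00 and 0.01) are assigned to the higher band here; that is a choice made for this sketch only.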
Results: Six studies evaluated ILI, four evaluated laboratory-confirmed influenza, and four evaluated both outcomes. Table 1 reports the IRR: agreement was poor/slight for all the individual domains. IRR for the overall risk of bias was slight for the subjective outcome, ILI (0.24, SD 0.07), and poor for the objective outcome, influenza (-0.06, SD 0.08). The mean time to complete ROBINS-I was 36.2 minutes (SD 12.9).
Conclusions: The agreement ranged from poor to slight. We found the tool difficult to apply, mainly because of the ambiguity of conditional questions, i.e. when the answer to a trigger question is 'no information'. Unclear reporting in several studies contributed to the poor agreement. The small sample and the use of studies not suited to assessing bias due to post-intervention deviations limit our findings. We will present results on 15 additional NRSs where adherence to the assigned interventions may introduce biases.
Patient or healthcare consumer involvement: The project focuses on methods to assess risk of bias, so we could not involve consumers.