The effect of incorporating RobotReviewer suggestions into risk-of-bias assessments conducted within Covidence

2017 Cape Town [Global Evidence Summit]

Arno A¹, Elliott J², Thomas J³, Wallace B⁴, Marshall I⁵

¹Covidence

²School of Public Health and Preventive Medicine, Monash University

³Institute of Education, University College London

⁴College of Computer and Information Science, Northeastern University

⁵Department of Primary Care and Public Health Sciences, King's College London

Background: Machine learning methods for health evidence synthesis are advancing rapidly. As these technologies mature and become more widely available, it is essential that their effect on accuracy and efficiency is rigorously assessed. Covidence is an online platform that streamlines systematic review tasks, including title/abstract screening, full-text review, quality assessment (risk of bias, RoB), and data extraction. RobotReviewer is a web-based tool that uses machine learning to semi-automate specific tasks in evidence synthesis, including RoB assessment of user-uploaded PDFs.

Objectives: To determine the effect of incorporating suggestions from the RobotReviewer machine learning algorithms into RoB assessments conducted within Covidence (experimental), compared with conventional, human-only RoB assessment (control).

Methods: We randomised studies (1:1) included within systematic reviews to semi-automated or human-only RoB assessment. In the experimental condition, one of two reviewers was presented with RobotReviewer suggestions (judgement and supporting text) and then asked to complete their assessment. In the control condition, two reviewers completed their assessments without RobotReviewer suggestions. Main outcomes were time to complete assessments (efficiency) and differences between semi-automated and human-only assessments (accuracy).
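
The 1:1 allocation described in the Methods can be illustrated with a minimal sketch. The abstract does not state how randomisation was implemented, so the shuffle-based approach, the function name `randomise_studies`, the arm labels, and the fixed seed below are all assumptions made for illustration only.

```python
import random


def randomise_studies(study_ids, seed=0):
    """Allocate studies 1:1 to two arms (hypothetical helper, not the authors' code).

    Assumption: simple shuffle-based allocation. With an odd number of
    studies, the extra study goes to the human-only arm.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    shuffled = list(study_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    allocation = {}
    for position, study_id in enumerate(shuffled):
        # First half of the shuffled list: semi-automated; remainder: human-only
        arm = "semi-automated" if position < half else "human-only"
        allocation[study_id] = arm
    return allocation


if __name__ == "__main__":
    # Illustrative study identifiers only
    studies = [f"study-{n}" for n in range(1, 11)]
    for study_id, arm in sorted(randomise_studies(studies).items()):
        print(study_id, arm)
```

A real trial would more likely use concealed or blocked allocation; this sketch shows only the balance of the two arms.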

Results: The results of the randomised study described above will be presented, including the main outcomes of effect on time to complete assessments and assessment accuracy.

Conclusions: Results of this study will contribute to our understanding of the potential benefits and disadvantages of RobotReviewer-generated RoB assessments, and more generally the use of machine learning in the extraction tasks of a systematic review. Rigorous assessments of new, semi-automated evidence systems will form a foundation for effective and appropriate use of emerging technologies.