Use of machine-learning tools to support efficient study identification in Cochrane reviews: A case study and cost-effectiveness analysis

2017 Cape Town [Global Evidence Summit]

Shemilt I¹, Hollands G², Carter P², Thomas J¹

¹EPPI-Centre, University College London

²Behaviour and Health Research Unit, University of Cambridge

Background: Study identification is a time-intensive phase of systematic review production and a key driver of total cost. Machine-learning (ML) tools have the potential to speed up study identification and reduce manual screening workload, making reviews that were previously intractable because of a ‘too many records’ problem more feasible. However, ML tools have not previously been deployed in Cochrane reviews.

Objectives: To explore and evaluate the use of ML tools to support efficient study identification in Cochrane reviews.

Methods: A novel, semi-automated screening workflow – incorporating both active learning and topic-modelling tools – was designed and implemented in a Cochrane Public Health review to help identify eligible studies among c. 157 000 unique citations retrieved by electronic searches of 11 databases. Electronic searches were supplemented by extensive searches of other resources. A cost-effectiveness analysis (CEA) was conducted to model and compare three options: (A) the novel, semi-automated workflow; (B) a conventional screening workflow; and (C) a semi-automated workflow incorporating active learning without topic modelling.
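The abstract does not describe the internals of the active learning tool used. As a generic illustration only, the sketch below shows one common form of active learning in screening: a relevance model is retrained after each batch of human decisions, and the highest-scoring unscreened records are shown next. The word-weight scorer, the function names, the toy records and the `oracle` that stands in for a human reviewer are all assumptions made for this example. They are not the review's actual tools or data.

```python
import math
from collections import Counter


def tokens(text):
    """Lower-case whitespace tokenisation (deliberately simplistic)."""
    return text.lower().split()


def train(labelled):
    """Learn per-word log-odds weights from (text, is_relevant) pairs.

    Add-one smoothing keeps the weights finite for words seen in one class only.
    """
    rel, irr = Counter(), Counter()
    for text, is_relevant in labelled:
        (rel if is_relevant else irr).update(set(tokens(text)))
    n_rel = sum(1 for _, r in labelled if r)
    n_irr = len(labelled) - n_rel
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2)) - math.log((irr[w] + 1) / (n_irr + 2))
        for w in set(rel) | set(irr)
    }


def score(weights, text):
    """Sum the weights of the distinct words in a record; unseen words count as 0."""
    return sum(weights.get(w, 0.0) for w in set(tokens(text)))


def prioritised_screening(records, oracle, seed_ids, batch_size=2, budget=4):
    """Screen up to `budget` records, retraining after every batch.

    records: dict mapping record id -> title/abstract text
    oracle:  callable id -> bool, simulating the human include/exclude decision
    """
    labels = {i: oracle(i) for i in seed_ids}
    limit = min(budget, len(records))
    while len(labels) < limit:
        weights = train([(records[i], y) for i, y in labels.items()])
        pool = [i for i in records if i not in labels]
        # Most likely relevant first, so eligible studies surface early.
        pool.sort(key=lambda i: score(weights, records[i]), reverse=True)
        for i in pool[: min(batch_size, limit - len(labels))]:
            labels[i] = oracle(i)
    return labels


# Toy data, invented for illustration only.
records = {
    1: "randomised trial of exercise intervention in adults",
    2: "exercise intervention trial in older adults",
    3: "geology of volcanic rock formations",
    4: "controlled trial of walking intervention in adults",
    5: "stock market volatility forecasting",
    6: "survey of volcanic soil chemistry",
}
relevant = {1, 2, 4}
labels = prioritised_screening(records, lambda i: i in relevant, seed_ids=[1, 3])
found = {i for i, is_relevant in labels.items() if is_relevant}
```

In this toy run all three relevant records are found after screening four of the six records. A real workflow would also need a stopping rule, and the choice of rule determines how much recall is put at risk.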

Results: Use of the novel, semi-automated workflow (A) reduced manual title-abstract screening workload by 83% in this review, compared with conventional screening (B), without any loss of recall. Topic modelling did not identify any eligible studies. Searches of other resources identified 4 further eligible studies, none of which had been published before the date of the last search, so they were not represented among the c. 157 000 electronic search results. A full set of CEA results will be presented. Even before the full CEA results are available, it is clear that the modelled semi-automated workflow incorporating active learning without topic modelling (C) ‘dominates’ the other options (A and B) in this case, i.e. it would cost less, with identical recall.
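The arithmetic behind the workload figure and the meaning of ‘dominates’ can be sketched as follows. The record count is the approximate figure given in the abstract. The cost values are placeholders chosen only to make the comparison run, because the abstract reports no costs, and the `Workflow` class and `dominates` function are names assumed for this example. The rule shown is the standard one: an option dominates another if it is no worse on cost and recall and strictly better on at least one.

```python
from dataclasses import dataclass


def remaining_manual_records(total_records, reduction):
    """Records still screened manually after a proportional workload reduction."""
    return round(total_records * (1 - reduction))


# c. 157 000 citations and an 83% reduction, both taken from the abstract.
manual = remaining_manual_records(157_000, 0.83)  # roughly 26 690 records


@dataclass(frozen=True)
class Workflow:
    name: str
    cost: float    # total screening cost in arbitrary units (placeholder values)
    recall: float  # proportion of eligible studies identified, from 0 to 1


def dominates(x, y):
    """True if x is no worse than y on both criteria and strictly better on one."""
    no_worse = x.cost <= y.cost and x.recall >= y.recall
    strictly_better = x.cost < y.cost or x.recall > y.recall
    return no_worse and strictly_better


# Placeholder costs only; identical recall reflects the abstract's statement.
a = Workflow("A: active learning + topic modelling", cost=30.0, recall=1.0)
b = Workflow("B: conventional screening", cost=100.0, recall=1.0)
c = Workflow("C: active learning only", cost=25.0, recall=1.0)

dominated_by_c = [w.name for w in (a, b) if dominates(c, w)]
```

With identical recall across the three options, the comparison reduces to cost alone, which is why a cheaper workflow with the same recall dominates.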

Conclusions: Use of ML tools can make study identification more efficient in Cochrane reviews that have a ‘too many records’ problem. Further evaluations of ML tools are needed to assess the generalisability of this finding and to help build an evidence base for efficient workflow design in reviews.