Abstract
Background: Machine learning (ML) algorithms have proven highly accurate for identifying randomised-controlled trials (RCTs), but string-based study-design filters remain the predominant approach used in practice for systematic reviews and guidelines.
Objectives: We compared the performance of ML models for identifying RCTs against a range of traditional database study-design filters, including the Cochrane Highly Sensitive Search Strategy (HSSS) and the PubMed publication type tag.
Methods: We evaluated Support Vector Machines (SVMs), Convolutional Neural Networks (CNNs), and ensemble approaches. We trained these models on titles and abstracts labelled as part of the Cochrane Crowd project, and evaluated them on the Clinical Hedges dataset, which comprises 49 028 articles manually labelled on the basis of their full texts.
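As a hedged illustration only (not the authors' published code, which is available in the linked repository), the SVM approach described in Methods amounts to training a linear text classifier on labelled titles and abstracts; the corpus and labels below are invented for demonstration:

```python
# Illustrative sketch of a linear SVM trained to discriminate RCT from
# non-RCT abstracts; the real models were trained on Cochrane Crowd labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus standing in for titles + abstracts.
texts = [
    "A randomised controlled trial of drug X versus placebo",
    "Patients were randomly allocated to intervention or control",
    "A retrospective cohort study of outcomes after surgery",
    "Case report: an unusual presentation of disease Y",
]
labels = [1, 1, 0, 0]  # 1 = RCT, 0 = non-RCT

# TF-IDF unigrams/bigrams feeding a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["Participants were randomised to two arms"])[0])
```

In practice such a model is calibrated so that a sensitivity threshold can be chosen, which is how the filters compared in Results trade sensitivity against specificity.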
Results: ML discriminated between RCTs and non-RCTs better than widely used traditional database search filters at all sensitivity levels (see Figure); our best-performing model achieved the best published results to date for ML on this task (area under the receiver operating characteristic curve 0.987, 95% CI 0.984 to 0.989). The best-performing model (a hybrid SVM model incorporating information from the PT tag) improved specificity compared with the Cochrane HSSS search filter at identical sensitivity (difference in specificity +10.8%, 95% CI 10.5% to 11.2%), corresponding to a precision of 21.0% versus 12.5% and a number “needed to screen” of 4.8 versus 8.0. We have made software implementing these ML approaches freely available under the GPL v3.0 license at https://github.com/ijmarshall/robotsearch.
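The relationship between the precision and number-needed-to-screen figures quoted in Results is simply that the number needed to screen is the reciprocal of precision; a minimal check, using the values from the abstract:

```python
# Number needed to screen (NNS): how many retrieved records must be
# screened, on average, to find one true RCT. NNS = 1 / precision.
def number_needed_to_screen(precision):
    return 1.0 / precision

# Hybrid SVM model: precision 21.0% -> NNS 4.8
print(round(number_needed_to_screen(0.210), 1))  # 4.8
# Cochrane HSSS filter: precision 12.5% -> NNS 8.0
print(round(number_needed_to_screen(0.125), 1))  # 8.0
```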
Conclusions: ML performs better than traditional database filters, with improved specificity at all sensitivity levels. We recommend that users of the medical literature move toward using ML as the method for study-design filtering.