Abstract
Background: Machine learning (ML) algorithms have proven highly accurate for identifying randomised-controlled trials (RCTs), but string-based study-design filters remain the predominant approach used in practice for systematic reviews and guidelines.
Objectives: We compared the performance of ML models for identifying RCTs against a range of traditional database study-design filters, including the Cochrane Highly Sensitive Search Strategy (HSSS) and the PubMed publication type tag.
Methods: We evaluated Support Vector Machines (SVMs), Convolutional Neural Networks (CNNs), and ensemble approaches. We trained these models on titles and abstracts labelled as part of the Cochrane Crowd project, and evaluated them on the Clinical Hedges dataset, which comprises 49 028 articles manually labelled on the basis of their full texts.
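As a hedged illustration only (not the authors' published code, which is available in the linked repository), the SVM approach described in Methods amounts to training a linear text classifier on labelled titles and abstracts; the corpus and labels below are invented for demonstration:

```python
# Illustrative sketch of a linear SVM trained to discriminate RCT from
# non-RCT abstracts; the real models were trained on Cochrane Crowd labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Tiny hypothetical corpus standing in for titles + abstracts.
texts = [
    "A randomised controlled trial of drug X versus placebo",
    "Patients were randomly allocated to intervention or control",
    "A retrospective cohort study of outcomes after surgery",
    "Case report: an unusual presentation of disease Y",
]
labels = [1, 1, 0, 0]  # 1 = RCT, 0 = non-RCT

# TF-IDF unigrams/bigrams feeding a linear SVM.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)
print(model.predict(["Participants were randomised to two arms"])[0])
```

In practice such a model is calibrated so that a sensitivity threshold can be chosen, which is how the filters compared in Results trade sensitivity against specificity.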
Results: ML discriminated between RCTs and non-RCTs better than widely used traditional database search filters at all sensitivity levels (see Figure); our best-performing model achieved the best published results to date for ML on this task (area under the receiver operating characteristic curve 0.987, 95% CI 0.984 to 0.989). The best-performing model (a hybrid SVM model incorporating information from the PT tag) improved specificity compared with the Cochrane HSSS search filter at identical sensitivity (difference in specificity +10.8%, 95% CI 10.5% to 11.2%), corresponding to a precision of 21.0% versus 12.5% and a number “needed to screen” of 4.8 versus 8.0. We have made software implementing these ML approaches freely available under the GPL v3.0 license at https://github.com/ijmarshall/robotsearch.
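The relationship between the precision and number-needed-to-screen figures quoted in Results is simply that the number needed to screen is the reciprocal of precision; a minimal check, using the values from the abstract:

```python
# Number needed to screen (NNS): how many retrieved records must be
# screened, on average, to find one true RCT. NNS = 1 / precision.
def number_needed_to_screen(precision):
    return 1.0 / precision

# Hybrid SVM model: precision 21.0% -> NNS 4.8
print(round(number_needed_to_screen(0.210), 1))  # 4.8
# Cochrane HSSS filter: precision 12.5% -> NNS 8.0
print(round(number_needed_to_screen(0.125), 1))  # 8.0
```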
Conclusions: ML performs better than traditional database filters, with improved specificity at all sensitivity levels. We recommend that users of the medical literature move toward using ML as the method for study-design filtering.