Abstract
Background: In the context of clinical practice guideline development, we conducted a systematic review on patient values and preferences (i.e. how patients value healthcare outcomes), following the GRADE evidence-to-decision framework. Such systematic reviews are challenging because a sensitive search strategy yields a very large number of citations to screen, so alternative strategies that balance sensitivity and feasibility are needed.
Objectives: To describe our experience of using a machine-learning model to exclude citations for screening in the context of a large systematic review.
Methods: We ran a sensitive search strategy in MEDLINE and EMBASE. We used the Collaboratron™ platform for: the screening in duplicate of a training sample of the search results (records from 2014 to 2016); the development of a machine-learning model to predict the probability that a record would be included; and the application of the model to the remaining records to be screened. For the machine-learning model we arbitrarily used a cut-off score of 0.01 (i.e. a 1% predicted probability of an article being relevant): records below this threshold were excluded.
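To illustrate the thresholding step, the following is a minimal sketch assuming a TF-IDF text representation and a logistic-regression classifier; the abstract does not describe the Collaboratron™ model or features, so every name and parameter below is a hypothetical stand-in, not the platform's implementation.

```python
# Sketch of the screening workflow: fit a probabilistic classifier on the
# double-screened training sample, then set aside remaining records whose
# predicted inclusion probability falls below the 0.01 cut-off.
# Model choice, features, and helper names are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

THRESHOLD = 0.01  # 1% predicted probability of relevance, as in the review


def train_model(train_texts, train_labels):
    """Fit TF-IDF features and a logistic-regression classifier."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=50_000)
    X = vectorizer.fit_transform(train_texts)
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X, train_labels)  # labels: 1 = included, 0 = excluded
    return vectorizer, model


def split_remaining(vectorizer, model, remaining_texts):
    """Split unscreened records into (to_screen, auto_excluded)."""
    probs = model.predict_proba(vectorizer.transform(remaining_texts))[:, 1]
    to_screen = [t for t, p in zip(remaining_texts, probs) if p >= THRESHOLD]
    auto_excluded = [t for t, p in zip(remaining_texts, probs) if p < THRESHOLD]
    return to_screen, auto_excluded
```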
Results: From 48 563 records, we screened 10 193 to create the training set.
The predicted performance of the model was 87.5% sensitivity and 92.3% specificity, which left 2983 of the remaining 38 370 records to screen.
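The workload implication of these figures follows from simple arithmetic; the sketch below uses only the numbers reported above.

```python
# Workload reduction implied by the reported results (arithmetic only).
remaining = 38_370  # records left after screening the training set
to_screen = 2_983   # records at or above the 0.01 cut-off
excluded = remaining - to_screen
print(excluded)                       # 35 387 records excluded by the model
print(f"{excluded / remaining:.1%}")  # ~92.2% of the remaining workload avoided
```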
Conclusions: The application of a machine-learning model substantially decreased the workload associated with the screening of a very large number of records. This approach might be useful when a small loss of relevant studies is acceptable.