Abstract
Background:
Machine learning could improve the efficiency of literature reviews by automating the screening of titles and abstracts for relevant articles.
Objectives:
This study investigated whether the Fisher classification method can feasibly distinguish between relevant and irrelevant articles for literature reviews based on their titles and abstracts, and whether success varies with the type of classification being performed.
Methods:
Datasets were created from abstract lists from three systematic reviews. To explore whether the algorithm performed better on particular types of classification (by study design, by outcomes measured, by interventions used or by disease area), decisions at various points of the eligibility flowcharts were tested separately. Articles were labelled as ‘relevant’ or ‘irrelevant’ at each of these stages. The datasets were processed to remove duplicates and to adjust for imbalances in the ratio of ‘relevant’ to ‘irrelevant’ abstracts as a possible confounder. Articles with only a title or only an abstract were retained. Each dataset was divided into training (60%), cross-validation (20%) and test (20%) sets. Performance was measured using classification accuracy and the F2 score, which favours correct classification of ‘relevant’ items. After training the classifier algorithm, we optimised the F2 score on the cross-validation set by adjusting the thresholds at which a ‘relevant’ or ‘irrelevant’ label was assigned. Items falling below these thresholds were marked as ‘uncertain’ by the algorithm and excluded from the F2 score calculation. The final F2 score was calculated on the test set.
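The evaluation procedure described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single relevance score per article with a lower and an upper cutoff, and the grid search are all assumptions made for illustration. Only the 60/20/20 split, the F2 score, and the exclusion of ‘uncertain’ items come from the text.

```python
import random


def split_dataset(items, seed=0):
    """Shuffle and split into 60% training, 20% cross-validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.6 * len(shuffled))
    n_cv = int(0.2 * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_cv],
        shuffled[n_train + n_cv:],
    )


def assign_label(score, low, high):
    """Assumed scheme: one relevance score in [0, 1] and two cutoffs."""
    if score >= high:
        return "relevant"
    if score <= low:
        return "irrelevant"
    return "uncertain"


def f2_score(true_labels, predicted_labels):
    """F-beta with beta = 2, skipping items predicted as 'uncertain'."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == "uncertain":
            continue  # excluded from the calculation, as in the Methods
        if pred == "relevant" and truth == "relevant":
            tp += 1
        elif pred == "relevant":
            fp += 1
        elif truth == "relevant":
            fn += 1
    # F2 = 5*TP / (5*TP + 4*FN + FP); recall is weighted above precision.
    denominator = 5 * tp + 4 * fn + fp
    return 5 * tp / denominator if denominator else 0.0


def tune_thresholds(scores, true_labels, grid):
    """Hypothetical grid search for the cutoffs that maximise F2."""
    best = (-1.0, None, None)
    for low in grid:
        for high in grid:
            if low > high:
                continue
            preds = [assign_label(s, low, high) for s in scores]
            f2 = f2_score(true_labels, preds)
            if f2 > best[0]:
                best = (f2, low, high)
    return best  # (best F2, low cutoff, high cutoff)
```

In this sketch, `tune_thresholds` would be run on the cross-validation set and the chosen cutoffs then applied once to the test set to obtain the final F2 score. Because ‘uncertain’ items are excluded, maximising F2 alone can favour a wide uncertain band; the abstract does not state whether or how this was constrained.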