Evaluation of the priority ranking capabilities of SWIFT (Sciome Workbench for Interactive, Computer-Facilitated Text-mining) software

Authors
Walker V1, Holmgren S1, Thayer K1, Rooney A1, Macleod M2, Currie G2, Sena E2, Sherratt N2, Rice A3, Howard B4, Shah R4, Pelch K1
1National Institute of Environmental Health Sciences (NIEHS)/National Institutes of Health (NIH), USA
2Centre for Clinical Brain Sciences, University of Edinburgh, Scotland
3Department of Surgery and Cancer, Imperial College, England
4Sciome LLC, USA
Abstract
Background: There is growing interest in assessing the ability of machine learning approaches to priority rank studies as a way to reduce the human burden in screening literature when conducting a systematic review.
Objectives: To assess the ability of the Sciome Workbench for Interactive, Computer-Facilitated Text-mining (SWIFT) priority ranking algorithm to identify studies considered relevant based on manual screening.
Methods: Four case studies representing a range of complexity and size were used to assess the performance of SWIFT: 1) transgenerational inheritance of disease, 2) bisphenol A (BPA) and obesity, 3) perfluorooctane sulfonate/perfluorooctanoic acid (PFOS/PFOA) and immunotoxicity, and 4) neuropathic pain. For each case study, two independent reviewers manually screened the literature search results to determine study relevance and to identify test sets of 30 to 400 included and excluded references. The test sets were used to priority rank the literature search results in SWIFT for relevance using an algorithm that considers term frequency (in titles, abstracts, MeSH headings and SuppChem annotations) and Latent Dirichlet Allocation (LDA) topic modeling. The ranking was evaluated with respect to 1) the number of studies that needed to be screened in order to identify 90% and 95% of the studies known to be relevant based on manual screening, and 2) the 'Work Saved over Sampling' (WSS) performance metric, which defines, for a specific level of recall, the percentage reduction in effort achieved by a ranking method compared to a random ordering of the documents.
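The two evaluation measures described in the Methods can be illustrated with a minimal Python sketch. It is not the SWIFT implementation. It assumes the formulation of WSS commonly used in the citation-screening literature, WSS@R = (N - k)/N - (1 - R), where N is the number of citations, R is the target recall and k is the number of top-ranked citations that must be screened to reach that recall. The function names and the toy ranking are hypothetical and serve only as illustration.

```python
import math


def screened_to_reach_recall(ranked_labels, recall):
    """Return how many top-ranked citations must be screened to find
    the requested fraction of all relevant citations.

    ranked_labels: list of 0/1 flags in ranked order (1 = relevant
    according to manual screening). Hypothetical input format.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("ranking contains no relevant citations")
    # Round before ceil to guard against floating-point noise (e.g. 9.000000001).
    target = math.ceil(round(recall * total_relevant, 9))
    found = 0
    for position, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            return position
    return len(ranked_labels)


def work_saved_over_sampling(ranked_labels, recall):
    """WSS at a given recall, assuming WSS@R = (N - k)/N - (1 - R)."""
    n = len(ranked_labels)
    k = screened_to_reach_recall(ranked_labels, recall)
    return (n - k) / n - (1.0 - recall)


if __name__ == "__main__":
    # Toy ranking: 100 citations, 10 relevant, 9 ranked first and 1 ranked last.
    toy_ranking = [1] * 9 + [0] * 90 + [1]
    for r in (0.90, 0.95):
        k = screened_to_reach_recall(toy_ranking, r)
        print(f"recall {r:.2f}: screen {k} citations, "
              f"WSS = {work_saved_over_sampling(toy_ranking, r):.2f}")
```

Under this formulation a random ordering gives a WSS of about zero, because reaching a recall of R by chance requires screening roughly a fraction R of the citations. Positive values indicate the fraction of screening effort saved by the ranking.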
Results: For all four datasets, using 100 training examples and LDA topic modeling, the prioritization procedure reduced the number of citations that had to be screened to achieve a recall of 90% by 50% or more (Table 1). For the more stringent recall of 95%, the reduction in the number of citations screened ranged from 44% (neuropathic pain dataset) to 80% (PFOS/PFOA dataset). The greatest increases in screening efficiency were observed for the more targeted topics.
Conclusions: Text-mining and machine learning programs such as SWIFT can be valuable tools to reduce the human screening burden.