Evaluation of the priority ranking capabilities of SWIFT (Sciome Workbench for Interactive, Computer-Facilitated Text-mining) software

2015 Vienna

Walker V¹, Holmgren S¹, Thayer K¹, Rooney A¹, Macleod M², Currie G², Sena E², Sherratt N², Rice A³, Howard B⁴, Shah R⁴, Pelch K¹

¹National Institute of Environmental Health Sciences (NIEHS)/National Institutes of Health (NIH), USA

²Centre for Clinical Brain Sciences, University of Edinburgh, Scotland

³Department of Surgery and Cancer, Imperial College, England

⁴Sciome LLC, USA

Background: There is growing interest in assessing the ability of machine learning approaches to priority rank studies as a way to reduce the human burden in screening literature when conducting a systematic review.
Objectives: To assess how well the priority ranking algorithm of the Sciome Workbench for Interactive, Computer-Facilitated Text-mining (SWIFT) identifies studies considered relevant based on manual screening.
Methods: Four case studies representing a range of complexity and size were used to assess the performance of SWIFT: 1) transgenerational inheritance of disease, 2) bisphenol A (BPA) and obesity, 3) perfluorooctane sulfonate/perfluorooctanoic acid (PFOS/PFOA) and immunotoxicity, and 4) neuropathic pain. For each case study, two independent reviewers manually screened results to determine study relevance and identify test sets of 30 to 400 included and excluded references. The test sets were used to priority rank the literature search results in SWIFT for relevance using an algorithm that considers term frequency (title, abstract, MeSH headings and SuppChem annotations) and Latent Dirichlet Allocation (LDA) topic modeling. This ranking was evaluated with respect to 1) the number of studies that needed to be screened to identify 90% and 95% of the studies known to be relevant based on manual screening, and 2) the 'Work Saved over Sampling' (WSS) performance metric, which defines, for a specific level of recall, the percentage reduction in effort achieved by a ranking method compared to a random ordering of the documents.
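As an illustration of the two evaluation measures, the following is a minimal sketch and not SWIFT's own code. It assumes the commonly used formulation WSS@R = (N - k)/N - (1 - R), where N is the number of ranked citations and k is the number that must be screened, in ranked order, to reach recall R. The function names and the toy relevance labels are hypothetical.

```python
import math


def screened_to_reach_recall(ranked_labels, recall):
    """Count how many top-ranked citations must be screened to reach `recall`.

    ranked_labels: 1 (relevant) or 0 (not relevant), ordered best-ranked first.
    recall: target fraction of relevant citations to find, e.g. 0.90 or 0.95.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("ranked_labels contains no relevant citations")
    # The small epsilon guards against floating-point overshoot before ceil().
    needed = max(1, math.ceil(recall * total_relevant - 1e-9))
    found = 0
    for position, is_relevant in enumerate(ranked_labels, start=1):
        found += is_relevant
        if found >= needed:
            return position
    raise ValueError("recall must be between 0 and 1")


def work_saved_over_sampling(ranked_labels, recall):
    """WSS@recall under the assumed formulation (N - k) / N - (1 - recall)."""
    n = len(ranked_labels)
    k = screened_to_reach_recall(ranked_labels, recall)
    return (n - k) / n - (1.0 - recall)


# Hypothetical toy ranking of 10 citations, 4 of them relevant.
toy_ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(screened_to_reach_recall(toy_ranking, 0.75))            # 4
print(round(work_saved_over_sampling(toy_ranking, 0.75), 2))  # 0.35
```

In the toy ranking, three of the four relevant citations appear within the first four positions, so screening 4 of 10 citations reaches 75% recall. Under random ordering, about 75% of the list would need to be screened for the same recall, which gives a WSS of 0.35.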
Results: For all four datasets, using 100 training examples and LDA topic modeling, the prioritization procedure reduced by 50% or more the number of citations that must be screened to achieve a recall rate of 90% (Table 1). At the more stringent recall rate of 95%, the reduction in the number of citations screened ranged from 44% for the neuropathic pain dataset to 80% for the PFOS/PFOA dataset. The greatest increases in screening efficiency were observed in the more targeted topics.
Conclusions: Text-mining and machine learning programs such as SWIFT can be valuable tools to reduce the human screening burden.