Assessing the effectiveness of artificial intelligence tools in automating systematic reviews for cancer research: a systematic review

Authors
Kumar M1, Miranda A, Saha A, Su E, Sussman J, Yao X
1McMaster University, Hamilton, Ontario, Canada
Abstract
Background:
Systematic reviews (SRs) play a crucial role in evidence-based medicine, offering guidance to clinicians to improve patients’ outcomes. However, the time required to complete an SR, averaging around 67.3 weeks, poses a significant challenge. Literature screening, particularly at the title and abstract stage and the full-text stage, is notably time-consuming.

Objectives:
To assess the accuracy and workload savings of artificial intelligence (AI)-based automation tools compared with human reviewers in medical literature screening for cancer-related SRs to enhance the efficiency of SR production.

Methods:
Medline, Embase, Cochrane Library, and PROSPERO databases were searched from inception to November 30, 2022. Forward and backward literature searches were completed, and experts in the field were contacted for an exhaustive grey literature search. Eligible reports were full-text published English articles or conference abstracts that reported any of the following outcomes for any AI tool used for automated screening in SRs on cancer topics: the sensitivity and/or specificity of the tool’s automated screening and/or workload savings compared with human reviewers. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines were adhered to. This SR was registered on PROSPERO.
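
As an illustration of the outcomes named above, the following minimal Python sketch computes sensitivity, specificity, and workload savings from a tool-versus-human screening comparison. It is not taken from the included studies: the class and function names, the definition of workload saving (the share of citations the tool excludes, so that human reviewers need not screen them), the per-citation screening time, and the counts in the example are all assumptions chosen for illustration, and individual studies may define these outcomes differently.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Tool decisions cross-tabulated against human reviewers (the reference)."""
    tp: int  # relevant citations the tool kept
    fn: int  # relevant citations the tool excluded (missed)
    tn: int  # irrelevant citations the tool excluded
    fp: int  # irrelevant citations the tool kept


def sensitivity(c: ScreeningCounts) -> float:
    # Proportion of relevant citations that the tool kept.
    return c.tp / (c.tp + c.fn)


def specificity(c: ScreeningCounts) -> float:
    # Proportion of irrelevant citations that the tool excluded.
    return c.tn / (c.tn + c.fp)


def workload_saving(c: ScreeningCounts) -> float:
    # Assumed definition: share of all citations the tool excluded,
    # i.e. citations human reviewers would no longer need to screen.
    total = c.tp + c.fn + c.tn + c.fp
    return (c.tn + c.fn) / total


def hours_saved(c: ScreeningCounts, minutes_per_citation: float) -> float:
    # Assumed conversion: excluded citations times an average screening time.
    return (c.tn + c.fn) * minutes_per_citation / 60


# Hypothetical counts for illustration only; not data from any included study.
example = ScreeningCounts(tp=40, fn=0, tn=600, fp=360)
print(f"sensitivity:     {sensitivity(example):.3f}")
print(f"specificity:     {specificity(example):.3f}")
print(f"workload saving: {workload_saving(example):.1%}")
print(f"hours saved:     {hours_saved(example, minutes_per_citation=0.5):.1f}")
```

Under this assumed definition, a tool that misses no relevant citation (fn = 0) has a sensitivity of 1.0, and its workload saving depends only on how many irrelevant citations it excludes.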

Results:
Of the 3947 studies identified by the search, 5 met the preplanned selection criteria. These 5 studies evaluated 4 AI tools: Abstrackr (4 studies), RobotAnalyst (1), EPPI Reviewer (1), and DistillerSR (1). Abstrackr eliminated 20% to 88% of titles and abstracts (time saving of 7-86 hours) and 59% of the full texts (62 hours) across 4 cancer-related SRs without missing any final citations. By comparison, RobotAnalyst (1% of titles and abstracts, 1 hour), EPPI Reviewer (38% of titles and abstracts, 58 hours; 59% of full texts, 62 hours), and DistillerSR (42% of titles and abstracts, 22 hours) demonstrated comparable or lower workload reductions for individual cancer-related SRs.

Conclusions:
AI-based automation tools show promise in accuracy and efficiency when screening medical literature for cancer-related SRs. However, their performance varies. Until further advancements and comprehensive evaluations are undertaken, it is advisable to use AI tools as supplementary tools rather than replacements for human reviewers.