Artificial intelligence tools for screening references in systematic reviews: preliminary results of a scoping review

Article type
Authors
1Universidad del Rosario, School of Medicine and Health Sciences, Public Health Research Group; Biomedical Sciences PhD student, Universidad del Rosario, Colombia
2Universidad del Rosario, School of Medicine and Health Sciences, Colombia
3Universidad del Rosario, Public Health Research Group, EPIBIOS-UR Seedbed in Epidemiology and Biostatistics UR, Colombia
4Universidad Nacional de Colombia, Colombia
5Universidad del Rosario Méderi-Red Hospitalaria, Colombia
Abstract
Background
Artificial intelligence (AI) enables the automation of systematic literature reviews (SLRs), as proposed by various international collaborations. High-quality SLRs adhere to strict phases: formulating a structured question, selecting search terms, searching the literature, screening preliminary references, assessing studies, extracting data, and analyzing results. The pre-selection of references is one of the most labor-intensive, time-consuming, and error-prone phases. Although AI has the potential to simplify SLRs, issues of compatibility, transparency, and trustworthiness hinder its adoption. Despite daily literature updates, summarizing the evidence and making implementation decisions remain a challenge.
Objective
To assess the available evidence supporting the use of artificial intelligence for reference pre-selection in systematic literature reviews.
Methods
We conducted a scoping review following the Joanna Briggs Institute methodology to examine the nature and extent of the evidence supporting the use of AI-based algorithmic models for preliminary reference selection in SLRs. Screening, selection, and extraction were performed in duplicate. We sought to capture the most commonly reported models, their performance metrics, external validity, and accessibility, as well as characteristics of the AI technology and factors that facilitate its adoption and implementation. Only articles published after 2019 were included. The results are presented as a narrative description supported by tables and figures. A decision flowchart displays the number of references and articles that were retrieved, excluded, or included in the final analysis.
Results
After removing duplicates, we identified 9652 citations from the electronic database search. Based on title and abstract screening, we excluded 9527 records, leaving 113 full-text articles to be assessed for eligibility. The leading countries of corresponding author affiliation, in descending order, were the USA, Canada, and the United Kingdom. Approximately 80% of the reported algorithmic models were based on machine learning (ML), followed by natural language processing (NLP). Almost 40% of the ML models underwent external validation. No additional strategies were proposed to facilitate the adoption or implementation of algorithmic classifier models.
Conclusions
We summarized the available knowledge on algorithmic models for reference classification during the screening phase of systematic literature reviews (SLRs). Most of the recent literature focuses on ML, while external validation remains an infrequent practice. No specific frameworks were identified to promote the adoption and implementation of these models.