Abstract
Background: Systematic reviews (SRs) require substantial resources and time to complete; a typical SR takes between six and 18 months. Methods that can make SRs faster to complete, with fewer resources, are needed.
Objectives: To evaluate the performance of a new natural language processing (NLP) algorithm.
Methods: We developed a new NLP algorithm based on diverse relevance-ranking models for MEDLINE citations. A linear combination of two ranking scores, one from semantic relevance ranking and one from latent Dirichlet allocation, was used to predict an overall relevance-ranking score for each citation. To evaluate the performance of this new method, we selected a convenience sample of five SRs published by Cochrane. We estimated the area under the curve (AUC), sensitivity, false-positive rate, total screening burden, and percentage reduction in screening. We compared the new pooled effect size to the published one using the Altman and Bland method.
Results: The new NLP algorithm achieved an average AUC of 0.82 (range: 0.49 to 0.95). With a 70% reduction in the number of citations to be screened, we observed over 80% sensitivity in four of the five SRs. We did not find a significant difference between the published effect size and the new pooled effect size even after a 90% reduction of citations.
Conclusions: NLP algorithms showed promising results in accelerating the SR process and reducing screening workload. Future work is needed to expand the search beyond MEDLINE and to validate the findings of this pilot study.