Article type
Abstract
Background
Systematic reviews and meta-analyses are cornerstones of evidence-based medicine, informing clinical guidelines and regulatory decisions. However, screening the literature to identify eligible studies is labor-intensive and time-consuming. Recent advances in large language models (LLMs) offer potential solutions to enhance the efficiency and accuracy of this critical step.
Objective
To assess the feasibility and accuracy of LLMs in title and abstract screening for systematic reviews.
Method
We conducted a pilot feasibility study of 10 systematic reviews selected from the Cochrane Library. A structured prompt was developed to guide KimiChat (a large language model developed by Moonshot AI) in screening the titles and abstracts retrieved for each systematic review. Three experts in evidence-based medicine independently reviewed these studies and established a gold standard for reference. We then assessed the accuracy of KimiChat's evaluations by calculating sensitivity and specificity at both the overall and item-specific levels. Efficiency was estimated by the mean assessment time.
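The accuracy calculation described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names, the input format (one boolean include/exclude decision per record), and the choice of the Wilson score interval are all assumptions, since the abstract does not state how the confidence intervals were computed.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, ~95% for z=1.96)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def screening_accuracy(llm_include, gold_include):
    """Compare LLM include/exclude decisions against the gold standard.

    Both arguments are equal-length sequences of booleans, one per record,
    where True means "include for full-text review".
    """
    if len(llm_include) != len(gold_include):
        raise ValueError("decision lists must have the same length")
    pairs = list(zip(llm_include, gold_include))
    tp = sum(1 for llm, gold in pairs if llm and gold)          # eligible, kept
    fn = sum(1 for llm, gold in pairs if not llm and gold)      # eligible, missed
    tn = sum(1 for llm, gold in pairs if not llm and not gold)  # ineligible, excluded
    fp = sum(1 for llm, gold in pairs if llm and not gold)      # ineligible, kept
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical toy data: 5 eligible and 20 ineligible records.
gold = [True] * 5 + [False] * 20
llm = [True] * 5 + [True] * 2 + [False] * 18
result = screening_accuracy(llm, gold)
```

In this toy example every eligible record is retained and two ineligible records are passed on to full-text review, which is the pattern a screening aid is usually tuned for: missed eligible studies are costlier than extra full-text checks.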
Results
KimiChat demonstrated a sensitivity of 100% (95% CI: 93.6% to 100%) in title and abstract screening, accurately identifying all studies eligible for full-text review. The specificity was 92% (95% CI: 90% to 94%), with a 3% increase in workload due to the inclusion of some ineligible studies. Compared to traditional methods, the LLM reduced the screening time by 80%, significantly enhancing the efficiency of the systematic review process.
Conclusion
The structured prompts we developed may enable KimiChat to conduct efficient and accurate title and abstract screening, which could substantially assist human reviewers in conducting systematic reviews.