Artificial intelligence in evidence production: critical reflections and lessons learned

Article type
Authors
Akay H1, Macura B2, Nykvist B2
1KTH Royal Institute of Technology, Stockholm, Sweden
2Stockholm Environment Institute, Stockholm, Sweden
Abstract
Introduction
Systematic reviews provide robust and comprehensive assessments of available evidence, guiding decision-making in fields from medicine to international development and environmental management. Nevertheless, the rapidly increasing volume and complexity of scientific literature, together with limited research funding, pose notable challenges to the efficiency of evidence synthesis methods. This is especially problematic when short policy windows necessitate timely, comprehensive, and rigorous synthesis outputs.

The rapid development of artificial intelligence (AI), including machine learning and natural language processing, presents an opportunity to enhance various stages of the evidence synthesis process. Machine learning technologies have been tested and deployed for some time in the screening stages of systematic reviews, e.g., classifiers that help prioritize the most relevant titles and abstracts. But as the recent demonstration of Google’s large language model Gemini shows, AI-powered systems could support the entire evidence synthesis production process, from review question generation, through searching and screening of evidence, to critical appraisal and data synthesis.

Methods
Leveraging insights from our two ongoing projects that use generative AI and large language models for eligibility screening and data extraction, respectively, we undertake a critical examination of AI's impact on evidence synthesis production.
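To make the screening use case concrete, the following is a minimal, hypothetical sketch of how an LLM might be asked to judge a title/abstract record against review inclusion criteria and return a structured decision for human verification. It is not the authors' actual pipeline; the criteria text, function names, and the controlled decision labels are illustrative assumptions, and the model call itself is omitted.

```python
# Hypothetical sketch of LLM-assisted eligibility screening (illustrative only,
# not the authors' actual pipeline). The LLM is prompted with the review's
# inclusion criteria and one record, and its free-text reply is mapped onto a
# controlled label; ambiguous replies are flagged for mandatory human review.

# Placeholder inclusion criteria (an assumption for illustration).
CRITERIA = (
    "Include if the study reports empirical results relevant to the review "
    "question; exclude opinion pieces and studies without primary data."
)

def build_screening_prompt(title: str, abstract: str) -> str:
    """Compose the prompt sent to the language model for one record."""
    return (
        f"Screening criteria: {CRITERIA}\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n"
        "Answer with exactly one word: INCLUDE, EXCLUDE, or UNSURE."
    )

def parse_decision(model_reply: str) -> str:
    """Map the model's reply onto a controlled decision label.

    Anything other than a clean INCLUDE/EXCLUDE is treated as UNSURE,
    which routes the record to a human reviewer rather than auto-deciding.
    """
    reply = model_reply.strip().upper()
    return reply if reply in {"INCLUDE", "EXCLUDE"} else "UNSURE"
```

The key design choice illustrated here is the fail-safe default: any unexpected model output is downgraded to UNSURE so that human oversight, not the model, resolves borderline records.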

Results
We discuss the efficiency, accuracy, and precision of the AI tools we used in eligibility screening and data coding. We reflect on concerns about biases in training data stemming from limitations in data quality and representativeness. We emphasize the indispensability of human oversight in interpreting extracted information, ensuring the integrity and reliability of synthesized evidence.

Conclusions
AI can provide considerable support to human reviewers, making evidence production more streamlined and hence accessible to a wider range of evidence users. However, methodological and procedural transparency, robust validation of AI outputs, and careful examination of potential biases remain essential.