Abstract
Background: Evidence-based medicine (EBM) stipulates that all relevant and rigorous evidence should be used to make clinical, public health, and policy decisions. Systematic reviews (SRs) were developed as summaries of all available evidence to enable EBM. SRs are commonly appraised with tools such as the AMSTAR 2 checklist for methodological quality and the ROBIS tool for risk of bias. Currently, no automated tool exists for this purpose. This project aims to build an AI tool to assess the quality of, and biases in, SRs by: (a) using crowdsourcing to recruit volunteer collaborators and build a labelled dataset of 1000 quality/bias-assessed SRs for training the AI model; (b) developing the model code and testing its performance; and (c) building a user interface (website) to house the tool.
Methods: On May 24, 2023, we posted a request on Cochrane Engage, a crowdsourcing website, for volunteer collaborators with experience in SRs, critical thinking, and problem-solving skills. Respondents were sent training materials and study instructions for remote self-training. As collaborators began to extract and assess SRs, feedback was given on each incorrectly assessed item until their accuracy reached 100%. For the AI modelling, we first use Dense Passage Retrieval (DPR) to identify relevant passages in the PDFs based on cosine similarity scores between each passage and the question (a ROBIS/AMSTAR 2 item). A Sentence-BERT (S-BERT) transformer model will then be trained to rank passages and fine-tuned on our dataset.
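The retrieval step described above can be illustrated with a minimal sketch: passages are scored against a checklist-item question by cosine similarity of their embeddings, and the highest-scoring passages are returned. The toy 3-dimensional vectors and function names below are illustrative assumptions, not the authors' implementation (real DPR/S-BERT embeddings have hundreds of dimensions and come from trained encoders).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_passages(question_vec, passage_vecs):
    """Return (passage index, score) pairs sorted by descending
    similarity to the question vector."""
    scores = [(i, cosine_similarity(question_vec, v))
              for i, v in enumerate(passage_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy embeddings: one question (e.g. an AMSTAR 2 item) and three passages.
question = [1.0, 0.0, 1.0]
passages = [
    [0.9, 0.1, 1.1],  # closely aligned with the question
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.5, 0.5, 0.5],  # partially aligned
]
ranking = rank_passages(question, passages)
# The first passage ranks highest, the orthogonal one lowest.
```

In the full pipeline, the top-ranked passages would then be passed to the fine-tuned ranking model rather than used directly.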
Results: To date, we have recruited 56 crowdsourced assessors, of whom 25 are actively working on assessments. Of the 1000 target SRs, 486 assessments have been undertaken: 366 are completed and checked, 98 are completed and pending checking, and 22 are in progress. Two data scientists wrote the data-cleaning code, implemented the DPR pipeline, and wrote the model code. A website to house the tool has been built.
Conclusion: Crowdsourcing is an effective strategy for building a large, complex, and balanced dataset. We present the first free, open-access AI tool for critically assessing SRs.