Using Large Language Models in Assessing Bias Risk in Cohort Studies

Authors
Xia D1, Lai H1, Ge L1
1Department of Health Policy and Management, School of Public Health, Lanzhou University, Lanzhou, China; Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China
Abstract
Objective
To explore the feasibility and accuracy of using large language models (LLMs) to assess risk of bias (ROB) in cohort studies.
Methods
We conducted a pilot and feasibility study of 30 cohort studies selected from the reference lists of published Cochrane reviews. We developed a structured prompt to guide KimiChat, an LLM developed by Moonshot AI, in assessing the ROB of each cohort study twice. Three evidence-based medicine experts independently reviewed the studies and established a gold standard for reference. We then assessed the accuracy of KimiChat's assessments by calculating the correct rate, sensitivity, and specificity at the overall and item-specific levels. The consistency of the overall and item-specific results between the two assessments was evaluated using Cohen's kappa (κ) and the prevalence- and bias-adjusted kappa (PABAK). Efficiency was estimated by the mean time required per assessment.
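The measures named in the Methods can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: it assumes each item is dichotomized into two hypothetical labels ("low" and "high" risk), treats "high" as the positive class, and the function names are invented for this example.

```python
from collections import Counter


def accuracy_metrics(reference, predicted, positive="high"):
    """Correct rate, sensitivity, and specificity against a gold standard.

    Assumes binary labels; `positive` marks the class treated as 'high risk'.
    Sensitivity or specificity is NaN when its denominator is zero.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    correct = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return correct, sensitivity, specificity


def cohen_kappa(first, second):
    """Cohen's kappa between two rounds of ratings of the same studies."""
    n = len(first)
    p_o = sum(a == b for a, b in zip(first, second)) / n  # observed agreement
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement from the marginal label frequencies of each round.
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    if p_e == 1:
        return float("nan")  # undefined when both rounds use a single label
    return (p_o - p_e) / (1 - p_e)


def pabak(first, second, n_categories=2):
    """Prevalence- and bias-adjusted kappa: (k * p_o - 1) / (k - 1)."""
    p_o = sum(a == b for a, b in zip(first, second)) / len(first)
    return (n_categories * p_o - 1) / (n_categories - 1)


# Toy data, not taken from the study.
gold = ["low", "low", "high", "high", "low", "high"]
llm = ["low", "low", "high", "low", "low", "high"]
print(accuracy_metrics(gold, llm))

round1 = ["low", "high", "low", "low"]
round2 = ["low", "high", "high", "low"]
print(cohen_kappa(round1, round2), pabak(round1, round2))
```

Reporting PABAK alongside κ is useful here because κ can be low or undefined when almost all ratings fall into one category, even if raw agreement is high.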
Results
KimiChat demonstrated high overall accuracy across the ROB assessments of the 30 studies, with a mean correct rate of 93.9% (95% CI: 91.4%-95.9%). For most items (81.3%), the mean correct rate ranged from 95% to 100%. The correct rate for the items on assessing prognosis and on the accuracy of outcome assessment was 100% in both assessments of every cohort study. Accuracy was lowest for the item on similarity in common interventions, with correct rates of 73.3% (95% CI: 54.1%-87.7%) and 66.7% (95% CI: 47.2%-82.7%) in the two assessments. The mean consistency rate between the two assessments was 97.1%. KimiChat achieved perfect agreement (κ=1) on six items and κ exceeding 0.80 on eight items. The mean time to assess a study was 67.3 seconds.
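The abstract does not state how the confidence intervals were computed. As a hedged illustration, the sketch below computes an exact (Clopper-Pearson) binomial interval using only the Python standard library. The choice of method and the function names are assumptions made for this example. With 30 studies, a correct rate of 73.3% corresponds to 22 of 30 and 66.7% to 20 of 30.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05, tol=1e-10):
    """Exact two-sided (1 - alpha) interval for a proportion k / n."""

    def solve(f, target):
        # f is decreasing in p on [0, 1]; bisect for f(p) == target.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


for correct in (22, 20):
    low, high = clopper_pearson(correct, 30)
    print(f"{correct}/30: {low:.1%} to {high:.1%}")
```

Under this assumption, the intervals for 22/30 and 20/30 come out close to the reported 54.1%-87.7% and 47.2%-82.7%, which suggests, but does not confirm, that an exact binomial method was used.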
Conclusions
KimiChat's efficient and accurate assessment of ROB in cohort studies indicates that LLMs can play a supportive role, and have application potential, in the systematic review process.
Funding
This research received no specific grant from any funding agency.