What is the level of expertise of ChatGPT in the domain of systematic reviews and meta-analysis?

2023 London

Luo X¹, Lv M², Liu H², Zhu D², Wang L², Chen Y¹

¹Evidence-based Medicine Center, Lanzhou University

²School of Public Health, Lanzhou University

Background:
ChatGPT is an artificial intelligence chatbot developed by OpenAI and launched in November 2022. Although it has demonstrated strong comprehension and cognitive ability in numerous fields, its grasp of systematic review and meta-analysis methodology remains unclear.

Objectives:
To assess ChatGPT’s level of understanding of systematic review and meta-analysis methods.

Methods:
We systematically studied the methods for systematic reviews and meta-analyses of interventions described in the Cochrane Handbook. The core research team then discussed and agreed on 10 questions about systematic reviews and meta-analyses and posed them to ChatGPT. Three investigators independently assessed the accuracy of each of ChatGPT’s responses on a scale of 1 to 10, where 10 indicated a perfect answer, and the mean of the three investigators’ scores was calculated for each question. The primary outcome was the total score across the 10 questions, calculated as (mean score of question 1 + mean score of question 2 + ... + mean score of question 10) / 100 × 100%, where 100 is the maximum possible sum of the 10 question means.
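The scoring rule described in the Methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `question_means` and `total_score_percent` are hypothetical, every investigator is assumed to score every question on the 1-10 scale, and the rating matrix is invented for demonstration and is not the study's data.

```python
from statistics import mean


def question_means(scores):
    """Mean score per question across investigators.

    scores: one list per investigator, each holding that investigator's
    1-10 score for every question, in the same question order.
    """
    n_questions = len(scores[0])
    if any(len(rater) != n_questions for rater in scores):
        raise ValueError("every investigator must score every question")
    return [mean(rater[q] for rater in scores) for q in range(n_questions)]


def total_score_percent(scores):
    """Sum of per-question means divided by the maximum possible sum, x 100%.

    With 10 questions scored 1-10 the maximum is 10 x 10 = 100, which is
    the denominator used in the abstract's formula.
    """
    means = question_means(scores)
    max_total = 10 * len(means)
    return sum(means) * 100 / max_total


if __name__ == "__main__":
    # HYPOTHETICAL ratings for illustration only (NOT the study's data):
    # 3 investigators x 10 questions.
    toy = [
        [9, 8, 9, 10, 9, 8, 9, 9, 7, 9],
        [8, 9, 9, 10, 8, 9, 8, 9, 7, 8],
        [7, 8, 8, 9, 8, 9, 8, 8, 6, 9],
    ]
    print(round(total_score_percent(toy), 1))  # 84.0

    # Same result from the investigators' totals (87, 85, 80): the sum of the
    # question means equals the mean of the investigators' totals.
    print(mean(sum(rater) for rater in toy) * 100 / 100)  # 84.0
```

Because each question mean is taken over the same three investigators, the overall score can be obtained either from the per-question means or directly from each investigator's total, as the last line shows on the toy data.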

Results:
Ten questions on systematic reviews and meta-analysis, together with ChatGPT’s answers, were generated (Figure). The three investigators’ total scores across the 10 questions were 91, 86, and 86, respectively, giving an average of 87.7. The highest mean score was 9.7, for Q4, indicating that ChatGPT answered this question most accurately. The lowest mean score was 7.3, for Q9, indicating that ChatGPT’s response to this question deviated most from the correct answer.
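As a worked check of the reported average, write $s_{rq}$ for investigator $r$'s score on question $q$, $\bar{s}_q$ for the question mean, and $T_r$ for investigator $r$'s total (notation introduced here for illustration only). Because every question mean is taken over the same three investigators, the sum of the ten question means equals the mean of the three totals:

```latex
\sum_{q=1}^{10} \bar{s}_q
  = \sum_{q=1}^{10} \frac{1}{3} \sum_{r=1}^{3} s_{rq}
  = \frac{1}{3} \sum_{r=1}^{3} T_r
  = \frac{91 + 86 + 86}{3}
  = \frac{263}{3} \approx 87.7,
\qquad
\text{total score} = \frac{263/3}{100} \times 100\% \approx 87.7\%
```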

Conclusions:
ChatGPT shows a relatively high level of understanding of systematic reviews and meta-analysis. However, this assessment is limited by the knowledge of the investigators who rated the responses, and a larger study is needed to validate these results.

Patient, public, and/or healthcare consumer involvement: None.