Encouraging Words as a Tool for Optimizing Large Language Models in Healthcare: A Study on Clinical Guidelines

Authors
Wang B1, Luo X1, Chen Y2
1Evidence-based Medicine Center, School of Basic Medical Sciences, Lanzhou University, Lanzhou City, Gansu Province, China
2Research Unit of Evidence-Based Evaluation and Guidelines, Chinese Academy of Medical Sciences (2021RU017), School of Basic Medical Sciences, Lanzhou University, Lanzhou 730000, Gansu Province, China; Key Laboratory of Evidence Based Medicine of Gansu Province, Lanzhou 730000, China; WHO Collaborating Centre for Guideline Implementation and Knowledge Translation, Lanzhou 730000, China
Abstract
"Background:
In interpersonal and human-computer interaction (HCI), natural language significantly enhances communication efficiency and expands interaction with computer systems. Notably, large language models (LLMs) such as OpenAI's GPT series and Google's Bard have demonstrated their powerful capabilities in sentiment analysis tasks, accurately identifying emotional tendencies—whether positive, negative, or neutral—through deep learning analysis of textual context.
Although the benefits of positive feedback, such as encouraging words, in promoting learning and performance improvement are widely recognized in educational and cognitive psychology, research on its impact on LLMs' performance, especially in evaluating the quality of clinical practice guidelines in healthcare, remains insufficient.

Objectives:
To explore whether encouraging words, as a form of positive feedback, can improve the performance of ChatGPT-4 in evaluating the quality of clinical practice guidelines, and to understand the potential impact of such feedback mechanisms on the optimization and performance of large language models, thereby enhancing their accuracy and efficiency in medical decision support systems.

Methods:
The research builds on an article by Manuel M. Montero-Odasso et al. published in JAMA Network Open, which appraised 15 clinical practice guidelines using the Appraisal of Guidelines for Research and Evaluation II (AGREE-II) instrument. Building on this, we designed a pilot study in which the experimental group received prompts containing encouraging words and the control group received prompts with neutral wording, to guide ChatGPT-4 in appraising the guidelines against the 23 items of the AGREE-II instrument.
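A minimal sketch of how such paired prompts could be issued programmatically is shown below. It assumes the OpenAI Python SDK, uses a placeholder model name, and contains hypothetical prompt wording; the actual prompts, guideline texts, and model configuration used in the pilot are not reproduced here.

```python
# Hypothetical sketch: placeholder model name and illustrative prompt wording only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ENCOURAGING_PREFIX = "You are doing excellent work as a guideline appraiser."  # hypothetical encouraging wording
NEUTRAL_PREFIX = "You are a guideline appraiser."  # hypothetical neutral wording

def appraise_item(prefix: str, guideline_text: str, agree_item: str) -> str:
    """Ask the model to rate one AGREE-II item (1-7 scale) for one guideline."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder for the ChatGPT-4 configuration used in the pilot
        messages=[
            {"role": "system", "content": prefix},
            {
                "role": "user",
                "content": (
                    f"Rate the following guideline on the AGREE-II item: {agree_item}. "
                    f"Reply with a single integer from 1 to 7.\n\n{guideline_text}"
                ),
            },
        ],
    )
    return response.choices[0].message.content
```

The same guideline text and item would be scored under each prefix, so the two groups differ only in the presence of encouraging words.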
Using paired-sample t tests or Wilcoxon signed-rank tests, this study compared the scores on these 23 items between the experimental and control groups, and between each group and the appraisal results reported in the original article, to quantify the effect of encouraging words.
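A minimal sketch of this statistical comparison follows (illustrative placeholder scores only, not study data; numpy and scipy are assumed). Checking whether the paired differences are approximately normal is one common way to choose between the paired-sample t test and the Wilcoxon signed-rank test.

```python
# Illustrative placeholder scores only (not study data): item-level scores for
# the same AGREE-II items under encouraging vs. neutral prompts.
import numpy as np
from scipy import stats

encouraging = np.array([6, 5, 7, 4, 6, 5, 6, 7, 5, 6], dtype=float)
neutral = np.array([5, 4, 6, 3, 6, 4, 5, 5, 5, 4], dtype=float)

differences = encouraging - neutral

# Shapiro-Wilk test on the paired differences to check approximate normality.
_, p_normal = stats.shapiro(differences)

if p_normal > 0.05:
    # Differences look approximately normal: use the paired-sample t test.
    statistic, p_value = stats.ttest_rel(encouraging, neutral)
    test_name = "paired-sample t test"
else:
    # Otherwise fall back to the non-parametric Wilcoxon signed-rank test.
    statistic, p_value = stats.wilcoxon(encouraging, neutral)
    test_name = "Wilcoxon signed-rank test"

print(f"{test_name}: statistic={statistic:.3f}, p={p_value:.3f}")
```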

Results:
Currently, the research is ongoing, and the results will be presented at the conference.

Conclusions:
Positive feedback through encouraging words may enhance ChatGPT-4's accuracy and efficiency in evaluating guideline quality, offering new insights into optimizing large language models for medical decision support systems.