Automated assessment of the completeness and accuracy of randomized clinical trial reporting based on large language models

Authors
Ji C1, Wang Y2, Zhang C3, Zhang X4, Zhang X
1School of Computer Science, Fudan University, Shanghai, China
2Fudan GRADE Center, Shanghai, China
3Respiratory Department, Children's Hospital of Fudan University, Shanghai, China
4Nursing Department, Children's Hospital of Fudan University, Shanghai, China; Fudan GRADE Center, Shanghai, China
Abstract
Background: High-quality randomized controlled trials (RCTs) are regarded as the best method for assessing the effectiveness of health care interventions. Inadequate reporting of RCTs can lead to misinterpretation, reduced credibility, difficulty in replicating studies, and wasted resources. Adherence to the Consolidated Standards of Reporting Trials (CONSORT) is associated with higher reporting quality.
Objectives: To develop and evaluate the performance of an RCT Publications Transparency Assessment System, named CONSORT-TAS.
Methods: We implemented a novel data augmentation technique that leverages ChatGPT to improve the accuracy of the text classification models used to evaluate the transparency of RCT publications. We also devised a heuristic algorithm that associates CONSORT checklist items with the corresponding tables and analyzes the table contents to support item judgment. To develop and refine CONSORT-TAS, we double-annotated and adjudicated a corpus of 52 RCT articles at the sentence level, using 37 fine-grained CONSORT checklist items, and split it into training, testing, and validation sets. Accuracy was evaluated as the ratio of correct assessments to total assessments.
Results: All 37 CONSORT checklist items were included in the system. The average runtime of the system was approximately 40 seconds to download and analyze the target article. Thirty-eight RCT articles from the CONSORT-LLM corpus were selected to assess the performance of CONSORT-TAS. Of the implemented items, 36 (94.7%) achieved an accuracy exceeding 85%, and of the assessed articles, 33 (86.8%) achieved an accuracy exceeding 80%. Two case studies of RCT articles are provided to illustrate the use of CONSORT-TAS.
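The accuracy measure defined in the Methods, and the per-item and per-article summaries reported in the Results, can be illustrated with a minimal Python sketch. This is not the CONSORT-TAS implementation. The record layout `(article_id, item_id, is_correct)`, the function names, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def accuracy(outcomes):
    """Ratio of correct assessments to total assessments."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no assessments to score")
    return sum(outcomes) / len(outcomes)


def grouped_accuracy(records, key):
    """Accuracy per group; key is "article" or "item".

    records: iterable of (article_id, item_id, is_correct) tuples
    (a hypothetical layout, not the format used by the authors).
    """
    groups = defaultdict(list)
    for article_id, item_id, is_correct in records:
        group = article_id if key == "article" else item_id
        groups[group].append(bool(is_correct))
    return {group: accuracy(flags) for group, flags in groups.items()}


def share_above(scores, threshold):
    """Count and fraction of groups whose accuracy strictly exceeds threshold."""
    n_above = sum(1 for score in scores.values() if score > threshold)
    return n_above, n_above / len(scores)


# Toy, made-up records; these are not data from the study.
records = [
    ("A1", "1a", True), ("A1", "1b", True), ("A1", "2a", False),
    ("A2", "1a", True), ("A2", "1b", True), ("A2", "2a", True),
]

per_item = grouped_accuracy(records, "item")
per_article = grouped_accuracy(records, "article")

print(share_above(per_item, 0.85))     # items with accuracy above 85%
print(share_above(per_article, 0.80))  # articles with accuracy above 80%
```

Grouping the same correctness flags by checklist item or by article yields the two kinds of summary reported above: the share of items above one accuracy threshold and the share of articles above another.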
Conclusions: We developed CONSORT-TAS, a standalone, desktop-compatible software tool built on large language models. It can automatically generate CONSORT checklist reports, simplify the completion of CONSORT checklists, and encourage better reporting practices. The system may benefit authors, manuscript reviewers, and journal editors.