Abstract
Background:
Data extraction is a critical yet labor-intensive and error-prone part of evidence synthesis. Errors in data extraction are common and can undermine the validity of evidence syntheses, affecting narrative summaries, meta-analyses, and conclusions. Large language models such as ChatGPT, Gemini, and Claude have the potential to substantially improve the efficiency and accuracy of data extraction. In a proof-of-concept study, we tested Claude and found that it achieved an accuracy rate of 96.3% when extracting data from open-access trials, making only 6 errors across 160 data elements.
Objectives:
The objectives of this study are to develop use cases for a semi-automated data extraction process using Claude 2 and to compare this process with a traditional human-only approach within the workflow of real-world systematic reviews.
Methods:
This study employs a prospective, parallel-group, study-within-reviews design to compare a semi-automated data extraction process using Claude (version 2) with a traditional human-only approach (Figure 1). We will select a convenience sample of 4 to 6 ongoing systematic reviews from the US Agency for Healthcare Research and Quality Evidence-based Practice Center program that include randomized and nonrandomized studies. Two independent data extraction teams will be formed for each review: one for human-only extraction and another for semi-automated extraction using Claude 2. The primary objective is to assess the concordance between these 2 processes and to compare the time required for data extraction. The secondary objective is to evaluate the accuracy and types of errors in each process. Outcomes will include the proportion of concordant data elements and the time taken for data extraction tasks.
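The abstract does not describe the technical details of the semi-automated workflow. As a purely illustrative sketch (not the study's actual pipeline), an extraction call to Claude 2 might resemble the following, assuming the Anthropic Python SDK; the prompt wording, data elements, and helper function are hypothetical.

```python
# Illustrative sketch only: one plausible way to ask Claude 2 to extract
# predefined data elements from a trial report. The prompt fields and the
# helper below are hypothetical and are not described in the abstract.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

EXTRACTION_PROMPT = """You are assisting with systematic review data extraction.
From the trial report below, extract these data elements and answer in JSON:
- study design
- number of participants randomized
- intervention and comparator
- primary outcome and its result
If an element is not reported, return "not reported".

Trial report:
{report_text}
"""

def extract_data_elements(report_text: str) -> str:
    """Send one trial report to Claude 2 and return its JSON-formatted answer."""
    response = client.messages.create(
        model="claude-2.1",  # assumed model identifier for Claude 2
        max_tokens=1024,
        messages=[{"role": "user",
                   "content": EXTRACTION_PROMPT.format(report_text=report_text)}],
    )
    return response.content[0].text  # the model's text reply

# In a semi-automated workflow, a human extractor would then verify each
# returned element against the source report before it enters the review.
```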
Results:
Results will be available at the time of the conference. We will compare approximately 3,500 data elements and hypothesize that the concordance between human-only and semi-automated data extraction will be over 80%.
Discussion:
This study will provide real-world evidence on using large language models to semi-automate the data extraction process in evidence syntheses. If successful, it could increase efficiency and reduce errors, thereby enhancing the quality and reliability of health care evidence available to patients.