Abstract
Background: Review Manager is currently the software used for preparing Cochrane systematic reviews. Review Manager 5.3 facilitates meta-analyses in systematic reviews of interventions by providing methods to fit fixed-effect and random-effects models. However, similar functionality is not available to fit Reitsma or bivariate models for systematic reviews of diagnostic test accuracy, leaving systematic reviewers to fit these models themselves and copy the results back into Review Manager, which may lead to human error.
Objective: to assess the feasibility of automated verification of systematic reviews of diagnostic accuracy, and to check automatically for human errors in the data reported in systematic reviews.
Method: we used data from the data extraction forms, diagnostic test result tables, and 'Summary of findings' tables of 63 systematic reviews.
We attempted to replicate the calculation of the summary scores reported in each systematic review, based on the data reported in its data sections. We also performed sanity checking at each stage of the reported results, i.e. simple tests to check that the data were numerically reasonable and legal.
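As an illustration, the following is a minimal sketch, in Python, of the kind of legality check that can be applied to a single reported 2 x 2 table; the function name and cell layout are illustrative assumptions, not the tooling used in the study.

```python
def check_two_by_two(tp, fp, fn, tn):
    """Return a list of problems with a 2 x 2 diagnostic accuracy table."""
    problems = []
    # Each cell should be a non-negative integer count.
    for name, value in {"TP": tp, "FP": fp, "FN": fn, "TN": tn}.items():
        if not isinstance(value, int) or value < 0:
            problems.append(f"{name} = {value} is not a non-negative integer")
    # A table with no participants at all is suspicious.
    if tp + fp + fn + tn == 0:
        problems.append("all cells are zero, so no participants are recorded")
    return problems


# Example: a table with a negative cell count is flagged as not legal.
print(check_two_by_two(10, -2, 3, 85))
```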
Results: we were able to replicate 439/589 summary scores using only the information reported in the data sections of the systematic review. The main reasons for failure were:
1) data discrepancies between the 2 x 2 tables and the summary scores;
2) an inability to match 2 x 2 tables to summary scores; and
3) the relevant data being presented in the text rather than in the data tables.
We found two errors in the 63 reviews: one where a review reported summary scores for the wrong test, and one where a review permuted the mean and the bounds of the confidence interval (reporting '74.7 [85.2, 82.3]' instead of '82.3 [74.7, 85.2]', i.e. not a legal confidence interval).
Conclusions: most analyses were straightforward to replicate automatically, but 150/589 would have required either contacting the systematic review authors for clarification or reading the text, and therefore fell outside the scope of this study. However, these would likely be manually replicable. Automated verification and replication of systematic reviews are unlikely to be possible while the meta-analyses in systematic reviews are performed manually outside Review Manager.
Genuine mistakes identifiable by sanity checking were rare. However, the two errors we did find could be identified using simple checks, e.g. checking for duplicate rows in the results or checking that means and confidence intervals take numerically legal values.
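For illustration, here is a minimal sketch of such checks, again in Python and with hypothetical function names; it assumes summary scores are reported as percentages with a lower and upper confidence bound.

```python
def check_summary_estimate(mean, lower, upper):
    """Return a list of problems with a summary score and its confidence interval."""
    problems = []
    # The bounds must be ordered and must bracket the point estimate.
    if lower > upper:
        problems.append(f"lower bound {lower} exceeds upper bound {upper}")
    if not (lower <= mean <= upper):
        problems.append(f"point estimate {mean} lies outside [{lower}, {upper}]")
    # Sensitivities and specificities reported as percentages must lie in [0, 100].
    for value in (mean, lower, upper):
        if not (0.0 <= value <= 100.0):
            problems.append(f"{value} is not a legal percentage")
    return problems


def check_duplicate_rows(rows):
    """Flag identical result rows, e.g. the same summary scores reported for two tests."""
    seen, problems = set(), []
    for row in rows:
        if row in seen:
            problems.append(f"duplicate row: {row}")
        seen.add(row)
    return problems


# Example: the permuted confidence interval reported as '74.7 [85.2, 82.3]'.
print(check_summary_estimate(74.7, 85.2, 82.3))
```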
Patient or healthcare consumer involvement: error checking in systematic reviews may lead to better evidence-based practice, which would benefit patients. However, we saw no way to meaningfully involve patients in the design of this study.