What do we know about interpretation and application of test accuracy measures?

Authors
Davenport C¹, Hyde C²
¹Birmingham University, UK
²Exeter University, UK
Abstract
Background: The widespread belief that decision makers have difficulty understanding and applying test accuracy information has not been based on a systematic interrogation of the evidence base to allow quantification or characterisation of the extent of the problem.

Aims and objectives: To identify comprehensively the literature on the understanding and application of test accuracy measures, in order to establish facilitators of and barriers to their use by decision makers.

Methods: Bibliographic searches were conducted in 2003, 2005, 2007 and 2010 across 11 databases representing medicine, psychology and education. Searches were iterative and purposive, and were supplemented by checking the reference lists of included studies and by contact with experts. A narrative synthesis of empirical and theoretical test accuracy and risk communication literature was undertaken.

Results: 64 test accuracy and 21 risk communication papers were included. The research is characterised by self-selected samples and limited external validity, and primary care is under-represented. Ability to define the most commonly used metrics (sensitivity, specificity, predictive values) is poor. Predictive values and test errors are promoted as the most intuitive, although there is no empirical evidence that any single test accuracy metric is superior for diagnostic decision making. Natural frequency formats and the use of multiple presentation formats facilitate understanding. Verbal descriptions and negative test results may be less well understood. Self-reported use of measures varies: 80% for predictive values, 4% for sensitivity and specificity, and 1% for ROC curves and likelihood ratios. Estimation of pre-test probability and test accuracy is inaccurate and highly variable, which has implications for probability revision.
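
For readers less familiar with the metrics named above, the following is a minimal sketch of how they relate to one another and of what probability revision involves. It is not drawn from the included studies: the function name `revise_probability`, the notional population of 1000, and all input values are hypothetical and chosen only for illustration. Inputs are assumed to lie strictly between 0 and 1.

```python
def revise_probability(pre_test, sensitivity, specificity, population=1000):
    """Relate common test accuracy metrics for one hypothetical test.

    All arguments are illustrative assumptions, not values from the review.
    """
    # Natural frequency format: counts in a notional population.
    diseased = pre_test * population
    healthy = population - diseased
    true_pos = sensitivity * diseased
    false_neg = diseased - true_pos
    true_neg = specificity * healthy
    false_pos = healthy - true_neg

    return {
        # Predictive values: post-test probabilities given the test result.
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
        # Likelihood ratios: depend only on sensitivity and specificity.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        # Counts that can be read out as "x of every 1000 people ...".
        "natural_frequencies": {
            "true_pos": true_pos,
            "false_pos": false_pos,
            "true_neg": true_neg,
            "false_neg": false_neg,
        },
    }


# Hypothetical inputs: 10% pre-test probability, 90% sensitivity, 80% specificity.
example = revise_probability(pre_test=0.10, sensitivity=0.90, specificity=0.80)
```

With these hypothetical inputs, 270 of every 1000 people test positive and 90 of them have the condition, so the positive predictive value is one third despite a sensitivity of 90%. Because the predictive values depend on the pre-test probability, an inaccurate pre-test estimate carries straight through to the revised probability.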

Conclusions: The emphasis in the literature has been on identifying the best single metric rather than an optimal combination of metrics, and understanding of meta-analytic summary measures has not been investigated. Investigation of contextual and motivational influences on test and test-treat thresholds is required to identify the magnitudes of test accuracy that will have most impact on diagnostic and therapeutic yield.