Scoring trials on the efficacy-effectiveness continuum: A systematic analysis

Authors
Witt C1, Manheimer E2, Lüdtke R3, Hammerschlag R4, Lao L5, Berman B2
1Institute for Social Medicine and Epidemiology, Charité University Medical Center Berlin
2Cochrane Collaboration Complementary Medicine Field, University of Maryland School of Medicine
3Carstens Foundation, Essen
4Oregon College of Oriental Medicine
5University of Maryland School of Medicine
Abstract
Background: 'Efficacy' refers to the extent to which a specific intervention is beneficial under ideal conditions. Efficacy trials produce results for an intervention under carefully controlled conditions chosen to maximize the likelihood of observing an effect if it exists. In contrast, 'effectiveness' is a measure of the extent to which an intervention, when deployed in the field in routine circumstances, does what it is intended to do for a specific population. To this end, effectiveness trials use eligibility criteria, treatment protocols, and outcomes that are close to usual care. For valid decision making in usual care, there is an urgent need for more evidence from Comparative Effectiveness Research (CER). PRECIS (Thorpe et al. CMAJ. 2009;180(10):E47-E57) was developed mainly to guide the design of RCTs along 10 dimensions of the efficacy-effectiveness continuum. It is of major interest whether these dimensions can also be applied to existing trials as a means of strengthening the evidence base for CER.

Objectives: To assess where randomized trials fall on the efficacy-effectiveness continuum, using acupuncture for low back pain as an example.

Methods: All English-language RCTs that compared acupuncture with a conventional treatment control and had >30 acupuncture patients were analyzed by 5 raters using a PRECIS-derived scale, before and after a consensus process.

Results: 10 studies were evaluated with PRECIS (119 abstracts, 44 publications screened). The first rating showed large variance between raters and items (intraclass correlation 0.02-0.60). This was mainly due to missing information in the publications and to difficulties in operationalizing the scoring items. After the consensus discussions, the intraclass correlation improved to 0.20-1.00.
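
For readers unfamiliar with the agreement statistic reported here, the following is a minimal sketch of how an intraclass correlation can be computed for one scoring item rated by several raters across several trials. The abstract does not state which form of the intraclass correlation was used; the one-way random-effects form, ICC(1,1), is assumed here purely for illustration. The function name `icc_oneway` and the example scores are hypothetical and are not data from this study.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1) (assumed form, not confirmed by the abstract).

    ratings: list of rows, one row per rated trial, one column per rater.
    """
    n = len(ratings)            # number of rated trials
    k = len(ratings[0])         # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 trials and 2 raters, with no missing scores")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-trial mean square: how much the trials differ from one another.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-trial mean square: how much raters disagree about the same trial.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("no variance in ratings; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical scores for one item: 4 trials (rows) rated by 5 raters (columns).
example_item = [
    [1, 2, 1, 1, 2],
    [4, 4, 5, 4, 4],
    [3, 2, 3, 3, 3],
    [5, 5, 4, 5, 5],
]
print(round(icc_oneway(example_item), 2))
```

Under this form, values near 1 indicate that most of the score variance comes from real differences between trials, while values near 0 indicate that rater disagreement dominates, which is the pattern described for the first rating round.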

Conclusions: To appraise the value of RCTs for CER, clearer operational criteria are needed, raters have to be trained in applying the criteria, and more detailed information is needed when reporting RCTs.