Abstract
Background: Clinicians and policymakers prioritize understanding intervention effects under highly controlled (efficacy) and ‘real world’ (effectiveness) conditions.
Objectives: To determine the inter-rater reliability of a validated tool for evaluating trial efficacy-effectiveness (EE) and evaluate associations with treatment effects.
Methods: As part of a systematic review evaluating noninvasive positive pressure ventilation (NPPV) for adults with acute respiratory failure, investigator pairs independently rated the EE of 69 randomized trials. We adapted a previously validated, seven-item instrument (Table 1) addressing setting, stringency of eligibility criteria, clinically important health outcomes, intervention flexibility and follow-up duration, assessment of adverse effects, adequate sample size, and intent-to-treat analysis approach. Each item was scored 0 or 1, with total scores ranging from 0 to 7. Studies were categorized as efficacy (0–2), mixed (3–5), or effectiveness (6–7). We measured reliability with simple agreement and kappa statistics. We conducted subgroup analyses using consensus ratings to determine whether treatment effects varied by EE rating.
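The scoring, categorization, and reliability steps above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the function names and category labels are assumptions based on the abstract's description.

```python
from collections import Counter

def categorize(total_score: int) -> str:
    """Map a 0-7 EE total score to the abstract's three categories."""
    if total_score <= 2:
        return "efficacy"
    if total_score <= 5:
        return "mixed"
    return "effectiveness"

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    n = len(ratings_a)
    assert n == len(ratings_b) and n > 0
    # Observed proportion of agreement (the "simple agreement" reported)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters were independent
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, two raters who agree on 3 of 4 trials, with marginal distributions of 2/2 and 3/1 across two categories, have 75% simple agreement and kappa = 0.5.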
Results: Three experienced methodologists trained with three trial sets to develop operational definitions for each EE domain. Of the 69 studies, 17 were classified as efficacy, 50 mixed, and 2 effectiveness. Simple agreement was 79% and unweighted kappa 0.60. Pooled odds ratios (ORs) for NPPV effects on mortality varied by EE category: efficacy = 0.56 (95% CI, 0.31–1.02), mixed = 0.52 (95% CI, 0.41–0.66), and effectiveness = 0.99 (95% CI, 0.66–1.49; p = 0.02 for between group differences). Analysis of risk for intubation by EE category yielded similar results: ORs efficacy = 0.29 (95% CI, 0.19–0.46), mixed = 0.29 (95% CI, 0.21–0.41) and effectiveness = 0.58 (95% CI, 0.16–2.13; p = 0.61 for between group differences).
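The abstract does not state the pooling model used for the subgroup odds ratios; a minimal fixed-effect inverse-variance sketch, with per-study standard errors recovered from 95% confidence intervals on the log scale, would look like this (illustrative only):

```python
import math

def pool_odds_ratios(ors, cis):
    """Fixed-effect inverse-variance pooling of odds ratios.

    ors: list of study odds ratios
    cis: list of (lower, upper) 95% CI bounds for each study
    Returns (pooled OR, lower 95% bound, upper 95% bound).
    """
    weights, log_ors = [], []
    for or_, (lo, hi) in zip(ors, cis):
        # SE of log(OR) from the width of the 95% CI on the log scale
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        weights.append(1.0 / se**2)
        log_ors.append(math.log(or_))
    pooled_log = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * se_pooled),
            math.exp(pooled_log + 1.96 * se_pooled))
```

A between-group test for subgroup differences (the reported p = 0.02 and p = 0.61) would additionally compare the subgroup log odds ratios against their pooled value, which this sketch omits.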
Conclusions: EE ratings are feasible and can be made reliably, but require calibration practice. In one test set with mostly mixed EE studies, some treatment effects varied by EE ratings. We are applying this approach to an additional dataset and will present updated results.