Repeated measures meta-analysis of clinical trials

1998 Baltimore

Pham B, Chin W, Miller B, Rocchi A

Introduction/Objective: Reliable estimates of effectiveness for a comprehensive set of alternative interventions are an essential part of many economic evaluations. Clinical trials often report repeated outcome measures. We discuss issues in estimating patients' progress, as measured by American Urological Association (AUA) symptom scores, in the context of an economic evaluation of a new agent for the treatment of patients with benign prostatic hyperplasia (BPH).

Methods: Our literature search yielded 5 placebo-controlled trials of terazosin (T); 11 of finasteride (F); 2 of doxazosin (D); one trial comparing terazosin and doxazosin; one three-arm trial of terazosin, finasteride and placebo (P); and one cohort of patients treated with finasteride. Although not discussed here, the new agent, surgery and watchful waiting were other options considered in the evaluation. In the meta-analysis (MA), repeated AUA symptom scores were extracted from published articles for each treatment group. The AUA score ranges from 0 to 35 (0=no symptoms and 35=most symptomatic). The quality of included trials was assessed using a validated scale (Jadad 96). Variances were imputed according to Follman 92. We used the Laird-Ware repeated measures approach to model the change in AUA symptom scores, weighting each mean score by its inverse variance. Our model included single-arm and multiple-arm trials as well as two-arm comparative trials (Berkey 96). The unit of observation was a treatment arm within a study, with repeated assessments over time for each arm. A common correlation structure was assumed for each treatment arm. Treatment effect estimates were derived from the fitted model. These estimates were subjected to sensitivity analyses, including one with quality weighting (Moher 98).
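The inverse-variance weighting mentioned above can be illustrated with a minimal sketch. It pools arm-level mean scores at a single time point and ignores the correlation between repeated assessments, so it is a simplification and not the Laird-Ware model fitted in this analysis. The function name and all numbers are hypothetical.

```python
import math


def inverse_variance_pool(means, variances):
    """Pool mean scores, weighting each by the inverse of its variance.

    Simplified fixed-effect pooling at one time point; it does not
    model the correlation between repeated assessments.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * m for w, m in zip(weights, means)) / total_weight
    # Standard error of the pooled estimate under independence.
    se = math.sqrt(1.0 / total_weight)
    return pooled, se


# Hypothetical mean improvements and their variances (not trial data).
means = [3.0, 4.0, 5.0]
variances = [1.0, 0.5, 2.0]
pooled, se = inverse_variance_pool(means, variances)
```

More precise arms (smaller variance) pull the pooled estimate toward their own mean, which is why imputing missing variances matters before pooling.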

Results: A typical assessment schedule was baseline and 2, 4, 8, 12, 16, 24, 36 and 52 weeks, with sparse data at each time point. The maximum study duration was 52 weeks for T, 104 weeks for F and 36 weeks for D. Only a constant correlation structure (i.e. compound symmetry) for each treatment arm was estimable. Symptom improvement at 52 weeks was estimated at 3.1 (95% CI 2.6, 3.6) for P, 7.0 (5.8, 8.1) for T and 3.6 (2.8, 4.4) for F; for D, improvement at 36 weeks was 5.6 (2.5, 8.7). Estimates of correlation over time within each treatment arm were rho=0.87 for F, rho=0.81 for T, rho=0.21 for P and rho=0.04 for D. These results did not change substantially in the sensitivity analysis with quality weighting.
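For readers unfamiliar with the term, compound symmetry means that every pair of assessment times shares the same correlation rho, regardless of how far apart they are. A minimal sketch of such a matrix follows; the function name and the unit variance are assumptions made for illustration, and only the rho value is taken from the results above.

```python
def compound_symmetry(n_times, rho, sigma2=1.0):
    """Covariance matrix with variance sigma2 on the diagonal and
    covariance sigma2 * rho between every pair of distinct times."""
    return [
        [sigma2 * (1.0 if i == j else rho) for j in range(n_times)]
        for i in range(n_times)
    ]


# Three assessment times with the correlation estimated for T.
cs = compound_symmetry(3, 0.81)
```

The structure has a single correlation parameter per treatment, which is consistent with it being the only structure estimable from sparse data.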

Discussion: In a similar analysis, Berkey 96 estimated the correlation structure from patient-level data in one study included in the MA. In our analysis, the correlation estimate for D (rho=0.04) was counter-intuitive. To obtain reliable treatment effect estimates, sensitivity analyses with various estimates of the correlation structure are warranted.
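One way to see why the assumed correlation matters is the standard formula for the variance of a change from baseline, Var(change) = SD0^2 + SD1^2 - 2 * rho * SD0 * SD1. The sketch below is an illustration only and not the sensitivity analysis performed here: the standard deviations and sample size are hypothetical, and only the rho values come from the results.

```python
import math


def change_se(sd_base, sd_follow, n, rho):
    """Standard error of the mean change from baseline for n patients,
    given the baseline/follow-up SDs and their correlation rho."""
    var_change = sd_base ** 2 + sd_follow ** 2 - 2.0 * rho * sd_base * sd_follow
    return math.sqrt(var_change / n)


# rho values reported for D, P, T and F; SDs and n are hypothetical.
rhos = [0.04, 0.21, 0.81, 0.87]
ses = {rho: change_se(6.0, 6.0, 100, rho) for rho in rhos}
```

With these assumed inputs the standard error shrinks as rho increases, so an implausibly low correlation estimate would widen the confidence interval around the estimated improvement.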