Reviewing the reviews: the evidence base of contemporary medicine is weak

1998 Baltimore

Ezzo J, Moerman D, Hadhazy V, Bennan B

Introduction: Many assume that contemporary medical practice stands on a bedrock of several large, well-designed, well-executed studies.

Objectives: Our objectives are (1) to examine this assumption by assessing the number and quality of the primary studies included in Cochrane systematic reviews; (2) to determine whether two readers draw similar conclusions from the same review, and whether their conclusions match those intended by the review's authors; and (3) to summarize the reviews' conclusions.

Methods: All completed reviews in the Cochrane Library, Issue 1, 1998 (n=326) were assessed for the number of included trials. A subset of 159 reviews (selected alternately) was examined further by two readers per review to extract information on trial quality, control group and conclusions. Readers classified each review's conclusion into one of six categories (listed below). Interrater agreement was calculated, and differences were resolved by discussion. The reviews' authors were asked to categorize their conclusions using the same classification (results to be presented in the poster).
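The abstract does not name the agreement statistic. Assuming it is Cohen's kappa, a common choice when two readers assign items to nominal categories, the calculation can be sketched as below. The function name `cohens_kappa` and the toy ratings are hypothetical illustrations, not the study's data or code: kappa compares the observed share of reviews on which two readers agree with the share expected by chance from each reader's own category frequencies.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers who each assign one category per review.

    reader_a and reader_b are equally long lists, aligned so that position i
    holds each reader's category for the same review.
    """
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(reader_a)
    # Observed agreement: share of reviews both readers placed in the same category.
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement expected from each reader's own category frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both readers use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy ratings for four reviews (illustration only, not study data).
reader_1 = ["positive", "no effect", "positive", "negative"]
reader_2 = ["positive", "no effect", "possibly positive", "negative"]
print(round(cohens_kappa(reader_1, reader_2), 2))  # 0.67
```

In the toy example the readers agree on 3 of 4 reviews (0.75 observed) against 0.25 expected by chance, giving a kappa of about 0.67. scikit-learn's `cohen_kappa_score` computes the same statistic if a library implementation is preferred.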

Results: The number of primary studies included per review was small (median = 5; mode = 3; range 0-47), and the quality of those studies was poor or methodologically limited in 30 (19%) reviews. The consensus conclusions are shown below.

Consensus conclusion: number of reviews (% of 159)
No evidence of effect (insufficient evidence to draw conclusions): 39 (24.5)
Evidence of positive effect (treatment more beneficial than control): 33 (20.8)
Possibly positive effect (unresolved issues such as poor trial quality, small number of patients, or long-term side effects preclude a definitive statement): 28 (17.6)
Evidence of no effect (treatment no more beneficial than placebo or no treatment): 33 (20.8)
Treatments appear equal (treatment no more beneficial than other/standard medical care): 15 (9.4)
Evidence of negative effect (treatment more harmful than beneficial): 11 (6.9)
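As an arithmetic check on the table, each percentage is that category's count divided by the 159 double-read reviews, and the six counts sum to exactly 159. The short sketch below recomputes the percentages from the counts; the variable names are illustrative only.

```python
# Counts of reviews per consensus conclusion, copied from the results table.
counts = {
    "No evidence of effect": 39,
    "Evidence of positive effect": 33,
    "Possibly positive effect": 28,
    "Evidence of no effect": 33,
    "Treatments appear equal": 15,
    "Evidence of negative effect": 11,
}

total = sum(counts.values())  # should equal the 159 reviews read by two readers

for label, n in counts.items():
    # Percentage of all double-read reviews, rounded to one decimal as in the table.
    print(f"{label}: {n} ({100 * n / total:.1f}%)")
```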

Interrater agreement on the reviews' conclusions was 0.68 between readers 1 and 3 and 0.72 between readers 1 and 2, indicating only moderate agreement on how the results of a review should be interpreted.

Discussion: Both the number and the quality of the primary studies on which much of contemporary medical practice rests are disappointingly low. The proportion of reviews finding that modern biomedical procedures show no effect compared with control (33 of 159, 20.8%) is surprisingly high. The only moderate interrater agreement suggests that in several instances a review's conclusions may be ambiguous, leading two readers to interpret the same review differently.