Meta-analysis as a simultaneous inference problem: a novel approach to assess replicability of evidence

2020 Abstracts

Panagiotou O¹, Voorhies K¹, Jaljuli I², Schmid C¹, Heller R²

¹Brown University

²Tel-Aviv University

Background: Replicability of treatment effects protects patients, clinicians, and policy makers from claiming conclusive evidence solely based on a single study which may be a false-positive due to chance or bias.

Objectives: To assess the extent of replicability in Cochrane meta-analyses and characterise non-replicable bodies of evidence.

Methods: We included all meta-analyses of binary outcomes with n>4 studies. We applied the partial conjunction hypothesis test to quantify the evidence for replicability. The method establishes that the treatment effect is replicated in at least u out of n studies by testing the u/n-replicability null hypothesis, i.e. that at least n-(u-1) of the component null hypotheses in a meta-analysis hold true simultaneously. It calculates a summary measure (the r-value), which is the p-value for this replicability null hypothesis. Replicability is established if the r-value is less than the type I error rate α=0.05. Using the same meta-analytical methods as the Cochrane reviews, we computed the r-value for u=2 and u=3 to determine whether the treatment effect is replicated in at least 2 and at least 3 studies. For each meta-analysis, we computed the u-max, i.e. the maximum u for which the u/n-replicability null hypothesis is rejected; u-max is the 1-α lower confidence bound on the number of studies with an effect in the same direction.
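The r-value and u-max described above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in the simple Bonferroni-style partial conjunction p-value in place of the meta-analysis-based combination used in the abstract, and it assumes the inputs are one-sided per-study p-values computed for a common direction of effect. The function names `r_value` and `u_max` are hypothetical.

```python
def r_value(p_values, u):
    """Bonferroni-style p-value for the u/n-replicability null hypothesis.

    Assumption: p_values are one-sided per-study p-values, all computed
    for the same direction of effect. This is a stand-in for the
    meta-analysis-based combination described in the abstract.
    """
    n = len(p_values)
    if not 1 <= u <= n:
        raise ValueError("u must be between 1 and n")
    ordered = sorted(p_values)
    # Under the null, at least n-(u-1) component nulls are true, so the
    # u-th smallest p-value is scaled by n-u+1 and capped at 1.
    return min(1.0, (n - u + 1) * ordered[u - 1])


def u_max(p_values, alpha=0.05):
    """Largest u whose u/n-replicability null is rejected at level alpha."""
    best = 0
    running_max = 0.0
    for u in range(1, len(p_values) + 1):
        # Enforce monotonicity: rejecting u/n requires rejecting every
        # smaller u as well.
        running_max = max(running_max, r_value(p_values, u))
        if running_max < alpha:
            best = u
        else:
            break
    return best


# Hypothetical p-values for a meta-analysis of n=5 studies.
example_p = [0.001, 0.004, 0.03, 0.2, 0.6]
print(r_value(example_p, 2), r_value(example_p, 3), u_max(example_p))
```

With these made-up inputs the u=2 null is rejected at α=0.05 and the u=3 null is not, so `u_max` returns 2: the effect would be considered replicated in at least 2 of the 5 studies.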

Results: A total of 23,561 meta-analyses with 258,948 individual trials were eligible. The median number of studies per meta-analysis was 8 (interquartile range, IQR=6-12) and the median sample size was 2,984 (IQR=1,231-7,722). Replicability for u=2 was not met (r>0.05) in 15,482 (66%) meta-analyses and for u=3 in 17,738 (75%) meta-analyses. There were 9,863 statistically significant meta-analyses. Among those, replicability for u=2 was not met in 2,970 (30%), i.e. a single study was driving the significance of the meta-analysis; for u=3, replicability was not met in 4,493 (46%), i.e. at most 2 studies were driving the significance. The median u-max was 3 (IQR=1-5) and the median ratio of u-max to the total number of studies was 33% (IQR=14%-60%). In total, 5,078 (22%) meta-analyses had evidence of small study effects, and the treatment effect was replicated in at least two studies in 2,684 (53%) of those meta-analyses. Among statistically significant meta-analyses whose treatment effect was replicated in at least two studies (n=6,893), the difference in treatment effect between the replicated studies and the overall meta-analysis was greater than 10% for 3,518 (51%) meta-analyses; differences in treatment effects between the replicated studies and the overall meta-analysis were statistically significant in 34 cases. Results were similar when using α=0.005 and α=0.001.

Conclusions: Treatment effects are replicated in at least 2 trials in roughly 70% of statistically significant meta-analyses, with small variations in effect estimates. For many meta-analyses, statistical significance is sensitive to a small number of studies relative to the number of synthesized studies.

Patient or healthcare consumer involvement: None