Too much data from too many sources: what is the best estimate of the treatment effect?

Authors
Li T¹, Hong H², Fusco N³, Mayo-Wilson E², Dickersin K⁴
¹Cochrane United States, Cochrane Eyes and Vision, Cochrane Comparing Multiple Interventions Methods Group, USA
²Johns Hopkins Bloomberg School of Public Health, USA
³Cochrane United States, USA
⁴Cochrane United States, Cochrane Eyes and Vision, USA
Abstract
Background: Data from clinical trials are becoming increasingly available. For meta-analysts, however, this presents a new challenge, because data about the same study extracted from different sources do not always agree. For a systematic review of gabapentin for neuropathic pain, we identified 10 trials that provided data for a pain outcome at eight weeks. These data were reported in 21 sources: six journal articles, two conference abstracts, two FDA medical reviews, five individual patient data (IPD) datasets, and six clinical study reports.

Objectives: To describe a resampling-based, data-splitting approach that generates the distribution of all possible pooled estimates of effect and guides the selection of data sources for meta-analysis.

Methods: The data structure is illustrated in the Table. In each resample, we randomly selected one set of outcome data from each study (n = 10) and performed a random-effects meta-analysis with the selected data (degrees of freedom = 9 in each meta-analysis). We drew 10,000 resamples and generated the distribution of possible pooled estimates of effect based on the available data. We examined the contribution of each data source to the top and bottom 5% of estimates. We also conducted sensitivity analyses in which we assigned each data source a probability of being selected for the meta-analysis.
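The resampling step described in the Methods can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code. The abstract does not name the random-effects estimator, so DerSimonian-Laird is assumed. Each study is assumed to contribute one (source, mean difference, variance) record per data source. `STUDIES`, `resample_pooled`, `summarize`, and `tail_source_shares` are hypothetical names, and the data are made-up toy values. Reading the reported interval as the 2.5th to 97.5th percentiles of the resampled estimates is one plausible interpretation.

```python
import math
import random
import statistics
from collections import Counter

# Toy, made-up data (NOT the gabapentin review's data): for each trial, every
# available data source contributes one (source, mean difference, variance) record.
STUDIES = {
    "trial_A": [("journal", -0.90, 0.10), ("CSR", -0.70, 0.08), ("IPD", -0.75, 0.07)],
    "trial_B": [("FDA", -1.20, 0.15), ("CSR", -0.80, 0.12)],
    "trial_C": [("abstract", -0.30, 0.20), ("journal", -0.50, 0.18)],
    "trial_D": [("IPD", -0.60, 0.09)],
}


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its SE, and the DerSimonian-Laird tau^2 (assumed estimator)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # Q has k - 1 degrees of freedom
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def resample_pooled(studies, n_draws=10_000, seed=0):
    """Each draw picks one record per study uniformly at random, then pools the picks."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        picks = [rng.choice(records) for records in studies.values()]
        pooled, _, _ = dersimonian_laird([p[1] for p in picks], [p[2] for p in picks])
        draws.append((pooled, [p[0] for p in picks]))
    return draws


def summarize(draws):
    """Median and 2.5th/97.5th percentiles of the pooled-estimate distribution."""
    values = [d[0] for d in draws]
    cuts = statistics.quantiles(values, n=40)  # 39 cut points at 2.5% steps
    return cuts[0], statistics.median(values), cuts[-1]


def tail_source_shares(draws, frac=0.05):
    """Share of each source among the records used in the bottom and top `frac` of draws."""
    ordered = sorted(draws, key=lambda d: d[0])
    n_tail = max(1, int(len(ordered) * frac))

    def shares(subset):
        counts = Counter(src for _, sources in subset for src in sources)
        total = sum(counts.values())
        return {src: n / total for src, n in sorted(counts.items())}

    return shares(ordered[:n_tail]), shares(ordered[-n_tail:])


draws = resample_pooled(STUDIES)
low, mid, high = summarize(draws)
print(f"median {mid:.2f}, 2.5th-97.5th percentile {low:.2f} to {high:.2f}")
bottom, top = tail_source_shares(draws)
print("bottom 5%:", bottom)
print("top 5%:", top)
```

Draws are taken with replacement, so identical combinations recur. When the number of combinations (the product of the number of records per study, 12 in the toy data) is small, iterating over `itertools.product` of the per-study record lists would give the exact distribution rather than a Monte Carlo approximation.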

Results: When all data sources were used, the distribution of meta-analytic estimates was centered around -0.79 (95% confidence interval (CI) -1.28 to -0.26). When only one data source was used, the FDA medical reviews appeared to yield a larger effect estimate than the other data sources, but the 95% CIs overlapped substantially. The contributions of each data source to the top and bottom 5% of estimates did not appear to differ materially. Other results will be presented at the Colloquium.
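The single-source comparison in the Results can be illustrated by pooling, for each source type, only the trials that report that source. This sketch assumes DerSimonian-Laird pooling and a Wald interval (estimate ± 1.96 × SE), neither of which the abstract specifies. `single_source_estimates` and the toy data are hypothetical, and the numbers are made up.

```python
import math
from collections import defaultdict

# Toy, made-up data (NOT the review's data): trial -> (source, mean difference, variance).
STUDIES = {
    "trial_A": [("journal", -0.90, 0.10), ("CSR", -0.70, 0.08), ("IPD", -0.75, 0.07)],
    "trial_B": [("FDA", -1.20, 0.15), ("CSR", -0.80, 0.12)],
    "trial_C": [("abstract", -0.30, 0.20), ("journal", -0.50, 0.18)],
    "trial_D": [("IPD", -0.60, 0.09)],
}


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its SE, and the DerSimonian-Laird tau^2 (assumed estimator)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0  # tau^2 fixed at 0 for a single trial
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def single_source_estimates(studies, z=1.96):
    """For each source type, pool only the trials that report that source."""
    by_source = defaultdict(list)
    for records in studies.values():
        used = set()
        for source, effect, var in records:
            if source in used:  # assumption: keep only the first record of a source per trial
                continue
            used.add(source)
            by_source[source].append((effect, var))
    results = {}
    for source, rows in sorted(by_source.items()):
        pooled, se, _ = dersimonian_laird([r[0] for r in rows], [r[1] for r in rows])
        # (number of trials, pooled estimate, lower and upper Wald limits)
        results[source] = (len(rows), pooled, pooled - z * se, pooled + z * se)
    return results


for source, (k, est, lower, upper) in single_source_estimates(STUDIES).items():
    print(f"{source:9s} k={k}  {est:.2f} ({lower:.2f} to {upper:.2f})")
```

With only one or two trials per source, as in the toy data, the between-study variance is either set to zero or estimated very imprecisely. This is one reason single-source intervals can be wide and overlap.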

Conclusions: Our approach offers a non-parametric way to characterize the distribution of all possible pooled estimates of effect using all data from all sources. By incorporating selection probabilities, it also shows the impact of partially including or completely excluding a data source.
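The sensitivity analyses with selection probabilities, and the partial inclusion or complete exclusion described in the Conclusions, can be sketched by replacing the uniform choice with a weighted one. The sketch rests on three assumptions. First, weights are relative within each trial: a trial with an FDA record at weight 0.5 and a CSR record at weight 1.0 picks the FDA record with probability 1/3. Second, a weight of 0 excludes a source. Third, a trial left with no selectable record is dropped from that draw. `resample_weighted`, the scenario names, and the toy data are hypothetical, and DerSimonian-Laird pooling is again assumed.

```python
import math
import random
import statistics

# Toy, made-up data (NOT the review's data): trial -> (source, mean difference, variance).
STUDIES = {
    "trial_A": [("journal", -0.90, 0.10), ("CSR", -0.70, 0.08), ("IPD", -0.75, 0.07)],
    "trial_B": [("FDA", -1.20, 0.15), ("CSR", -0.80, 0.12)],
    "trial_C": [("abstract", -0.30, 0.20), ("journal", -0.50, 0.18)],
    "trial_D": [("IPD", -0.60, 0.09)],
}


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its SE, and the DerSimonian-Laird tau^2 (assumed estimator)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def resample_weighted(studies, source_weight, n_draws=10_000, seed=0):
    """Pick one record per trial with probability proportional to its source's weight.

    Sources missing from `source_weight` get weight 1.0; weight 0.0 excludes a source.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_draws):
        effects, variances = [], []
        for records in studies.values():
            weights = [source_weight.get(src, 1.0) for src, _, _ in records]
            if sum(weights) <= 0:
                continue  # assumption: a trial with every source excluded drops out of this draw
            _, effect, var = rng.choices(records, weights=weights, k=1)[0]
            effects.append(effect)
            variances.append(var)
        if effects:  # skip draws in which every trial was excluded
            pooled.append(dersimonian_laird(effects, variances)[0])
    return pooled


scenarios = {
    "equal weights": {},
    "FDA half weight": {"FDA": 0.5},
    "FDA excluded": {"FDA": 0.0},
}
for name, weights in scenarios.items():
    values = resample_weighted(STUDIES, weights)
    print(f"{name:16s} median {statistics.median(values):.2f}")
```

Comparing the scenario distributions side by side shows how much a single source drives the pooled estimate.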