Imputing summary statistics for meta-analysis of continuous data

Authors
Tomlinson G, Beyene J
Abstract
Background: In a meta-analysis of studies comparing continuous outcomes in two groups, the study-specific group means and standard deviations (SDs) are needed to test for heterogeneity and to compute a pooled effect estimate. We refer to the reporting of means and SDs as standard reporting. In a particular meta-analysis, some studies may employ non-standard reporting because of the nature of their data. When individual outcomes have a skewed distribution, common statistical teaching suggests reporting the median instead of the mean. The median is usually reported together with the range or interquartile range (IQR) as a measure of spread. As a result, the measures reported in a study (mean and SD vs. median and range or IQR) may depend on the data observed in that study. Reviewers may sometimes consider imputing the mean and SD from the reported summary statistics. The effect of such imputations on meta-analytic results has not been well studied.
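To illustrate why both the means and the SDs are needed, here is a minimal sketch of a fixed-effect, inverse-variance pooled mean difference together with Cochran's Q statistic for heterogeneity. The abstract does not say which pooling model or effect measure was used, so the choice of a raw mean difference, the fixed-effect weights, and the function name `pooled_mean_difference` are all assumptions made for illustration.

```python
import math


def pooled_mean_difference(studies):
    """Fixed-effect inverse-variance pooling of raw mean differences.

    `studies` is a list of tuples (mean1, sd1, n1, mean2, sd2, n2).
    Returns (pooled difference, its standard error, Cochran's Q).
    Illustrative sketch only, not the authors' analysis code.
    """
    diffs, weights = [], []
    for mean1, sd1, n1, mean2, sd2, n2 in studies:
        # Variance of the difference in means needs both group SDs.
        variance = sd1 ** 2 / n1 + sd2 ** 2 / n2
        diffs.append(mean1 - mean2)
        weights.append(1.0 / variance)

    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, diffs)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q_statistic = sum(w * (d - pooled) ** 2 for w, d in zip(weights, diffs))
    return pooled, standard_error, q_statistic
```

Without an SD a study has no weight in either quantity, which is why a study reporting only a median and range or IQR cannot be included as it stands.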

Objective: To assess how sensitive meta-analytic results are to the strategy used for handling studies of continuous outcomes that report medians instead of means and ranges or IQRs instead of SDs.

Methods: Three strategies are available to the analyst in this situation: (a) use only the studies with standard reporting; (b) treat the medians as means and estimate the unknown SDs from the study-specific IQR or range; (c) treat the medians as means and use a pooled estimate of the SD from the studies that report it. In an extensive simulation study, we evaluate the performance of these three strategies under various combinations of (i) the number of studies in the meta-analysis; (ii) the proportion of studies with non-standard reporting; (iii) the within-study sample size; (iv) the degree of heterogeneity of the true treatment effect; and (v) the degree of skewness in the individual studies. We also compare the results of applying the three strategies to a real meta-analysis.
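The SD estimates in strategies (b) and (c) can be sketched as follows. The abstract does not state which conversion formulas were used, so the ones here are assumptions: the IQR is divided by 1.349 (the width of the IQR of a normal distribution in SD units), the range is divided by 4 (a crude rule of thumb whose accuracy depends on sample size), and strategy (c) uses the usual degrees-of-freedom-weighted pooled SD. The function names are hypothetical.

```python
import math

# Width of the interquartile range of a normal distribution, in SD units.
NORMAL_IQR_WIDTH = 1.349


def sd_from_iqr(q1, q3):
    """Strategy (b), assumed formula: SD ~ IQR / 1.349 under normality."""
    return (q3 - q1) / NORMAL_IQR_WIDTH


def sd_from_range(minimum, maximum):
    """Strategy (b), assumed rule of thumb: SD ~ range / 4."""
    return (maximum - minimum) / 4.0


def pooled_sd(sds, sample_sizes):
    """Strategy (c), assumed formula: pooled SD across the groups that
    report an SD, weighting each variance by its degrees of freedom."""
    numerator = sum((n - 1) * sd ** 2 for sd, n in zip(sds, sample_sizes))
    denominator = sum(n - 1 for n in sample_sizes)
    return math.sqrt(numerator / denominator)
```

For example, a group reporting a median of 12 with an IQR of 9 to 17 would, under these assumptions, enter the analysis with mean 12 and SD `sd_from_iqr(9, 17)`. Both conversions in strategy (b) rest on approximate normality, so they are least reliable for the skewed data that lead to median reporting in the first place.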

Results: Preliminary results suggest that strategies involving estimation of the SD are preferable to the strategy that uses only studies with standard reporting. Estimation of the SD introduces little bias and increases the precision of the pooled estimate. This effect is more pronounced as the proportion of studies with non-standard reporting increases. Further simulation analyses are underway with the aim of characterizing situations in which imputation of missing information would not be recommended.