A methodological review and comparison of methods for estimating the sample mean and standard deviation from other available data

Authors
Bajpai R¹
¹School of Medicine, Keele University, Newcastle-under-Lyme, United Kingdom
Abstract
Background
The mean and standard deviation (SD) are commonly pooled across studies to generate a combined effect estimate in meta-analyses of continuous outcomes. However, some studies report the median and related measures of dispersion (such as the range or interquartile range) instead of the sample mean and SD, particularly when the outcome follows a skewed distribution. To combine individual study results in a consistent format, it is therefore necessary to estimate the sample mean and SD for such studies.
Objective
To review and compare the available methods for estimating the sample mean and SD from the sample size, median, range and/or interquartile range under different scenarios.
Methods
In this methodological review, we investigate existing methods for estimating the sample mean and SD from other available summary statistics. Medical statistics methodology journals such as BMC Medical Research Methodology, Statistical Methods in Medical Research, Research Synthesis Methods, and Statistics in Medicine were searched to identify proposed methods for estimating the sample mean and SD. We also searched Google Scholar to locate potentially relevant grey literature. In a given study, summary data are most commonly reported as one of: S1 (a, m, b; n), S2 (a, q1, m, q3, b; n), or S3 (q1, m, q3; n), where a = minimum, m = median, b = maximum, q1 = first quartile, q3 = third quartile, and n = sample size. We conducted an in-depth review of the methods available in the literature, the advantages and disadvantages of each, and their ease of implementation. We also compared the performance of these methods across scenarios using sample data drawn from various distributions, such as the lognormal, exponential, Weibull, and beta distributions, for different sample sizes.
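To make scenarios S1, S2 and S3 concrete, here is a minimal sketch of one widely cited set of estimators, those of Wan et al. (2014, BMC Medical Research Methodology). These estimators use simple weighted averages for the mean and normal-theory scaling constants for the SD. This is an illustration that assumes approximately normal data, not necessarily the method this review recommends. The function names `estimate_s1`, `estimate_s2`, `estimate_s3`, `xi` and `eta` are hypothetical, and only the Python standard library (`statistics.NormalDist`) is used.

```python
from statistics import NormalDist

_PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def xi(n: int) -> float:
    """Approximate expected range of n i.i.d. standard normal values (Wan et al., 2014)."""
    return 2 * _PHI_INV((n - 0.375) / (n + 0.25))


def eta(n: int) -> float:
    """Approximate expected interquartile range of n i.i.d. standard normal values."""
    return 2 * _PHI_INV((0.75 * n - 0.125) / (n + 0.25))


def estimate_s1(a, m, b, n):
    """S1 (minimum, median, maximum; n) -> (estimated mean, estimated SD)."""
    mean = (a + 2 * m + b) / 4
    sd = (b - a) / xi(n)
    return mean, sd


def estimate_s2(a, q1, m, q3, b, n):
    """S2 (minimum, q1, median, q3, maximum; n) -> (estimated mean, estimated SD)."""
    mean = (a + 2 * q1 + 2 * m + 2 * q3 + b) / 8
    # Average of the range-based and IQR-based SD estimates.
    sd = (b - a) / (2 * xi(n)) + (q3 - q1) / (2 * eta(n))
    return mean, sd


def estimate_s3(q1, m, q3, n):
    """S3 (q1, median, q3; n) -> (estimated mean, estimated SD)."""
    mean = (q1 + m + q3) / 3
    sd = (q3 - q1) / eta(n)
    return mean, sd
```

For example, a study reporting q1 = 2, m = 4, q3 = 7 and n = 50 gives `estimate_s3(2, 4, 7, 50)` with an estimated mean of (2 + 4 + 7)/3 ≈ 4.33. Because the scaling constants come from the normal distribution, these estimates can be poor for strongly skewed outcomes.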
Conclusion
Initial searches indicate that several methods are available in the literature, and results from sample data following different distributions showed that no single method performs best in all scenarios for estimating the mean and SD. It is therefore essential to inform meta-analysts about which proposed method works better in commonly reported scenarios and under a particular distributional assumption, so that poor estimates of the sample mean and SD can be avoided.
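The point that no single method suits every scenario can be illustrated with a small simulation. This is a hypothetical sketch, not the review's actual simulation design. It draws repeated samples from normal, lognormal, exponential, Weibull and beta distributions with arbitrarily chosen parameters, computes the median and quartiles (scenario S3), applies the normal-based S3 estimator of Wan et al. (2014), and reports the average relative error against each sample's own mean and SD. The distribution parameters, sample size, number of replicates and the names `wan_s3`, `GENERATORS` and `relative_errors` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, median, quantiles, stdev

PHI_INV = NormalDist().inv_cdf  # standard normal quantile function


def wan_s3(q1, m, q3, n):
    """Normal-based S3 estimator (Wan et al., 2014): mean and SD from q1, m, q3, n."""
    eta = 2 * PHI_INV((0.75 * n - 0.125) / (n + 0.25))
    return (q1 + m + q3) / 3, (q3 - q1) / eta


# Hypothetical distributions and parameters, chosen only for illustration.
GENERATORS = {
    "normal": lambda rng: rng.gauss(10, 2),
    "lognormal": lambda rng: rng.lognormvariate(0, 1),
    "exponential": lambda rng: rng.expovariate(1.0),
    "weibull": lambda rng: rng.weibullvariate(1.0, 1.5),  # scale 1, shape 1.5
    "beta": lambda rng: rng.betavariate(2, 5),
}


def relative_errors(dist, n, reps=200, seed=1):
    """Average absolute relative error of S3 estimates vs. each sample's own mean and SD."""
    rng = random.Random(seed)  # seeded for reproducibility
    draw = GENERATORS[dist]
    mean_err = sd_err = 0.0
    for _rep in range(reps):
        x = [draw(rng) for _ in range(n)]
        q1, _, q3 = quantiles(x, n=4)  # first and third quartile cut points
        est_mean, est_sd = wan_s3(q1, median(x), q3, n)
        true_mean, true_sd = mean(x), stdev(x)
        mean_err += abs(est_mean - true_mean) / abs(true_mean)
        sd_err += abs(est_sd - true_sd) / true_sd
    return mean_err / reps, sd_err / reps


if __name__ == "__main__":
    for dist in GENERATORS:
        m_err, s_err = relative_errors(dist, n=100)
        print(f"{dist:<12} mean rel. error {m_err:.3f}   SD rel. error {s_err:.3f}")
```

Under these assumptions, the normal-based formulas should track the sample mean closely for normal data but underestimate it noticeably for right-skewed data such as the lognormal, where the mean lies well above the median and quartiles.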