Abstract
Background: Capture-recapture techniques have been proposed for estimating the number of articles missed by a comprehensive literature search [Kastner et al. J Clin Epidemiol 2009].

Objectives: To investigate the practical and statistical feasibility of capture-recapture modelling of the number of missing references, using a systematic review in gastroenterology as an example.

Methods: Articles were searched in six databases: Science Citation Index, MEDLINE, EMBASE, CENTRAL, BIOSIS and CINAHL. After identifying and merging duplicates, we constructed a dataset indicating, for each article, whether it was found in each database. Each possible combination of databases ('cell') served as an observation. The number of articles per cell was fitted using Poisson regression models, with and without variable selection. The estimated count in the 'missing' cell, corresponding to articles found in no database, provides an estimate of the number of missing articles.

Results: The search yielded 192 citations, referring to 130 distinct articles. The Poisson model with main effects only provided an estimate of 56 missing articles (95% confidence interval 36 to 86). The model with all two-way interactions yielded an estimate of 1439 articles (95% CI 410 to 5045), most probably unrealistic due to overfitting. A more plausible result (82 articles, 95% CI 52 to 128) was achieved using a model selection strategy based on the Bayesian information criterion (BIC), which retained only one two-way interaction.

Conclusions: The usefulness of capture-recapture modelling in literature searches is questionable for two reasons. First, not all duplicates could be identified automatically because of differing citation standards, so a time-consuming manual check was required. Second, statistical modelling is challenging owing to problems of variable selection and overfitting, resulting in partly unreliable estimates and wide confidence intervals.
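The core of the method described above can be sketched in a few lines: a Poisson log-linear model is fitted to the counts of observed capture histories, and the fitted intercept predicts the count of the unobserved all-zero cell. The following Python sketch uses three hypothetical databases and made-up cell counts (not the study's data) and fits a main-effects-only model by iteratively reweighted least squares; it is an illustration of the general technique, not the authors' exact analysis.

```python
import numpy as np

# Hypothetical capture-history counts for three databases (A, B, C).
# Each tuple: (found in A, found in B, found in C, number of articles).
# The all-zero cell (articles found in no database) is unobserved by design.
cells = [
    (1, 0, 0, 20),
    (0, 1, 0, 15),
    (0, 0, 1, 10),
    (1, 1, 0, 8),
    (1, 0, 1, 5),
    (0, 1, 1, 4),
    (1, 1, 1, 2),
]

y = np.array([c[3] for c in cells], dtype=float)
# Main-effects log-linear design: intercept plus one indicator per database.
X = np.array([[1.0, c[0], c[1], c[2]] for c in cells])

# Fit the Poisson regression by iteratively reweighted least squares (IRLS).
beta = np.zeros(X.shape[1])
for _ in range(50):
    eta = X @ beta
    mu = np.exp(eta)
    w = mu                     # Poisson working weights
    z = eta + (y - mu) / mu    # working response
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

# Under the main-effects model, the expected count of the unobserved
# cell (0, 0, 0) is exp(intercept).
missing = np.exp(beta[0])
total = y.sum() + missing
print(f"estimated missing articles: {missing:.1f}")
print(f"estimated total population: {total:.1f}")
```

Adding interaction columns to `X` (products of the indicator columns) gives the two-way-interaction model mentioned in the Results; as the abstract notes, with seven observed cells and many parameters such models overfit easily, which motivates the BIC-based selection.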