You can learn a lot from a DUMMY: Improving search strategies using the ‘diagnostic-based utility for mining MEDLINE’ (DUMMY)

Authors
Abou-Setta A¹, Klassen T¹, Kirby S¹, Zarychanski R¹
¹Center for Healthcare Innovation, University of Manitoba, Canada
Abstract
Background: MEDLINE contains approximately 20 million Medical Subject Heading (MeSH)-indexed citations, while PubMed has an additional 2 million un-indexed citations. PubMed/MEDLINE search strategies are available but have generally been validated against small citation sets. With the ability to text-mine and analyze large data sets, more comprehensive search strategies can be developed.

Objectives: To create optimized PubMed/MEDLINE search strategies without the use of MeSH terms.

Methods: Using the ‘Diagnostic-based Utility for Mining MEDLINE’ (DUMMY), a custom-developed application for mining and analyzing PubMed/MEDLINE citations, we created a database of randomized trials in PubMed (Publication Type: ‘Randomized Controlled Trial’). Combinations of individual words and two-word arrangements with a prevalence of >5% in the title/abstract were tabulated. We generated a second database from 10 Cochrane reviews (214 included trials) for further benchmarking. Two new DUMMY-based search strategies were developed: (a) sensitivity-maximizing, and (b) sensitivity/precision-maximizing. The performance of each search strategy was measured against modified versions (excluding MeSH terms) of the corresponding Cochrane Highly Sensitive Search Strategy (HSSS) (sensitivity-maximizing and sensitivity/precision-maximizing versions). Sensitivity, specificity, and positive (PPV) and negative (NPV) predictive values were calculated.
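
The two computational steps described above, tabulating terms by prevalence and scoring a search strategy as a diagnostic test, can be illustrated with a minimal Python sketch. This is not the DUMMY application itself, whose implementation is not described here. The function names, the tokenization rule, and the reading of "two-word arrangements" as adjacent word pairs are all assumptions made for illustration, and the example texts and counts are invented.

```python
import re
from collections import Counter


def term_prevalence(texts, min_prevalence=0.05):
    """Return terms found in more than `min_prevalence` of the texts.

    Assumption: a "term" is a lowercase word or an adjacent two-word pair.
    Each term is counted at most once per citation, so the value returned
    is the fraction of citations that contain the term.
    """
    doc_counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        pairs = [" ".join(pair) for pair in zip(words, words[1:])]
        doc_counts.update(set(words) | set(pairs))
    n = len(texts)
    return {
        term: count / n
        for term, count in doc_counts.items()
        if count / n > min_prevalence
    }


def diagnostic_accuracy(tp, fp, fn, tn):
    """Compute accuracy measures from a 2x2 table of search results.

    tp: randomized trials retrieved by the search
    fp: other citations retrieved by the search
    fn: randomized trials missed by the search
    tn: other citations correctly not retrieved
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Invented example data, for illustration only.
example_titles = [
    "A randomized controlled trial of aspirin",
    "Patients were randomized to placebo",
    "A cohort study of smoking",
]
common_terms = term_prevalence(example_titles, min_prevalence=0.5)
metrics = diagnostic_accuracy(tp=90, fp=10, fn=10, tn=890)
```

In this framing the search strategy plays the role of the diagnostic test and the ‘Randomized Controlled Trial’ publication type plays the role of the reference standard, which is why a strategy can gain sensitivity while losing PPV.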

Results: A PubMed search conducted in April 2012 generated 341 750 included citations. Permutations of common unique words and two-word combinations in the title (n = 24 and n = 12) and abstract (n = 404 and n = 193) were tested for diagnostic accuracy. Both novel DUMMY-based strategies had superior sensitivity to their HSSS counterparts, but also had lower PPVs (6.5% vs. 16.5% and 36% vs. 44.1%, respectively); NPVs for all strategies were similar. Superior sensitivity of the proposed DUMMY-based strategies was confirmed against the Cochrane review dataset (93% vs. 81% and 81% vs. 75%, respectively).

Conclusions: The DUMMY-based sensitivity-maximizing search strategy markedly increases the probability of identifying randomized trials in MEDLINE. The sensitivity/precision-maximizing strategy demonstrated excellent sensitivity and specificity in one algorithm. Further refinement of the search strategies and appropriate validation are underway to determine the ‘real-life’ impact of these strategies.