Using text-mining to develop a U.S.-specific geographic search filter to facilitate systematic reviews in Ovid MEDLINE

Authors
Cheung A¹, Popoff E¹, Szabo SM¹
¹Broadstreet Health Economics and Outcomes Research
Abstract
Background: Bibliographic databases like MEDLINE are crucial for healthcare researchers to access the latest evidence. As such databases index an ever-increasing volume of research, tools that support information retrieval are valuable for identifying relevant evidence efficiently. Geographic search filters have been developed for countries and regions such as the United Kingdom, Spain, and Africa.

Objectives: This study aimed to develop a geographic search filter for accurately identifying research from the United States (U.S.) in Ovid MEDLINE.

Methods: Citations indexed in MEDLINE were collected from the bibliographies of reviews by the U.S. Preventive Services Task Force, which publishes evidence-based recommendations across various disease areas. An algorithm classified a citation as U.S.-based if it met ≥2 of the following 3 criteria: U.S.-based author affiliation, U.S. place of publication, or U.S. grant funding. Using text mining, one- and two-word terms were extracted from the title and abstract fields, and their frequencies were compared between U.S. and non-U.S. citations. The findings were used to develop a preliminary search filter. Analyses were performed in R.
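The classification rule and the term-frequency comparison described in the Methods can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code, and several details are assumptions not stated in the abstract: the function names, the tokenisation rule, the use of document frequency (the share of citations containing a term), and the additive smoothing that avoids division by zero.

```python
import re
from collections import Counter


def is_us_based(affiliation_us: bool, published_in_us: bool, funded_by_us: bool) -> bool:
    """Label a citation U.S.-based when at least 2 of the 3 criteria hold."""
    return sum([affiliation_us, published_in_us, funded_by_us]) >= 2


def ngrams(text: str) -> list[str]:
    """Return lower-cased one- and two-word terms from a title or abstract."""
    # Assumed tokenisation: runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + bigrams


def frequency_ratio(us_texts: list[str], non_us_texts: list[str],
                    smoothing: float = 1.0) -> dict[str, float]:
    """Ratio of the share of U.S. citations containing a term to the
    (smoothed) share of non-U.S. citations containing it."""
    # Count each term at most once per citation (document frequency).
    us_counts = Counter(t for text in us_texts for t in set(ngrams(text)))
    non_counts = Counter(t for text in non_us_texts for t in set(ngrams(text)))
    ratios = {}
    for term, n_us in us_counts.items():
        us_rate = n_us / len(us_texts)
        # Smoothing is an assumption; it keeps the denominator above zero.
        non_rate = (non_counts[term] + smoothing) / (len(non_us_texts) + smoothing)
        ratios[term] = us_rate / non_rate
    return ratios
```

Terms with the highest ratios would then be candidates for the filter; the exact ratio definition behind the values reported in the Results may differ from this sketch.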

Results: A total of 22,280 citations were collected, of which 8,243 were U.S.-based according to the algorithm. U.S. citations were published between 1980 and 2019; therapeutic areas included cardiovascular disease (9.9%), obesity (6.5%), and HIV infection (5.0%). Common U.S.-related terms (shown with the ratio of their frequency in U.S. to non-U.S. citations) included U.S. cities, states, and regions (“Pennsylvania” (64.0), “Miami” (26.3), “midwest” (23.0)) and terms related to U.S. populations (“African American” (22.2), “Medicare beneficiaries” (14.0)). The search filter was developed by combining these and other key terms in title/abstract fields (Table 1).
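As an illustration of how selected terms might be combined into a single Ovid title/abstract search line, the sketch below joins terms with "or" and applies the `.ti,ab.` field suffix. The helper name `build_ovid_filter` is hypothetical, the example terms are only those quoted in the Results, and the output is not the filter reported in Table 1. It assumes no term contains an Ovid reserved word such as "and" or "or", which would need quoting.

```python
def build_ovid_filter(terms: list[str]) -> str:
    """Combine free-text terms into one Ovid title/abstract search line."""
    # Normalise case and whitespace, drop blanks and duplicates.
    cleaned = sorted({t.strip().lower() for t in terms if t.strip()})
    if not cleaned:
        raise ValueError("at least one term is required")
    # Ovid treats adjacent words as a phrase, so multi-word terms are left as-is.
    return f"({' or '.join(cleaned)}).ti,ab."


# Illustrative terms taken from the Results paragraph only.
example = build_ovid_filter(
    ["Pennsylvania", "Miami", "midwest", "African American", "Medicare beneficiaries"]
)
print(example)
```

Any line built this way would still need to be checked in Ovid itself, since truncation, adjacency operators, and subject headings are not handled here.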

Conclusions: The MEDLINE search filter developed here is expected to streamline the systematic identification of evidence from U.S. studies. As the validity of the filter would be affected by changes in controlled vocabulary in MEDLINE, periodic updates will be necessary. The algorithm assumes that publications meeting at least 2 of the 3 stated criteria are U.S.-based; although this assumption was not formally tested, an audit confirmed the relevance of selected publications. Future work will include validation of the filter and refinement to improve its sensitivity and specificity.
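The validation mentioned above would typically compare the filter's retrieval decisions against a reference set of citations with known U.S. status. The sketch below shows the standard sensitivity and specificity calculation; the function name and the boolean-list inputs are assumptions for illustration, and no validation data from the study are implied.

```python
import math


def sensitivity_specificity(retrieved: list[bool], is_us: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of a filter against a reference standard.

    retrieved[i] is True if the filter retrieved citation i;
    is_us[i] is True if citation i is U.S.-based in the reference set.
    """
    if len(retrieved) != len(is_us):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(retrieved, is_us))
    tp = sum(1 for r, u in pairs if r and u)          # U.S. citations retrieved
    fn = sum(1 for r, u in pairs if not r and u)      # U.S. citations missed
    tn = sum(1 for r, u in pairs if not r and not u)  # non-U.S. correctly excluded
    fp = sum(1 for r, u in pairs if r and not u)      # non-U.S. wrongly retrieved
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity
```

A filter tuned for systematic reviews would usually prioritise sensitivity, accepting some loss of specificity so that relevant U.S. studies are not missed.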

Patient or healthcare consumer involvement: This search filter will allow healthcare researchers to access information driving evidence-based decision making in a more targeted and efficient manner.