Crowd-sourcing structured metadata to improve literature search efficiency

Authors
Kaiser K1, Sweeney H1, Brown A2
1School of Public Health, University of Alabama at Birmingham
2School of Public Health-Bloomington, Indiana University-Bloomington
Abstract
Background:
Scientific literature search tools are often designed to present matches according to a relevance model of information retrieval. For MEDLINE, the Medical Subject Headings (MeSH) and keyword system was implemented in the 1960s, when the database held around 100,000 articles. More than 1 million articles are now added each year, for a cumulative total of more than 40 million. With relevance-based indexing, more than 97% of the articles retrieved for systematic reviews (SRs) are not relevant.

Objectives:
We aimed to quantify the percentage improvement in study identification achieved by creating structured metadata for the two components that are the most common reasons for excluding studies from systematic reviews: population and study design.

Methods:
We created a web-based portal for crowd-sourcing structured metadata. At least two coders answered twelve questions describing the study characteristics of one year's worth of articles published in a single journal, Obesity (year: 2016; N = 365). The articles spanned a wide array of study types and designs, from human epidemiological studies to basic science articles focused on a variety of species. Once the metadata were created, we applied structured queries to identify articles with different designs and populations, and compared the search precision with that of the standard methods available in PubMed.
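
As an illustration of what a structured query over coded metadata might look like, the following is a minimal sketch only. The record layout, the field names (`article_id`, `population`, `study_design`), and the example values are assumptions made for illustration; they are not the portal's actual twelve questions or its data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArticleMetadata:
    """Hypothetical coded record for one article (field names are assumed)."""
    article_id: str      # placeholder identifier, not a real PMID
    population: str      # e.g. "human", "mouse"
    study_design: str    # e.g. "randomized controlled trial", "cohort"


def structured_query(records, population=None, study_design=None):
    """Return the records that match every criterion that was supplied."""
    matches = []
    for rec in records:
        if population is not None and rec.population != population:
            continue
        if study_design is not None and rec.study_design != study_design:
            continue
        matches.append(rec)
    return matches


# Made-up records for illustration; these are not study data.
records = [
    ArticleMetadata("A1", "human", "randomized controlled trial"),
    ArticleMetadata("A2", "mouse", "experimental"),
    ArticleMetadata("A3", "human", "cohort"),
]

human_rcts = structured_query(
    records, population="human", study_design="randomized controlled trial"
)
```

Because each field holds a single coded value rather than free text, a query of this kind is an exact filter on population and design rather than a relevance-ranked match.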

Results:
Using this approach, people with little training or expertise could code many articles in less than 10 minutes. Across the study categories tested, the context-based method increased search precision by as much as 50% compared with the standard approach, which had 19% false negatives.
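
For readers less familiar with the two quantities reported here, the sketch below shows one conventional way to compute search precision and the false-negative rate from a set of retrieved articles and a set of truly relevant ones. The function names and the article identifiers are illustrative assumptions, and the example numbers are invented; they do not reproduce the study's results.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved articles that are actually relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def false_negative_rate(retrieved, relevant):
    """Fraction of relevant articles that the search failed to retrieve."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(relevant - retrieved) / len(relevant)


# Invented identifiers for illustration only; not study data.
relevant = {"A1", "A2", "A3", "A4"}
standard_hits = {"A1", "A2", "A5", "A6", "A7", "A8"}
structured_hits = {"A1", "A2", "A3", "A4", "A5"}

standard_precision = precision(standard_hits, relevant)
structured_precision = precision(structured_hits, relevant)
standard_fnr = false_negative_rate(standard_hits, relevant)
structured_fnr = false_negative_rate(structured_hits, relevant)
```

In this toy example the standard search retrieves two of four relevant articles among six hits, while the structured search retrieves all four among five hits.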

Conclusions:
We propose a transition to all articles having structured metadata descriptors available upon publication, with expansion to the full PICOS structure (Population, Intervention, Comparison, Outcomes, Study Design). Our tests show that full metadata can be provided in under 15 minutes. With this persistent metadata, all consumers of scientific literature could search with significantly higher precision and efficiency.

Patient/healthcare consumer involvement:
Lay people can contribute to improving the efficiency of literature searches and may also find it easier to curate personal databases of interest when SRs are lacking or out of date.