Evidence collection during disease outbreaks - Living systematic reviews

Authors
Counotte MJ1, Imeri H1, Meili K2, Low N1
1Institute of Social and Preventive Medicine, University of Bern, Bern
2Department of Epidemiology and Global Health, Umeå University, Umeå
Abstract
Background: Early in disease outbreaks, evidence is often scarce but accumulates rapidly. Living systematic reviews (LSRs) are systematic reviews that are updated as soon as new information becomes available, which offers a way to keep up with the evidence. However, LSRs in an emerging outbreak setting face unique challenges compared with other LSRs: relevant information first becomes available as preprints, which are later replaced by their peer-reviewed versions, and search terms may change as the outbreak evolves.

Objectives: We describe how we built an LSR system to cope with the rapidly evolving evidence of emerging outbreaks, and the challenges we faced conducting LSRs during the Zika virus and COVID-19 outbreaks. We focus on the methods used to retrieve unique citations from different sources and to create updatable data output.

Methods: We use application programming interfaces (APIs) to collect citation data from the preprint servers BioRxiv and MedRxiv, and from the medical bibliographic databases EMBASE and PubMed (Figure 1). We verify and clean the data, then apply deduplication algorithms to retain unique citations. We calculate similarity scores between titles, authors, journal names and other properties, using Longest Common Subsequence and other similarity indices. We compared three approaches: a rule-based algorithm that applies thresholds to the similarity scores, a logistic regression model that predicts duplicate status, and a blended approach that combines both. In a test set of 2500 records from EMBASE and PubMed, with 220 duplicate and 2280 non-duplicate pairs, both the logistic regression prediction and the blended approach had a sensitivity of 100% and a specificity of 100%. Unique citations are retained and imported into a central database in ‘Research Electronic Data Capture’ (REDCap). Although REDCap was primarily designed for data collection in clinical trials, it tracks changes, has a conflict-resolution workflow, and allows API access. The database allows flexible data output, formatted as Research Information Systems (RIS) or Extensible Markup Language (XML), compatible with all citation managers.
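
As an illustration only, a rule-based duplicate check built on a Longest Common Subsequence similarity could be sketched as follows. The `Citation` fields, the normalization step, the choice to divide by the longer string, and both thresholds are assumptions made for this sketch; they are not the values or code used in the LSR system, and the logistic regression and blended approaches are not shown.

```python
from dataclasses import dataclass


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(c if c.isalnum() else " " for c in text.lower())
    return " ".join(kept.split())


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the longer normalized length, giving a score in [0, 1]."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


@dataclass
class Citation:
    # Hypothetical minimal record; real exports carry many more fields.
    title: str
    authors: str
    journal: str


# Hypothetical thresholds, chosen for illustration only.
TITLE_THRESHOLD = 0.90
AUTHOR_THRESHOLD = 0.80


def is_duplicate(x: Citation, y: Citation) -> bool:
    """Rule-based check: both title and author similarity must pass their thresholds."""
    return (
        lcs_similarity(x.title, y.title) >= TITLE_THRESHOLD
        and lcs_similarity(x.authors, y.authors) >= AUTHOR_THRESHOLD
    )
```

With invented toy records, `is_duplicate(Citation("Transmission dynamics of an emerging virus.", "Doe J, Roe R", "Journal A"), Citation("Transmission Dynamics of an Emerging Virus", "Doe J., Roe R.", "J A"))` flags the pair as a duplicate, because the two records differ only in case and punctuation. In a logistic regression approach, similarity scores of this kind would instead serve as predictors of duplicate status.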

Results: The LSR system allows us to optimize our workflow. We receive new, deduplicated citations from the different sources daily. Because we screen citations and extract data in a central database, we can create dynamic content that allows rapid updating of figures, tables and other output. Indexed citations are distributed to a crowd for screening and verification, and online tools allow both steps to be done rapidly.

Conclusions: LSRs in an emerging outbreak allow us to keep up with the evidence but pose unique challenges.