De-duplication: new methods to reduce the manual burden of systematic searches

Tags: Oral
Tuvey D¹, Walsh N¹
¹NICE

Background:

Systematic searches of multiple bibliographic databases and other sources are undertaken to conduct systematic reviews and develop health guidance. Because the same citation is often indexed in more than one source, this approach produces numerous duplicate citations, which are time-consuming to identify and remove. The National Institute for Health and Care Excellence (NICE) uses a tool called EPPI-Reviewer to manage, sift and source evidence as part of the systematic review process. In 2016, NICE partnered with University College London to develop EPPI-Reviewer version 5 (ER5); one objective was to identify performance improvements to ER5's de-duplication algorithm.

We will share the methods used to test and evaluate the de-duplication algorithm of ER5 against our current reference management tool, EndNote. We will also present performance results for both algorithms. The methods used to evaluate the ER5 de-duplication algorithm can be applied by anyone wishing to evaluate the de-duplication performance of their own reference management software.

Objectives:

To describe the methodology for evaluating de-duplication algorithms.

Methods:

To assess the performance of the algorithms accurately, we created a gold standard dataset using citations identified from a previous literature search. We manually reviewed each record and coded it as either duplicate or unique. We then added metadata to the duplicate records so that all duplicates of the same record could be identified and grouped together. The presentation will explain how attendees can replicate this process with their own data.
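As an illustration only, a gold standard of this kind could be held in a simple structure like the one below. The field names (`record_id`, `is_duplicate`, `group_id`) and the sample records are hypothetical and are not taken from the NICE dataset; the sketch assumes that one copy in each group is coded as unique and the remaining copies as duplicates.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoldRecord:
    record_id: str                   # hypothetical citation identifier
    is_duplicate: bool               # manual coding: duplicate (True) or unique (False)
    group_id: Optional[str] = None   # hypothetical label shared by copies of one citation


def duplicate_groups(records):
    """Map each group label to the IDs of the records coded as duplicates in it."""
    groups = defaultdict(list)
    for rec in records:
        if rec.is_duplicate:
            groups[rec.group_id].append(rec.record_id)
    return dict(groups)


# Invented sample data: r1, r2 and r3 are copies of the same citation.
gold = [
    GoldRecord("r1", False, "g1"),  # the copy retained as the unique record
    GoldRecord("r2", True, "g1"),
    GoldRecord("r3", True, "g1"),
    GoldRecord("r4", False),        # a citation with no duplicates
]
groups = duplicate_groups(gold)
```

Grouping the duplicates, rather than only flagging them, makes it possible to check whether an algorithm has matched a record to the correct original and not merely marked it as a duplicate of something.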

Results:

We will present the comparative performance of each algorithm using sensitivity and specificity data. This will include a discussion of the iterations of each algorithm tested and the characteristics of the duplicate records that led to the modifications.
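A minimal sketch of how sensitivity and specificity might be calculated against a gold standard is shown below. It assumes a record-level comparison in which a "positive" is a record coded as a duplicate; the abstract does not state the exact definitions used, and the function name and sample identifiers are hypothetical.

```python
def sensitivity_specificity(all_ids, gold_duplicates, flagged):
    """Compare the records an algorithm flags as duplicates with the gold standard.

    Assumption: sensitivity = true duplicates flagged / all true duplicates,
    specificity = unique records left unflagged / all unique records.
    """
    all_ids = set(all_ids)
    gold_duplicates = set(gold_duplicates)
    flagged = set(flagged)
    gold_unique = all_ids - gold_duplicates

    true_pos = len(gold_duplicates & flagged)
    false_neg = len(gold_duplicates - flagged)
    false_pos = len(gold_unique & flagged)   # unique records wrongly removed
    true_neg = len(gold_unique - flagged)

    sensitivity = true_pos / (true_pos + false_neg) if gold_duplicates else float("nan")
    specificity = true_neg / (true_neg + false_pos) if gold_unique else float("nan")
    return sensitivity, specificity


# Invented example: five records, two true duplicates, and an algorithm that
# finds one of them but also wrongly flags one unique record.
sens, spec = sensitivity_specificity(
    all_ids=["r1", "r2", "r3", "r4", "r5"],
    gold_duplicates=["r2", "r3"],
    flagged=["r2", "r4"],
)
```

In this framing, low specificity is the more serious failure for guidance development, because each false positive is a unique record that would be removed from the search results.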

Conclusions:

We have developed new methods to evaluate the performance of de-duplication algorithms. This research methodology will contribute to the body of knowledge on improving de-duplication algorithms of reference management software.

Patient or healthcare consumer involvement:

Deleting a relevant record would mean that evidence for guidance is missed, which could affect patient care. An improved de-duplication function in ER5 would help ensure that no relevant records are deleted during the de-duplication process and would benefit all users.