A comparison of duplicate detection automation tools: a head-to-head comparison study

Clark J1, Bateup S2, Fulbright H3, Forbes C1, Gruber S4, Hair K5, Qureshi R6, Stansfield C7, Steel P8, Thomas J7
1Institute for Evidence-Based Healthcare, Bond University
2Bond University Library, Bond University
3Centre for Reviews and Dissemination, University of York
4PICO Portal
5CAMARADES, University of Edinburgh
6Anschutz Medical Campus, University of Colorado
7University College London
8Haskayne School of Business, University of Calgary
A key task when conducting a systematic review is to identify and remove duplicate records retrieved by literature searches across multiple databases, a process referred to as deduplication. Deduplication is time-consuming and error-prone, particularly when processing thousands of references from multiple sources. Some approaches combine automation with manual checking by humans, and can be carried out using reference management software or bespoke deduplication tools, which are available either standalone or within systematic review software. Some tools are only accessible through expensive, proprietary software or operate in a “black box” environment. It is not known how these tools compare against each other, or which performs best at minimising errors and reducing the time spent deduplicating.

We are evaluating how eight bespoke deduplication tools perform, to inform choices about which to use. The tools are: 1) Covidence; 2) EPPI-Reviewer; 3) the Deduplicator; 4) Rayyan; 5) PICO Portal; 6) Deduclick; 7) ASySD; and 8) HubMeta Deduplicator.

Our sample set comprises re-run searches from a random selection of 10 Cochrane reviews published between 2020 and 2022. Two experienced information specialists will independently deduplicate the search results to create 10 gold standard deduplicated sets. Each set will then be deduplicated using each tool under investigation, and the results compared with the corresponding gold standard set. The following outcomes will be measured:
1. Unique references removed
2. Duplicates missed
3. Additional duplicates identified
4. Time required to deduplicate
5. Qualitative analysis of unexpected findings of interest
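
As an illustration only, the first three outcomes could be derived by comparing the records a tool removes with the records removed in the gold standard set. The sketch below is not the study's analysis code. It assumes each record carries a stable identifier, that the tool and the gold standard retain the same copy of each duplicate group, and that "additional duplicates" are tool removals confirmed as true duplicates on manual re-checking. The names `compare_to_gold`, `DedupOutcome` and `confirmed_extra` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupOutcome:
    unique_removed: int         # outcome 1: unique references wrongly removed
    duplicates_missed: int      # outcome 2: duplicates the tool left in
    additional_duplicates: int  # outcome 3: true duplicates the gold standard missed


def compare_to_gold(all_ids, gold_removed, tool_removed, confirmed_extra=frozenset()):
    """Compare one tool's removals with the gold standard for one search set.

    all_ids         -- identifiers of every record before deduplication
    gold_removed    -- records removed as duplicates in the gold standard
    tool_removed    -- records removed as duplicates by the tool
    confirmed_extra -- tool removals confirmed as true duplicates on
                       manual re-checking (assumed adjudication step)
    """
    all_ids = set(all_ids)
    gold_removed = set(gold_removed)
    tool_removed = set(tool_removed)
    confirmed_extra = set(confirmed_extra)

    if not (gold_removed <= all_ids and tool_removed <= all_ids):
        raise ValueError("removed records must come from the original set")

    # Records the tool removed but the gold standard kept.
    disputed = tool_removed - gold_removed
    # Of those, the ones adjudicated as genuine duplicates.
    additional = disputed & confirmed_extra
    # The remainder are unique references removed in error.
    unique_removed = disputed - additional
    # Records the gold standard removed but the tool kept.
    missed = gold_removed - tool_removed

    return DedupOutcome(len(unique_removed), len(missed), len(additional))


if __name__ == "__main__":
    records = [f"r{i}" for i in range(1, 11)]  # made-up identifiers
    outcome = compare_to_gold(
        records,
        gold_removed={"r2", "r5", "r9"},
        tool_removed={"r2", "r5", "r7", "r8"},
        confirmed_extra={"r8"},
    )
    print(outcome)
```

In this made-up example the tool wrongly removes one unique reference (r7), misses one duplicate (r9) and finds one duplicate the gold standard missed (r8). If a tool keeps a different copy of a duplicate pair than the gold standard does, comparing raw identifiers would overcount errors, so records would first need to be mapped to duplicate groups.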

Early testing suggests that the automation tools that include a human checking component produce fewer errors than those that are fully automated. The majority of the errors are missed duplicates, with few unique references removed. However, the tools that include a human checking component require more time to deduplicate record sets. We expect to present the error rates and processing time of each tool.

Our conclusions will be presented at the conference.

Patient, public and/or healthcare consumer involvement:
Although not directly relevant to patients, this study will help patients by contributing to methods that will result in more robust and efficient evidence production.