Automatic information retrieval: citation tracking, deduplication and full-text fetching

Tags: Oral
Tsafnat G1, Choong M1
1University of New South Wales, Centre for Health Informatics, Australia

Background:

In evidence-based medicine, systematic reviews are important tools that help medical practitioners make clinical decisions. A systematic review is a summary of the evidence on a clearly formulated question, based mostly on randomized controlled trials. Developing a systematic review is a slow and complex process that can take between one and two years to complete. Updates are costly in both time and money, and can take as much time and effort as the original systematic review.

Objectives:

To support the conducting and updating of systematic reviews with a software tool that automatically extracts citation information and retrieves citation data and the full text of scientific papers.

Methods:

We developed gold standards by manually extracting citations from systematic reviews and by using search results provided by Cochrane authors. We measured our tool's capacity to extract citations from PDFs, retrieve full text or abstracts, deduplicate citations and extract citations from the full text.
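
As an illustration of how output can be scored against a gold standard of this kind, the following is a minimal sketch and not the evaluation code used in this study. The function name `precision_recall_f1` and the citation identifiers are hypothetical, and the sketch assumes that a predicted citation counts as correct only when it matches a gold-standard entry exactly.

```python
def precision_recall_f1(predicted, gold):
    """Score a collection of predicted items against a gold-standard collection."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty inputs so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical identifiers, for illustration only.
gold = {"cit-1", "cit-2", "cit-3", "cit-4"}
predicted = {"cit-1", "cit-2", "cit-3", "cit-9"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In practice, matching would probably need to tolerate small differences in how a citation is written, which exact set membership does not.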

Results:

We have developed ESuRFr, a tool that automatically extracts citation information, retrieves citation data and retrieves the full text of scientific papers. ESuRFr extracted 1,020 (96.5%) citations from 20 review papers. For the available citations, ESuRFr achieved an F1-score of 0.912 for citation retrieval and 0.797 for abstract/full-text fetching. ESuRFr identified 100% (n=17) of the duplicates but also erroneously flagged two citations as duplicates. ESuRFr automatically retrieved 86% (n=12) of 17 papers included in a systematic review update by tracking two iterations of citations from the original review.
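
Citation tracking over a fixed number of iterations, combined with deduplication, can be sketched as follows. This is an illustrative sketch under stated assumptions, not ESuRFr's implementation: the names `normalise` and `track_citations` are hypothetical, duplicates are detected by comparing normalised titles only, and the `links` mapping stands in for whatever source supplies the citation links of each paper (in either direction).

```python
import re


def normalise(title):
    """Lowercase a title and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def track_citations(seed_titles, links, iterations=2):
    """Follow citation links outward from the seeds for a fixed number of iterations.

    `links` maps a normalised title to the titles linked to that paper.
    Returns the normalised titles of newly found papers, in discovery order.
    """
    seen = {normalise(t) for t in seed_titles}
    frontier = sorted(seen)  # sorted for a deterministic traversal order
    found = []
    for _ in range(iterations):
        next_frontier = []
        for key in frontier:
            for linked in links.get(key, []):
                linked_key = normalise(linked)
                if linked_key in seen:  # duplicate record: already collected
                    continue
                seen.add(linked_key)
                found.append(linked_key)
                next_frontier.append(linked_key)
        frontier = next_frontier
    return found


# Hypothetical citation graph, for illustration only.
links = {
    "original review": ["Trial A.", "Trial B"],
    "trial a": ["Trial C", "trial b"],
    "trial c": ["Trial D"],
}
print(track_citations(["Original Review"], links))
```

With two iterations, "Trial D" is not reached because it is three links away from the original review, and the second mention of "trial b" is dropped as a duplicate. Title-only matching is a simplification: it can both miss true duplicates and merge distinct papers that share a title.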

Conclusions:

ESuRFr automates the retrieval of new evidence, the deduplication of records from different databases and citation tracking to identify relevant literature for systematic reviews and systematic review updates.