Using text mining technologies can reduce screening workload in systematic reviews in practice as well as in theory

Authors
Thomas J1, Ananiadou S2, Brunton J1, McNaught J2, Miwa M2, O'Mara-Eves A1
1EPPI-Centre, Institute of Education, London, United Kingdom
2University of Manchester, United Kingdom
Abstract
Background:
Identifying relevant studies for systematic reviews in an unbiased way is increasingly time-consuming. Text mining may be able to assist in the screening process by prioritizing the list of items for manual screening so that the studies at the top of the list are those most likely to be relevant ('screening prioritization'). This can help reviewers move to full-text screening (and later stages) earlier than they otherwise would; they may also not need to look at every retrieved reference.
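
To make the idea of screening prioritization concrete, the following is a minimal sketch, not the method evaluated in this abstract. It assumes a simple word-level log-odds score learned from already-screened titles; the function names (`tokenize`, `train_word_weights`, `prioritize`) and the scoring scheme are illustrative assumptions only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def train_word_weights(labelled):
    """Learn a log-odds weight per word from (text, is_relevant) pairs.

    Assumption: add-one smoothing on document frequencies; this is a
    stand-in for whatever classifier a real system would use.
    """
    relevant_df, irrelevant_df = Counter(), Counter()
    for text, is_relevant in labelled:
        target = relevant_df if is_relevant else irrelevant_df
        target.update(set(tokenize(text)))
    n_rel = sum(1 for _, is_relevant in labelled if is_relevant)
    n_irr = len(labelled) - n_rel
    vocab = set(relevant_df) | set(irrelevant_df)
    return {
        w: math.log((relevant_df[w] + 1) / (n_rel + 2))
        - math.log((irrelevant_df[w] + 1) / (n_irr + 2))
        for w in vocab
    }


def prioritize(unscreened, weights):
    """Return unscreened texts sorted so likely-relevant items come first."""
    def score(text):
        return sum(weights.get(w, 0.0) for w in set(tokenize(text)))
    return sorted(unscreened, key=score, reverse=True)
```

In use, a reviewer would screen a small initial batch by hand, pass those decisions to `train_word_weights`, and then screen the remaining items in the order returned by `prioritize`.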

Objectives:
To evaluate how well text mining methods reduce screening workload by assessing their performance in a) completed reviews and b) on-going reviews. Also, to evaluate implementation issues and user experiences through a process evaluation.

Methods:
Simulation studies were conducted on eight completed systematic reviews in public health and clinical areas. Performance was assessed using accuracy, precision, recall (sensitivity), F-measures, area under the ROC curve (AUC), and burden. Additionally, screening prioritization was used in two large systematic reviews during the actual process of screening. A process evaluation assessed the implementation of the methods and user experiences of the system. Retrospective simulation studies of these datasets were also conducted.
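
As an illustration of several of the measures named above, the sketch below computes accuracy, precision, recall, and an F-measure from include/exclude decisions using their standard definitions. The function name `screening_metrics` and the example data are hypothetical; burden and AUC are omitted because their exact formulation in this study is not stated here.

```python
def screening_metrics(predicted, actual, beta=1.0):
    """Compute standard classification measures for screening decisions.

    predicted, actual: equal-length sequences of booleans, where True
    means "include". beta weights recall relative to precision in the
    F-measure (beta=1 gives the usual F1).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

In screening, recall usually matters more than precision because a missed relevant study cannot be recovered later, so a value of `beta` greater than 1 may be a more appropriate choice than F1.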

Results:
Text mining methods successfully ‘pull’ the relevant studies towards the beginning of the screening process, and this holds in complex areas such as public health (though more complex tools are required) as well as in more clinical topics. The process evaluation shows that, for operational reasons, the performance of these tools in ‘live’ reviews may not reach the levels achieved in simulation studies.
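
One way to quantify how strongly relevant studies are ‘pulled’ forward is to track cumulative recall as items are screened in ranked order. The sketch below is an assumption about how this could be measured, not the analysis reported here; the function names and the example orderings in the usage note are hypothetical.

```python
def recall_after_each_item(ranked_labels):
    """Return cumulative recall after screening each item in order.

    ranked_labels: sequence of 1 (relevant) or 0 (not relevant), in the
    order the items would be screened.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return [0.0] * len(ranked_labels)
    found = 0
    curve = []
    for label in ranked_labels:
        found += label
        curve.append(found / total_relevant)
    return curve


def items_needed_for_recall(ranked_labels, target=0.95):
    """Return how many items must be screened to reach the target recall.

    Returns None if the target is never reached.
    """
    for count, recall in enumerate(recall_after_each_item(ranked_labels), start=1):
        if recall >= target:
            return count
    return None
```

For example, with three relevant items among ten, a prioritized order such as `[1, 1, 0, 1, 0, 0, 0, 0, 0, 0]` reaches full recall after four items, whereas an unprioritized order such as `[0, 0, 0, 1, 0, 0, 1, 0, 0, 1]` requires all ten.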

Conclusions:
Simulation studies show that text mining technologies are able to ‘prioritize’ relevant items for early screening in reviews. The use of these technologies in ‘live’ reviews requires further work, since the practicalities of undertaking screening in large reviews with potentially many reviewers may prevent optimal performance.