Shortening the pipeline: the use of data mining to link new trials to Cochrane Reviews

Authors
McDonald S¹, Thomas J², Wallace S³, Elliott J⁴
¹ Australasian Cochrane Centre, Monash University, Australia
² EPPI-Centre, University of London, UK
³ Cochrane Incontinence Group, University of Aberdeen, UK
⁴ Monash University, Australia
Abstract
Background: It is estimated that at least 500 reports of trials are published every week. The pipeline by which these trials find their way into existing Cochrane Reviews, often via the Cochrane Central Register of Controlled Trials and the specialised registers of Cochrane Review Groups (CRGs), can be lengthy and inefficient. Data mining offers the prospect of distributing trials automatically as soon as they are published in major databases (e.g. PubMed). The advent of the Central Register of Studies makes it possible to test the feasibility of data mining approaches.

Objectives: To test the feasibility and accuracy of an automated process that uses data mining approaches to assign newly published trials to (1) the relevant Cochrane Review Group and (2) the relevant Cochrane Review(s).

Methods: Using the Central Register of Studies, we created a dataset consisting of the titles, abstracts and keywords (where available) of reports of all included studies in all Cochrane Reviews. We then used the LibSVM support vector machine library to train a classifier on the included studies of reviews from the Cochrane Incontinence Group. Over a 3-month period, we used the classifier to automatically identify potentially relevant studies for that Group among those that had been published on PubMed.
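
As a rough illustration of the kind of text classifier described in the Methods, the following minimal sketch trains a linear support vector machine on bag-of-words features from report titles. It is not LibSVM and not the authors' system: the sub-gradient training loop, the feature scheme, the hyperparameters (epochs, lam, lr) and the example titles are all assumptions chosen so that the example is self-contained.

```python
import re
from collections import Counter


def featurize(text):
    """Turn text into L2-normalised bag-of-words term counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = sum(c * c for c in counts.values()) ** 0.5 or 1.0
    return {term: c / norm for term, c in counts.items()}


def train_linear_svm(docs, labels, epochs=50, lam=0.01, lr=0.1):
    """Fit a linear SVM (hinge loss + L2 penalty) by sub-gradient descent.

    labels must be +1 (relevant to the Group) or -1 (not relevant).
    The hyperparameter values are illustrative assumptions.
    """
    weights, bias = {}, 0.0
    features = [featurize(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(weights.get(t, 0.0) * v for t, v in x.items()) + bias)
            # L2 penalty: shrink every existing weight slightly.
            for t in weights:
                weights[t] *= 1 - lr * lam
            # Hinge loss: only examples inside the margin move the model.
            if margin < 1:
                for t, v in x.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias


def score(model, text):
    """Signed distance-like score; higher means more likely relevant."""
    weights, bias = model
    return sum(weights.get(t, 0.0) * v for t, v in featurize(text).items()) + bias


# Hypothetical training titles: +1 = included in an incontinence review.
train_docs = [
    "Pelvic floor muscle training for urinary incontinence in women: randomised trial",
    "Bladder training versus drug therapy for urge incontinence",
    "Statins for primary prevention of cardiovascular disease",
    "Inhaled corticosteroids for asthma in children: a randomised trial",
]
train_labels = [+1, +1, -1, -1]

model = train_linear_svm(train_docs, train_labels)

# Hypothetical newly published records to be ranked for the Group.
relevant_like = "Pelvic floor exercises for stress urinary incontinence"
unrelated = "Corticosteroids for childhood asthma"
ranked = sorted([relevant_like, unrelated], key=lambda t: score(model, t), reverse=True)
```

In a real setting the training set would be the titles, abstracts and keywords drawn from the Central Register of Studies, and the score threshold for flagging a record would need tuning against the Group's own screening decisions.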

Results: The studies identified by the data mining system were compared with those that the Group identified using its standard practices. Sensitivity and precision were calculated, as well as the yield and burden of the data mining system.
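
For readers unfamiliar with these metrics, the sketch below shows how sensitivity and precision could be computed from two sets of record identifiers. It assumes that the studies found by the Group's standard practices serve as the reference standard; the function name and the record identifiers are hypothetical, and yield and burden are left out because the abstract does not define them.

```python
def sensitivity_and_precision(flagged, reference):
    """Compare system-flagged records with a reference set of relevant records.

    sensitivity = relevant records flagged / all relevant records
    precision   = relevant records flagged / all flagged records
    """
    flagged, reference = set(flagged), set(reference)
    true_positives = len(flagged & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(flagged) if flagged else 0.0
    return sensitivity, precision


# Hypothetical identifiers, for illustration only.
system_flagged = {"rec-01", "rec-02", "rec-03", "rec-04"}
group_identified = {"rec-02", "rec-03", "rec-05"}

sens, prec = sensitivity_and_precision(system_flagged, group_identified)
```

With these illustrative sets, two of the three reference records are flagged (sensitivity of two thirds) and two of the four flagged records are relevant (precision of one half).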

Conclusions: The usefulness of Cochrane Reviews and the efficiency of the review process would be improved if relevant reports of trials could be identified and linked to reviews prospectively using automated approaches such as data mining. This system demonstrates the potential of such technologies, though further work will be needed to optimise its precision.