Data reuse, machine learning, and crowdsourcing in Screen4Me: how screening burden can be reduced substantially and reliably

Authors
Thomas J1, Noel-Storr A2, McDonald S3, Marshall IJ4
1EPPI-Centre, University College London
2University of Oxford
3Cochrane Australia
4King's College London
Abstract
Background: Cochrane authors sift through millions of bibliographic references to find studies for inclusion in systematic reviews. The same records are retrieved and examined by different teams, a duplication of effort that has, until now, been impossible to avoid.

For reviews in which only randomized controlled trials (RCTs) will be included, a substantial screening burden would be avoided if authors needed to examine only the RCTs retrieved by their search, rather than every record.

Objectives: to build a service which takes as its input the bibliographic records retrieved from standard searches for systematic reviews, giving as its output the subset of those records that describe randomized trials.

Methods: three ‘technologies’ were combined to build the service.

1) Existing data: the Cochrane Register of Studies contains hundreds of thousands of records with accurate labels denoting whether each record does, or does not, describe an RCT. If a record that is input into the Screen4Me service matches one of these records, then that label can be reused.
2) Machine learning: a classifier was built to distinguish between RCTs and non-RCTs. Classifiers can be calibrated to prioritize recall or precision, and the Information Retrieval Methods Group (IRMG) was consulted to determine an acceptable threshold.
3) Crowdsourcing: Cochrane Crowd is a citizen science platform that hosts ‘micro-tasks’ allowing a wide range of people to contribute to Cochrane Reviews. An algorithm ensures that classifications are reliable. One micro-task is RCT identification: a title and abstract are presented to a Crowd contributor, who determines whether the record describes an RCT.

Results: existing data: new records were found to match accurately with existing data using IDs. An examination of 11 review updates from September 2016 found that the database contained 62% to 98% of records retrieved.

Machine learning: IRMG specified that the machine learning classifier should achieve a 99% recall for RCTs in the McMaster ‘Hedges’ dataset. When tested against all RCTs in Cochrane Reviews (> 92,000), it achieved a 99.6% recall.
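
As an illustration of calibrating a classifier towards recall, the following minimal sketch picks the highest score threshold that still retains a target fraction of known RCTs. It is not the Screen4Me implementation: the function names, the score scale, and the example values are assumptions made for illustration only.

```python
import math


def threshold_for_recall(rct_scores, target_recall=0.99):
    """Return the highest threshold at which at least `target_recall`
    of the known RCTs score at or above the threshold.

    `rct_scores` holds classifier scores for records known to be RCTs
    (a hypothetical calibration set; higher score = more likely an RCT).
    """
    if not rct_scores:
        raise ValueError("rct_scores must not be empty")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    ranked = sorted(rct_scores, reverse=True)
    # Number of RCTs that must be retained; the small tolerance guards
    # against floating-point noise in the multiplication.
    needed = max(1, math.ceil(target_recall * len(ranked) - 1e-9))
    return ranked[needed - 1]


def recall_at(rct_scores, threshold):
    """Fraction of known RCTs scoring at or above `threshold`."""
    kept = sum(1 for score in rct_scores if score >= threshold)
    return kept / len(rct_scores)


# Illustrative scores only, not data from the study.
example_scores = [0.9, 0.8, 0.7, 0.2]
example_threshold = threshold_for_recall(example_scores, target_recall=0.75)
```

Records scoring below the chosen threshold would be treated as highly unlikely to be RCTs; lowering the threshold trades precision for recall.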

Crowdsourcing: Cochrane Crowd performance on the RCT task was evaluated using 6041 records that were also independently classified by two expert screeners. The Crowd achieved 99.1% sensitivity and 99.0% specificity.
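
For readers unfamiliar with the two measures, sensitivity is the proportion of true RCTs that the Crowd labelled as RCTs, and specificity is the proportion of true non-RCTs that it labelled as non-RCTs. A minimal sketch of the calculation follows, assuming boolean labels (True = RCT) and treating the expert classification as the reference standard; the function name and the example labels are hypothetical.

```python
def sensitivity_specificity(crowd_labels, reference_labels):
    """Compare crowd labels with reference labels (True = RCT).

    Returns (sensitivity, specificity) as fractions between 0 and 1.
    """
    if len(crowd_labels) != len(reference_labels):
        raise ValueError("label lists must have the same length")
    tp = fn = tn = fp = 0
    for crowd, reference in zip(crowd_labels, reference_labels):
        if reference and crowd:
            tp += 1  # RCT correctly identified
        elif reference and not crowd:
            fn += 1  # RCT missed
        elif not reference and not crowd:
            tn += 1  # non-RCT correctly rejected
        else:
            fp += 1  # non-RCT wrongly labelled as an RCT
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one RCT and one non-RCT")
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels only, not data from the evaluation.
crowd = [True, True, False, False, True]
experts = [True, True, True, False, False]
sens, spec = sensitivity_specificity(crowd, experts)
```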

The three technologies are now deployed in the Cochrane Register of Studies (CRS). Search results are first matched against existing data. Unmatched records are then passed through the RCT Classifier, and those that are highly unlikely to be RCTs are discarded. The remaining records can be sent to Cochrane Crowd. The output of this workflow is therefore only the RCTs retrieved by the search, ready for authors to assess.
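
The order of the three stages can be sketched as follows. This is a simplified illustration, not the CRS code: the record structure, the `known_labels` lookup, the `score_fn` callable, and the threshold value are all assumptions, as is the choice to discard records already labelled as non-RCTs.

```python
def triage(records, known_labels, score_fn, threshold):
    """Route each record through the three stages in order.

    records      -- list of dicts, each with an "id" key (hypothetical shape)
    known_labels -- dict mapping record id -> True (RCT) or False (not an RCT)
    score_fn     -- callable returning a classifier score for a record
    threshold    -- scores below this are treated as highly unlikely RCTs

    Returns (reused_rcts, for_crowd, discarded).
    """
    reused_rcts, for_crowd, discarded = [], [], []
    for record in records:
        label = known_labels.get(record["id"])
        if label is True:
            reused_rcts.append(record)   # stage 1: existing label reused
        elif label is False:
            discarded.append(record)     # stage 1: known non-RCT
        elif score_fn(record) < threshold:
            discarded.append(record)     # stage 2: classifier rejects
        else:
            for_crowd.append(record)     # stage 3: needs human screening
    return reused_rcts, for_crowd, discarded


# Illustrative records only; ids and scores are made up.
sample_records = [
    {"id": "a", "score": 0.95},
    {"id": "b", "score": 0.40},
    {"id": "c", "score": 0.01},
    {"id": "d", "score": 0.50},
]
sample_known = {"a": True, "b": False}
reused, crowd_queue, dropped = triage(
    sample_records, sample_known, lambda r: r["score"], threshold=0.1
)
```

In this sketch only the `crowd_queue` records require human screening; the RCTs passed to authors would be the reused RCTs plus whichever queued records the Crowd classifies as RCTs.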

Conclusions: the Screen4Me service, launched in April 2019, is a reliable and efficient workflow for RCT identification.