Machine learning for identifying randomized controlled trials in systematic reviews: an improvement and a practical evaluation in assisting human

2024 Prague [Global Evidence Summit]

Qin X¹, Yao M¹, Li L¹, Sun X¹

¹West China Hospital, Sichuan University, Chengdu, Sichuan province, China

"Background:
While systematic reviews (SRs) based on randomized controlled trials (RCTs) provide valuable evidence for decision-making, SRs are generally time-consuming. Machine learning (ML) models have been developed to identify RCTs but are not widely used in practice, in part because of limited performance and unclear practical benefit.
Objectives:
To examine the effect of an ensemble learning model in assisting rapid title and abstract screening when conducting systematic reviews.
Methods:
Using the Cochrane RCT data, we trained and tested an ensemble learning model for rapid title and abstract screening when conducting an SR. The model is a light gradient boosting machine (LightGBM) that integrates four different pre-trained Bidirectional Encoder Representations from Transformers (BERT) models. In particular, we directly assessed the model's practical performance in terms of labor time saved and recall improvement on our annotated RCT datasets in two potential scenarios: ML-assisted parallel screening (ML and manual screening are performed in parallel) and ML-assisted stepwise screening (the ML model screens first, then a human reviewer screens the citations included by the model).
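The two screening scenarios can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the rule for combining decisions in the parallel scenario (a citation is kept if either the ML model or the human reviewer includes it) is an assumption, since the abstract does not state how parallel decisions are merged.

```python
from typing import Callable, List, Tuple


def parallel_screening(ml_include: List[bool], human_include: List[bool]) -> List[bool]:
    """ML and a human reviewer screen every citation independently.

    Assumption (not stated in the abstract): a citation is kept
    if either the ML model or the human reviewer includes it.
    """
    if len(ml_include) != len(human_include):
        raise ValueError("decision lists must have the same length")
    return [m or h for m, h in zip(ml_include, human_include)]


def stepwise_screening(
    ml_include: List[bool], human_review: Callable[[int], bool]
) -> Tuple[List[bool], int]:
    """ML screens first; the human only reviews citations the ML included.

    Returns the final decisions and the number of citations the human read.
    """
    final: List[bool] = []
    n_human_reviewed = 0
    for index, included_by_ml in enumerate(ml_include):
        if included_by_ml:
            n_human_reviewed += 1
            final.append(human_review(index))
        else:
            # Excluded by the ML model, so the human never sees it.
            final.append(False)
    return final, n_human_reviewed


# Toy decisions for four citations (illustrative values only).
ml = [True, False, True, False]
human = [True, True, False, False]

parallel_result = parallel_screening(ml, human)
stepwise_result, reviewed = stepwise_screening(ml, lambda i: human[i])
```

In this toy example the stepwise scenario asks the human to read only two of the four citations, which is the mechanism behind the labor time saving, while the parallel scenario keeps any citation flagged by either screener, which is one way recall could improve over a single reviewer.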
Results:
Our model outperformed an existing high-recall SVM model, saving more workload (average 75.2% vs 51.8%) at similarly high recall (average 0.996 vs 0.994) on both internal and external tests. In the practical evaluation, ML-assisted parallel screening saved labor time (average 43.66% relative to standard manual screening) and achieved better recall (average 0.998 vs 0.916 for a single reviewer). ML-assisted stepwise screening performed similarly to standard manual screening (average recall 0.994 and average specificity 1.000) while saving substantial labor time (average 75.15%).
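For readers unfamiliar with the reported measures, the sketch below shows how recall, specificity, and workload saving might be computed from binary labels. Recall and specificity follow their standard definitions. The workload-saving formula (the share of citations the model excludes, which a human would no longer need to screen) is an assumption, because the abstract does not define it. The function name and the toy labels are hypothetical.

```python
from typing import Dict, List


def screening_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute recall, specificity, and an assumed workload-saving measure.

    y_true: 1 if the citation is an RCT, 0 otherwise.
    y_pred: 1 if the model includes the citation, 0 if it excludes it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Assumed definition: fraction of all citations excluded by the model.
    workload_saved = (tn + fn) / len(y_true)
    return {
        "recall": recall,
        "specificity": specificity,
        "workload_saved": workload_saved,
    }


# Toy labels for ten citations (illustrative values only, not study data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
metrics = screening_metrics(y_true, y_pred)
```

With these toy labels the model finds all three RCTs and excludes six of the ten citations, so recall is 1.0 and the assumed workload saving is 0.6.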
Conclusions:
In this study, we proposed an ensemble learning model for identifying RCTs during title and abstract screening that reduces more workload than existing models at similarly high recall, facilitating the rapid conduct of SRs. We suggest that when reviewers are limited, ML-assisted parallel screening may be a promising solution, and when time is limited, ML-assisted stepwise screening may be helpful.
"