Integration of Machine Learning in a living systematic review of baseline risks of Venous Thromboembolism complications in hospitalized patients with COVID-19

2023 London

Lotfi T¹, Nowak A², Charied R¹, Solo K¹, Santesso N¹, Schünemann H¹, Nieuwlaat R¹

¹Michael G. DeGroote Cochrane Canada Centre, Health Research Methods, Evidence & Impact, McMaster University

²Evidence Prime

Background: Living systematic reviews (LSRs) of prognostic studies rely on screening many observational studies that are not clearly labeled.

Objectives:
To assess the performance of a machine learning (ML) classifier for screening in an LSR for venous thromboembolism (VTE)–related outcomes and baseline risks in patients with COVID-19.

Methods:
As part of a guideline development project for the American Society of Hematology (ASH) on the use of anticoagulation for thromboprophylaxis in patients with COVID-19, the team conducted an LSR to establish, and maintain the relevance of, the baseline risk for VTE-related outcomes. The search was conducted in September 2020 (baseline search) and updated monthly until July 2021. At baseline, the search identified 12,566 citations. The team trained an ML classifier on the manual screening decisions from the baseline search to partially automate screening in subsequent search iterations. The classifier ranked captured citations by likelihood of inclusion, with those most likely to be included appearing at the top. The classifier was integrated into new software, “Laser AI,” which would allow the team to screen prioritized citations in future updates.
The objective of this study was to assess the classifier's performance for VTE as an outcome and the efficiency gained by using it. At the second update of the living search, we manually screened, independently and in duplicate, a 5% sample (n=3,478) of the captured citations that the classifier had not prioritized. We then combined this sample with the top 5,000 citations that the LSR team had screened and reassessed the classifier's performance. Additionally, we explored the correlation between risk of bias, assessed using the Quality In Prognosis Studies (QUIPS) tool, and the scores allocated by the classifier.
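The evaluation design described above (rank by classifier score, sample from the non-prioritized citations, then measure precision and recall against manual decisions) can be illustrated with a minimal sketch. This is not the Laser AI implementation: the `Citation` type, its `score` and `relevant` fields, and all function names are hypothetical, and the toy scores are invented for illustration only.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Citation:
    cid: int
    score: float                     # hypothetical classifier score in [0, 1]
    relevant: Optional[bool] = None  # manual screening decision; None if unscreened


def rank_citations(citations: List[Citation]) -> List[Citation]:
    """Order citations from most to least likely to be included."""
    return sorted(citations, key=lambda c: c.score, reverse=True)


def sample_unprioritized(ranked: List[Citation], top_k: int,
                         fraction: float, seed: int = 0) -> List[Citation]:
    """Draw a random sample from the citations ranked below the top_k cut-off."""
    rest = ranked[top_k:]
    n = round(len(rest) * fraction)
    return random.Random(seed).sample(rest, n)


def precision_recall(screened: List[Citation],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of 'score >= threshold' against manual decisions."""
    labeled = [c for c in screened if c.relevant is not None]
    tp = sum(c.score >= threshold and c.relevant for c in labeled)
    fp = sum(c.score >= threshold and not c.relevant for c in labeled)
    fn = sum(c.score < threshold and c.relevant for c in labeled)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (invented): six manually screened citations.
toy = [
    Citation(0, 0.90, True), Citation(1, 0.80, False), Citation(2, 0.70, True),
    Citation(3, 0.20, False), Citation(4, 0.10, True), Citation(5, 0.05, False),
]
precision, recall = precision_recall(rank_citations(toy), threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this sketch, a relevant citation found in the low-ranked sample counts as a false negative, which is what lowers recall; a sample with no relevant citations leaves recall unchanged.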

Results:
Of the 3,478 sampled citations, 377 were included at the title/abstract level, and none addressed VTE at the full-text level. The sample guided by the ML classifier was reliable and did not compromise the LSR's screening results for the VTE outcome. At the lowest prediction threshold, which was allocated to the sample of 5,000 citations screened by the original LSR team, the recall-precision trade-off was satisfactory. Compared with manually screening the titles and abstracts of all citations at the second update, the approach saved 214 working days of 7 hours each. When exploring the correlation between risk of bias and the scores allocated by the classifier, we found that studies at low risk of bias received the highest scores.
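For readers who want to relate the reported time saving to their own workload, the arithmetic can be sketched as follows. The 7-hour working day and the 214-day figure come from the abstract; the function names, the per-citation screening time, and the citation count in the second call are hypothetical placeholders, not values reported by the study.

```python
HOURS_PER_WORKING_DAY = 7  # working-day length stated in the abstract


def working_days_to_hours(days: float,
                          hours_per_day: float = HOURS_PER_WORKING_DAY) -> float:
    """Convert working days to hours."""
    return days * hours_per_day


def screening_days(n_citations: int, minutes_per_citation: float,
                   hours_per_day: float = HOURS_PER_WORKING_DAY) -> float:
    """Working days needed to screen n_citations at a given pace (hypothetical)."""
    return n_citations * minutes_per_citation / 60 / hours_per_day


# Reported saving expressed in hours: 214 days x 7 hours.
print(working_days_to_hours(214))  # 1498

# Illustrative only: 4,200 citations at 1 minute each (assumed pace).
print(screening_days(4200, 1))  # 10.0
```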

Conclusions: The efficiency and relevance of LSRs for prognostic studies can be enhanced when combining manual with ML-directed screening.

Patient, public and/or healthcare consumer involvement: N/A.