Abstract
Background:
No abstract classifier is currently available for new diagnostic test accuracy (DTA) systematic reviews to select primary DTA study abstracts from database search results.
Objectives:
Our goal with the FILtering of diagnostic Test accuracy studies (FILTER) Challenge was to develop machine learning (ML) filters for new DTA systematic reviews through an open competition.
Methods:
We conducted an open competition. We prepared a dataset comprising titles, abstracts, and the judgement on whether full-text retrieval was sought, drawn from 10 DTA reviews and a mapping review. We randomly split the data into a training set (n = 27,145; labeled as DTA, n = 632), a public test set (n = 20,417; labeled as DTA, n = 474), and a private test set (n = 20,417; labeled as DTA, n = 469). Participants used the training set to develop models, then validated their models against the public test set to refine their development process. Finally, we used the private test set to rank the submitted models, honoring them by Fβ score with β set to 7. For external validation, we used a DTA review from cardiology (n = 7,722; labeled as DTA, n = 167). We prespecified β = 7 for the Fβ score, together with recall, as the evaluation metrics, in order to favor filters that are less likely to miss relevant studies.
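For reference, the Fβ score assumed here (its definition is not restated in the abstract) is the standard weighted harmonic mean of precision and recall:

F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}, \qquad \beta = 7.

With β = 7, recall is weighted β² = 49 times as heavily as precision, consistent with the stated aim of rewarding filters that rarely miss DTA studies.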
Results:
We held the challenge from July 28 to October 4, 2021, and received a total of 13,774 submissions from 1,429 teams or individuals. We honored the top three models. On the external validation set, the Fβ scores and recall were 0.4036 and 0.2352 for the first model, 0.3262 and 0.3313 for the second model, and 0.3891 and 0.3976 for the third model, respectively.
Conclusions:
We were unable to develop a search filter with sufficient recall for immediate application to new DTA reviews. Further studies are needed to update and validate the filters with datasets from other clinical areas.
Patient, public, and/or healthcare consumer involvement: None.