Abstract
Background: Publication Types and study designs (collectively, PTs) are important metadata for filtering health-related biomedical literature for evidence synthesis and guideline development. MEDLINE and other databases index many PTs, but not all biomedical literature is indexed. Existing indexing is largely manual, inconsistent across databases, and error-prone, so in practice evidence synthesis teams cannot rely on it to include or exclude articles. Consistent, automated, comprehensive indexing of PTs could help teams retrieve all relevant articles more effectively while filtering out irrelevant article types.
In prior work, our team developed a machine learning model, Multi-Tagger 1.0, that estimates the probability that a given biomedical article belongs to each of 50 established PTs (including randomized controlled trials, systematic reviews, cohort studies, case reports, and prospective studies, among many others). Multi-Tagger 1.0 data for PubMed articles are publicly available for download and through queries on the Anne O’Tate website (https://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/AnneOTate.cgi).
Objectives: Our three-year NIH-funded project aims to create Multi-Tagger 2.0, an improved, more comprehensive machine learning model based on deep learning that will provide stakeholders with highly accurate predictive scores of the PTs for biomedical articles, preprints, and manuscripts. We also hope to work with stakeholders to refine the definitions of existing PTs to fit their needs and to add PTs that do not currently have formal indexing (e.g., animal models of human disease, diagnostic test accuracy, or studies of drug-drug interactions).
Methods: Our project seeks to collaborate widely to develop use cases, validation scenarios, and opportunities for dissemination. We are seeking partners who can test new models’ suitability for specific information retrieval tasks in clinical medicine, veterinary medicine, preclinical studies, and other health-related topics. We especially seek to partner with database teams (PubMed and beyond) and with evidence synthesis and guideline development teams.
Conclusions: Efficient retrieval of biomedical literature based on automated indexing of PTs has the potential to improve both the quality and the speed of evidence synthesis and guideline development.
Acknowledgements: This work is supported by the National Institutes of Health (NIH)/National Library of Medicine grants R01LM010817 and R01LM014292.