Application of text mining and machine learning for problem formulation in systematic reviews

2015 Vienna

Thayer K¹, Howard B², Holmgren S¹, Pelch K¹, Walker V¹, Lunn R¹, Shah R²

¹National Institute of Environmental Health Sciences (NIEHS)/National Institutes of Health (NIH), USA

²Sciome LLC, USA

Background: Identifying addressable questions for systematic reviews can be challenging, especially in environmental health, where assessments often integrate evidence from human, animal, and in vitro studies. Text-mining and machine-learning tools show promise for assisting with problem formulation.
Objectives: To explore the utility of the Sciome Workbench for Interactive, Computer-Facilitated Text-mining (SWIFT) software for visualizing literature search results on three complex topics: research trends for ~500 endocrine-disrupting chemicals; environmental influences on the epigenome; and health effects associated with night shift work, light at night, or circadian disruption.
Methods: For each project, literature search results from PubMed were uploaded into SWIFT. Customized search strategies were developed to tag articles by evidence stream (i.e., human, animal, in vitro), exposure, and health outcome. SWIFT's unsupervised topic-clustering function was used to group articles by subject matter. Users created intersections of tags to focus on specific topics, e.g., night shift work and metabolic disorders. Together, these functions were used to create interactive reports.
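The tag-intersection step can be pictured as a set intersection over an inverted index that maps each tag to the articles carrying it. This is a minimal sketch under stated assumptions, not SWIFT's implementation; the document IDs, tag names, and the helpers build_tag_index and intersect_tags are hypothetical.

```python
from collections import defaultdict


# Hypothetical illustration only: not SWIFT's code. Document IDs and tag
# names below are invented for the example.
def build_tag_index(doc_tags):
    """Invert {doc_id: set_of_tags} into {tag: set_of_doc_ids}."""
    index = defaultdict(set)
    for doc_id, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def intersect_tags(index, *tags):
    """Return the documents carrying every requested tag (empty set if none)."""
    if not tags:
        return set()
    # Copy the first tag's set so the index itself is never mutated.
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


if __name__ == "__main__":
    doc_tags = {
        "doc1": {"human", "night shift work", "metabolic disorders"},
        "doc2": {"animal", "light at night", "metabolic disorders"},
        "doc3": {"human", "night shift work", "cancer"},
        "doc4": {"human", "night shift work", "metabolic disorders"},
    }
    index = build_tag_index(doc_tags)
    hits = intersect_tags(index, "night shift work", "metabolic disorders")
    print(sorted(hits))  # ['doc1', 'doc4']
```

Adding an evidence-stream tag such as "human" or "animal" to the call narrows the same intersection to one stream, which mirrors viewing results by evidence stream.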
Results: The interactive, visual reports produced by SWIFT allowed users to identify and formulate focused research questions more efficiently. The reports helped identify topics that have been studied extensively, as well as emerging areas of research (Figure 1). Viewing results by evidence stream helped users judge how much evidence integration a review might require (Figure 2). The topic-clustering results were also used to identify "seed studies" for training a machine-learning model that ranks relevant studies by priority within focused areas.
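To make the seed-study idea concrete, the sketch below uses a deliberately simple stand-in: each candidate article is scored by bag-of-words cosine similarity to the seed studies, and candidates are sorted by score. This is not the model SWIFT uses, which the abstract does not describe; the tokenizer, stopword list, and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a seed-trained ranking model: cosine similarity
# to the seed studies. Not SWIFT's algorithm. Tiny illustrative stopword list.
STOPWORDS = {"and", "the", "with", "for", "from"}


def vectorize(text):
    """Crude bag-of-words: lowercase alphabetic tokens longer than 2 chars."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_seeds(seed_texts, candidates):
    """Rank {doc_id: text} by similarity to the seeds, highest score first."""
    # Summing seed vectors gives the same direction as their centroid,
    # and cosine similarity ignores vector length.
    seed_vector = Counter()
    for text in seed_texts:
        seed_vector.update(vectorize(text))
    scored = [(doc_id, cosine(vectorize(text), seed_vector))
              for doc_id, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    seeds = ["Night shift work and obesity in nurses",
             "Shift work, circadian disruption and metabolic syndrome"]
    candidates = {"p1": "Rotating night shift work and type 2 diabetes risk",
                  "p2": "Pesticide exposure and thyroid function in rats"}
    for doc_id, score in rank_by_seeds(seeds, candidates):
        print(doc_id, round(score, 3))
```

A real screening model would typically be trained on both included and excluded examples and updated as reviewers label more articles; the similarity ranking above only illustrates how seed studies can order an unscreened set.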
Conclusions: Text-mining and machine-learning programs such as SWIFT are valuable tools for problem formulation. Such analyses could support a form of scoping review that serves purposes ranging from showing research trends to identifying targeted questions that systematic reviews could address.