'Clustering' documents automatically to support scoping reviews of research

Authors
Thomas J¹, Stansfield C², Kavanagh J³
¹ Institute of Education, EPPI-Centre, London, UK
² Social Science Research Unit, Institute of Education, London, UK
³ Social Science Research Unit, EPPI-Centre, London, UK
Abstract
Background: Scoping reviews differ from systematic reviews in that they are limited to a preliminary assessment of the potential scope and size of the relevant research literature, and they indicate potential research gaps. They are often conducted at the title and abstract level, and describe different aspects of the research literature, for example, study design, population, intervention elements and outcomes. The EPPI-Centre conducted two such scoping reviews for the Department of Health in England.
Objectives: To assess the value of automated clustering using text mining to provide rapid descriptive codes across a range of study characteristics in order to generate themes across research records.
Methods: Studies included in two public health-related systematic scoping reviews, which had previously been coded by researchers across a range of characteristics at the title and abstract level, were uploaded to the text mining software. The clustering function was employed to generate descriptive codes for all research records. Comparisons were made between the automated and researcher-assigned codes.
Results: Preliminary results indicate that clustering can be useful in describing literature for scoping reviews thematically in terms of topic focus, population groups and well-described study designs. Researcher input and knowledge of the literature enhance its usefulness in application. Optimal use of the function combines three practices: researcher checking and reassignment of ambiguous codes; variation of the minimum word length for clustering (longer descriptive labels result in smaller, more focused clusters); and manual free-text searches in conjunction with cluster-assigned codes.
Conclusions: Clustering is a promising tool for contributing to rapid scoping reviews, although there are limitations owing to the diverse terminology used in describing public health literature.
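Illustrative sketch: the abstract does not state how the automated and researcher-assigned codes were compared. One simple way to approach such a comparison is to ask, for each automatically generated cluster, which researcher-assigned code is most common among its records and what share of the cluster it accounts for. The function name `cluster_purity`, the toy labels and the choice of a majority-share measure are all assumptions made for illustration, not the authors' method or the text mining software's interface.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_labels, researcher_codes):
    """Summarise agreement between cluster labels and researcher codes.

    Returns a dict mapping each cluster label to a tuple of
    (most common researcher code, share of the cluster's records with that code).
    Hypothetical helper for illustration only.
    """
    if len(cluster_labels) != len(researcher_codes):
        raise ValueError("inputs must have the same length")

    # Count researcher codes within each automatically assigned cluster.
    by_cluster = defaultdict(Counter)
    for cluster, code in zip(cluster_labels, researcher_codes):
        by_cluster[cluster][code] += 1

    summary = {}
    for cluster, counts in by_cluster.items():
        top_code, top_n = counts.most_common(1)[0]
        summary[cluster] = (top_code, top_n / sum(counts.values()))
    return summary


# Invented toy data: one entry per research record.
clusters = ["smoking", "smoking", "smoking", "schools", "schools"]
codes = ["tobacco", "tobacco", "alcohol", "education", "education"]

result = cluster_purity(clusters, codes)
for cluster, (code, share) in result.items():
    print(f"{cluster}: mostly '{code}' ({share:.0%} of records)")
```

Clusters with a low majority share would be natural candidates for the researcher checking and reassignment described in the Results.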