Word frequency analysis of over a million words to support the development of search strategies on 'health-related values'

Authors
Petrova M¹, Sutcliffe P², Fulford B³, Dale J²
¹Egenis (ESRC Centre for Genomics in Society), University of Exeter, Exeter, UK, and Health Sciences Research Institute, Warwick Medical School, University of Warwick, Coventry, UK
²Health Sciences Research Institute, Warwick Medical School, University of Warwick, Coventry, UK
³Institute of Clinical Education, Warwick Medical School, University of Warwick, Coventry, UK
Abstract
Background: Systematic reviews increasingly incorporate, or are complemented by, findings from research on 'values', understood broadly to include ethical values, beliefs, preferences, experiences, satisfaction, quality of life, etc. Research on such issues is scattered across the literature. Both the vocabulary used to describe these topics and their boundaries are also notoriously contentious.

Objectives: This study used word frequency analysis to 1) generate a broad pool of search terms, together with data on their precision and sensitivity, to support systematic review searches; and 2) develop a 'brief values filter' for scoping searches.
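For reference, the two performance measures are conventionally defined as follows in search filter research. The abstract does not spell out the authors' exact operationalisation, so this is the standard form rather than a formula taken from the study. For a single term, the "records retrieved" are the records containing that term.

$$
\text{sensitivity} = \frac{\text{relevant records retrieved}}{\text{all relevant records in the dataset}}, \qquad
\text{precision} = \frac{\text{relevant records retrieved}}{\text{all records retrieved}}
$$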

Methods: Datasets of MEDLINE records on Diabetes, Obesity, Dementia and Schizophrenia were used (2004–2006; 4,440 citations; 1,110,291 words). Word frequency analysis was performed using Concordance® and SPSS. Text words and MeSH terms of high frequency and high precision were compiled into a search filter, which was then validated on datasets on Dentistry and Food Hypersensitivity.
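The per-term counting step can be illustrated with a minimal Python sketch. It assumes each record has already been screened as relevant or not relevant to 'values', and that a term's frequency is the number of records containing it; both are assumptions, since the abstract does not define them. The `Record` class and the `tokenize` and `term_stats` functions are hypothetical and do not reproduce Concordance® or SPSS, the tools the study actually used. MeSH headings could be handled in the same way by treating each heading as a single term.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    text: str       # hypothetical field: title + abstract of a MEDLINE record
    relevant: bool  # hypothetical label: judged to be a 'values' record


def tokenize(text: str) -> set[str]:
    # Lower-cased words starting with a letter; a set, so each term counts once per record
    return set(re.findall(r"[a-z][a-z'-]*", text.lower()))


def term_stats(records: list[Record]) -> dict[str, tuple[int, float]]:
    """Map each term to (number of records containing it, precision)."""
    containing = Counter()  # records containing the term
    relevant = Counter()    # relevant records containing the term
    for rec in records:
        terms = tokenize(rec.text)
        containing.update(terms)
        if rec.relevant:
            relevant.update(terms)
    return {t: (n, relevant[t] / n) for t, n in containing.items()}


if __name__ == "__main__":
    sample = [
        Record("Patient preferences and quality of life", True),
        Record("Insulin dosing in type 2 diabetes", False),
        Record("Patient beliefs about insulin", True),
    ]
    stats = term_stats(sample)
    print(stats["patient"])  # (2, 1.0)
    print(stats["insulin"])  # (2, 0.5)
```

Counting each term once per record keeps precision on the same scale as record-level screening judgements; counting raw occurrences instead would weight long abstracts more heavily.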

Results: In total, 144 unique text words and 124 unique MeSH terms of moderate or high frequency (≥20) and very high precision (≥90%) were identified. Of these, 19 text words and 7 MeSH terms met these criteria in at least three topics, and they were compiled into a brief values filter. In the derivation dataset, the filter had a sensitivity of 76.8% and a precision of 86.8%. In the validation datasets, its sensitivity and precision were, respectively, 70.1% and 63.6% (Food Hypersensitivity) and 47.1% and 82.6% (Dentistry).
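The selection and evaluation logic can be sketched as follows, assuming per-term statistics are available as (number of records containing the term, precision) pairs and that the filter retrieves a record if it contains any filter term (the usual OR combination). The thresholds of 20, 90% and three topics come from the abstract; the function names and the record representation are hypothetical, and this is an illustration rather than the authors' procedure.

```python
from collections import Counter

# Hypothetical representation: (terms found in a record, judged relevant)
Record = tuple[frozenset[str], bool]


def passing_terms(stats: dict[str, tuple[int, float]],
                  min_freq: int = 20, min_precision: float = 0.90) -> set[str]:
    """Terms with frequency >= min_freq and precision >= min_precision in one topic."""
    return {t for t, (n, p) in stats.items() if n >= min_freq and p >= min_precision}


def brief_filter(per_topic: list[set[str]], min_topics: int = 3) -> set[str]:
    """Keep terms that pass the thresholds in at least min_topics topics."""
    counts = Counter(t for terms in per_topic for t in terms)
    return {t for t, k in counts.items() if k >= min_topics}


def evaluate(filter_terms: set[str], records: list[Record]) -> tuple[float, float]:
    """Sensitivity and precision of an OR-combined filter on labelled records."""
    retrieved = [rel for terms, rel in records if terms & filter_terms]
    true_positives = sum(retrieved)
    relevant_total = sum(rel for _, rel in records)
    sensitivity = true_positives / relevant_total if relevant_total else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return sensitivity, precision
```

For example, with per-topic passing sets `[{"preferences", "beliefs"}, {"preferences", "attitudes"}, {"preferences", "beliefs"}, {"beliefs"}]`, `brief_filter` keeps `{"preferences", "beliefs"}`, because only those two terms pass in at least three topics. Running `evaluate` on a separate labelled dataset mirrors the validation step.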

Conclusions: Both 'health-related values' and approaches to search filter development based on word frequency analysis are areas of substantial potential. A conceptualisation in terms of health-related values may underpin a systematic and coherent picture of the psychological, social, cultural, ethical and political factors associated with a particular healthcare concern. Word frequency analysis need not be confined to the stage before search strategy development: ideally, it would be applied to the publications included in completed reviews, and its findings used to improve search strategies for subsequent reviews.