Accuracy of the PHQ-9 for screening to detect major depression: an updated systematic review and meta-analysis of individual participant data

Authors
Negeri ZF1, Levis B2, Benedetti A1, Thombs BD2, DEPRESSD PHQ Collaboration
1Department of Epidemiology, Biostatistics and Occupational Health, McGill University
2Lady Davis Institute for Medical Research, Jewish General Hospital
Abstract
Background: Depression accounts for more years of “healthy” life lost than any other medical condition. Major depressive disorder (MDD) is present in 5-10% of primary care patients and 10-20% of patients with chronic medical conditions. The Patient Health Questionnaire-9 (PHQ-9) is the most commonly used tool for screening for major depression.

Objectives: This study aimed to determine the accuracy of the PHQ-9 for detecting major depression.

Methods: Individual participant data meta-analysis (IPDMA) was used to synthesize primary data from studies identified through several search engines (January 2000-May 2018). A bivariate generalized linear mixed-effects model was used to estimate overall sensitivity and specificity for each PHQ-9 cut-off score from 5 to 15, separately among studies that used semi-structured diagnostic interviews, fully structured diagnostic interviews, and the Mini International Neuropsychiatric Interview (MINI) as the reference standard. Meta-regression was used to examine potential associations between participant characteristics and the accuracy of the PHQ-9.

Results: Data were obtained from 100 of 123 eligible studies (81%), for a total of 44,503 participants and 4,541 major depression cases. Combined sensitivity and specificity were maximized at a cut-off score of 10 or above among studies using a semi-structured interview (47 studies, 11,234 participants; sensitivity=0.85, 95% confidence interval [CI]: 0.79 to 0.89; specificity=0.85, 95% CI: 0.82 to 0.87), and at a cut-off score of 8 or above among studies using either fully structured or MINI interviews. At a cut-off score of 8, sensitivity and specificity were 0.77 (95% CI: 0.66 to 0.86) and 0.81 (95% CI: 0.74 to 0.86) among studies using fully structured interviews, and 0.85 (95% CI: 0.79 to 0.89) and 0.80 (95% CI: 0.76 to 0.83) among studies using the MINI. Meta-regression showed that the age and sex of participants were significantly associated with the specificity of the PHQ-9. Across cut-offs and reference standards, median specificity was 2% to 6% higher for older participants than for younger participants. Median specificity was lower for female than for male participants, by 3% for fully structured interviews and 4% for semi-structured interviews.

Conclusions: The diagnostic accuracy of the PHQ-9 is higher among studies using semi-structured interviews than among studies using fully structured or MINI interviews. Cut-off scores of 10 for semi-structured interview studies and 8 for fully structured and MINI interview studies yielded optimal sensitivity and specificity. Older age was associated with higher specificity for all three reference standard categories, and female participants tended to have lower specificity than male participants for semi-structured and fully structured interviews.

Patient or healthcare consumer involvement: There was no direct patient or healthcare consumer involvement in this study. However, we will update a web-based knowledge translation tool (http://depressionscreening100.com/phq/) to help clinicians who are considering screening for depression with the PHQ-9 estimate the expected numbers of positive screens and of true and false screening outcomes, based on results from this study.
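The arithmetic behind estimating expected screening outcomes from a sensitivity, a specificity, and a prevalence can be sketched as follows. This is a minimal illustration, not the implementation of the web-based tool: the function name `expected_screening_outcomes`, the cohort size of 1,000, and the 10% prevalence are assumptions chosen for the example, while the sensitivity and specificity of 0.85 are the semi-structured interview estimates at a cut-off of 10 reported above.

```python
def expected_screening_outcomes(n, prevalence, sensitivity, specificity):
    """Expected screening outcomes for n people screened (hypothetical helper).

    prevalence, sensitivity and specificity are proportions in [0, 1].
    Returns expected counts, not integers, plus the positive predictive value.
    """
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    cases = n * prevalence            # people with major depression
    non_cases = n - cases             # people without major depression

    true_positives = cases * sensitivity
    false_negatives = cases - true_positives
    true_negatives = non_cases * specificity
    false_positives = non_cases - true_negatives

    positive_screens = true_positives + false_positives
    # Share of positive screens that are true cases (undefined if none screen positive).
    ppv = true_positives / positive_screens if positive_screens > 0 else float("nan")

    return {
        "positive_screens": positive_screens,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
        "false_negatives": false_negatives,
        "ppv": ppv,
    }


if __name__ == "__main__":
    # Assumed scenario: 1,000 people screened, 10% prevalence,
    # sensitivity and specificity both 0.85.
    result = expected_screening_outcomes(1000, 0.10, 0.85, 0.85)
    for key, value in result.items():
        print(f"{key}: {value:.2f}")
```

Under these assumed inputs, about 220 of 1,000 people would screen positive, of whom about 85 would be true positives and about 135 false positives. The point estimates carry the uncertainty shown in their confidence intervals, so the counts are approximate.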