Latent class models for individual participant data meta-analyses of diagnostic test accuracy studies with imperfect reference standards

Article type
Authors
Negeri ZF1, Levis B2, Wu Y3, Thombs BD2, Benedetti A1
1Department of Epidemiology, Biostatistics and Occupational Health, McGill University
2Lady Davis Institute for Medical Research, Jewish General Hospital
3Department of Psychiatry, McGill University
Abstract
Background: Depression accounts for more years of “healthy” life lost than any other medical condition. Typically, questionnaire-based screening tools are used to screen for major depressive disorders and clinically administered diagnostic interviews are used to diagnose them. However, the accuracy of neither the screening tools nor the diagnostic interviews can be estimated without error, because the diagnostic interviews used as reference standards are themselves imperfect. Results of systematic reviews and meta-analyses based on such imperfect reference standards may lead to misleading conclusions that misinform clinicians and other decision-makers. Latent class models have commonly been applied to correct for imperfect reference or gold standards in conventional, aggregate-data diagnostic test accuracy studies, and most of these models used a Bayesian approach to estimate unknown model parameters. To the best of our knowledge, however, no methodological studies have attempted to account for imperfect reference standards in the context of individual participant data meta-analyses (IPDMA) of diagnostic test accuracy studies.

Objectives: The objective of this study is to propose and validate latent class models for IPDMA that estimate the true diagnostic accuracy of both screening tools and imperfect reference standards for depression screening and diagnosis.

Methods: We will develop and evaluate latent class analysis-based models, exploring both frequentist and Bayesian approaches to the problem of imperfect reference standards in IPDMA of diagnostic test accuracy data. We will illustrate the models using our database, which comprises more than 100 studies and 46,000 participants on the most commonly used tool for detecting major depression in primary care – the Patient Health Questionnaire-9 (PHQ-9). In this database, the PHQ-9 is compared to diagnostic interviews such as the Structured Clinical Interview for DSM (SCID), the Composite International Diagnostic Interview (CIDI), and the Mini International Neuropsychiatric Interview (MINI).
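To illustrate the general idea behind latent class correction for an imperfect reference standard, the sketch below simulates a two-test, two-population latent class model (in the spirit of the classical Hui-Walter setup, under conditional independence given latent disease status) and fits it by maximum likelihood via an EM algorithm. This is a minimal frequentist illustration only: the parameter values, function names, and simulated data are our own assumptions and do not represent the authors' proposed IPDMA models or the PHQ-9 database.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative truth (assumed values, NOT estimates from the PHQ-9 database)
se = np.array([0.85, 0.90])    # sensitivities of screener and reference interview
sp = np.array([0.80, 0.75])    # specificities
prev = np.array([0.15, 0.45])  # two populations with different prevalences

def simulate(n, p):
    """Simulate latent disease status and two imperfect binary tests."""
    d = rng.binomial(1, p, n)
    probs = np.where(d[:, None] == 1, se, 1 - sp)  # P(T_j = 1 | D)
    return d, rng.binomial(1, probs)

def em_hui_walter(tests, groups, iters=2000):
    """EM for the two-test, two-population latent class model,
    assuming conditional independence given latent disease status."""
    n_groups = groups.max() + 1
    p = np.full(n_groups, 0.5)   # prevalence per population
    s = np.array([0.9, 0.9])     # sensitivity init
    c = np.array([0.9, 0.9])     # specificity init
    for _ in range(iters):
        # E-step: posterior probability each subject is truly diseased
        l1 = p[groups] * np.prod(s**tests * (1 - s)**(1 - tests), axis=1)
        l0 = (1 - p[groups]) * np.prod((1 - c)**tests * c**(1 - tests), axis=1)
        w = l1 / (l1 + l0)
        # M-step: weighted maximum-likelihood updates
        for g in range(n_groups):
            p[g] = w[groups == g].mean()
        s = (w[:, None] * tests).sum(0) / w.sum()
        c = ((1 - w)[:, None] * (1 - tests)).sum(0) / (1 - w).sum()
    return p, s, c

# Fit on simulated data from the two populations
_, t0 = simulate(20000, prev[0])
_, t1 = simulate(20000, prev[1])
tests = np.vstack([t0, t1]).astype(float)
groups = np.repeat([0, 1], 20000)
p_hat, se_hat, sp_hat = em_hui_walter(tests, groups)
print("estimated Se:", se_hat, "Sp:", sp_hat)  # should be close to se and sp above
```

The key point the sketch conveys is identifiability: with a single population, the two-test latent class model is not identifiable, but allowing two (or, in an IPDMA, many) populations with differing prevalences while sharing test accuracy parameters makes the sensitivities and specificities of both imperfect tests estimable without any gold standard.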

Anticipated Results: We expect that our models will generate more realistic test characteristics for depression screening tools and diagnostic interviews by correcting for the bias induced by imperfect reference standards, thereby better informing stakeholders about the true diagnostic accuracy of both the screening tools and the diagnostic interviews.

Conclusions: Our proposed methods will have implications beyond IPDMA of depression screening tools and diagnostic interviews.

Patient or healthcare consumer involvement: There was no direct patient or healthcare consumer involvement in this study. Nevertheless, the outcome of this study will be a welcome addition to the body of knowledge for clinicians and policy-makers concerned with the accuracy of depression screening tools and diagnostic interviews.