The Fragility of Statistically Significant Findings from Depression Randomized Controlled Trials

2023 London

Luo M¹, Li Y², Wang Y², Huang J², Liu Z¹, Gao Y¹, Chai Q¹, Liu J¹, Fei Y¹

¹Centre for Evidence-Based Chinese Medicine, Beijing University of Chinese Medicine

²Beijing University of Chinese Medicine

Background:

Efficacy of an intervention is commonly evaluated using the P-value; however, recent literature has questioned whether a threshold P-value is a sufficiently robust tool for reporting discontinuous (dichotomous) outcomes in clinical trials. The fragility index (FI), the minimum number of participants whose status must change from event to non-event for a result to lose statistical significance, has been suggested as an aid to interpreting trial results.

Objectives:

In this systematic survey, we calculated the FI of depression trials that reported positive (statistically significant) eligible outcomes.

Methods:

This is a retrospective analysis of randomized controlled trials in depression published from 2012 to 2022 in The New England Journal of Medicine (NEJM), The Lancet, The Journal of the American Medical Association (JAMA), The British Medical Journal (BMJ), and 35 top psychiatry journals listed in the Psychiatry category of the Social Sciences Citation Index (SSCI), focusing primarily on depression. Two-arm studies with 1:1 randomization and statistically significant positive results for discontinuous outcomes were eligible for FI calculation. The FI was calculated by iteratively subtracting an event from the experimental group (defined as the group with the larger number of events in positive trials) and concomitantly adding a non-event to that group, until statistical significance (defined as p<0.05 by Fisher’s exact test) was lost.
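The calculation described above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not state which software was used, so the function names (fisher_exact_two_sided, fragility_index) are hypothetical, and the use of a two-sided Fisher's exact test is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms; columns are events and non-events.
    Assumption: "two-sided" means summing every table that is no more
    probable than the observed one (a common convention).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Number of event-to-non-event changes needed to lose significance.

    Following the abstract, changes are made in the arm with the larger
    number of events. Returns 0 if the result is not significant at the
    outset (a choice made for this sketch only).
    """
    if events_a < events_b:
        # Orient so that arm "a" is the one with more events.
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a

    events = events_a
    p = fisher_exact_two_sided(events, n_a - events,
                               events_b, n_b - events_b)
    changes = 0
    while p < alpha and events > 0:
        events -= 1      # one participant changes from event...
        changes += 1     # ...to non-event; the arm size stays the same
        p = fisher_exact_two_sided(events, n_a - events,
                                   events_b, n_b - events_b)
    return changes
```

For a hypothetical trial with 5/5 events in one arm and 0/5 in the other, the p-value stays below 0.05 after one change (4/5 vs 0/5) and rises above it after a second (3/5 vs 0/5), so fragility_index(5, 5, 0, 5) returns 2.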

Results:

We identified 1120 trials, of which 130 randomized controlled trials were included; 33 of these reported two eligible outcomes (remission rate and response rate). The median FI of the included trials was four (25th-75th percentile, 2-8; range, 1-40), and more than a third (33.85%) of trials had an FI of two or less. In 68.46% of trials, the number of participants lost to follow-up exceeded the FI. Trial sample size, the total number of events, the impact factor of the publishing journal, and the ratio of participants enrolled to participants screened were associated with FI. In trials with two eligible outcomes, the FI distributions of the two outcomes differed but were positively correlated.

Conclusions:

In depression trials reporting positive discontinuous outcomes, the findings often hinge on small numbers of events. Clinicians should be wary of basing decisions on trials with a low FI.

Patient, public, and/or healthcare consumer involvement: 24,345 patients.