Combined p-values of baseline variables of randomized controlled trials published in 2022 indicate non-randomness beyond chance

2023 London

Klang R¹, Bodnar O², Olsson L¹

¹Camtö (Centre for Assessment of Medical Technology in Örebro)

²Örebro University School of Business

Background: Randomized controlled trials (RCTs) are crucial for the evaluation of interventions, but only if the randomization is carried out correctly. The anaesthetist Carlisle has developed a method to test whether the baseline variables of an RCT could reasonably originate from a true randomization. The method rests on the premise that, under correct randomization, the p-values for baseline differences between groups are uniformly distributed. In a study from 2017, based on 5,087 RCTs from 8 medical journals, 5.6% more RCTs than expected had a combined p-value > 0.95 or < 0.05 [1].
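The uniformity premise can be illustrated with a small simulation. The Python sketch below randomly allocates simulated participants to two groups, computes a two-sided p-value for the difference in one baseline variable, and counts how often p < 0.05 or p > 0.95. Under correct randomization, roughly 10% of trials should fall in these two tails. It is an illustration under assumptions, not Carlisle's code. The normally distributed variable (mean 60, SD 10), the group size of 50, the normal approximation used in place of the t-test, and all function names are our own choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_and_variance(values):
    """Sample mean and sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, variance


def baseline_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means between two groups.

    Normal approximation with an unpooled standard error (an assumption
    standing in for the t-test)."""
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    se = (var_a / len(group_a) + var_b / len(group_b)) ** 0.5
    z = (mean_a - mean_b) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def simulate_tail_fraction(n_trials=4000, n_per_group=50, seed=1):
    """Share of truly randomized simulated trials with p < 0.05 or p > 0.95."""
    rng = random.Random(seed)
    in_tails = 0
    for _ in range(n_trials):
        # Hypothetical baseline variable (e.g. age) for all participants.
        values = [rng.gauss(60, 10) for _ in range(2 * n_per_group)]
        rng.shuffle(values)  # random allocation to the two arms
        group_a, group_b = values[:n_per_group], values[n_per_group:]
        p = baseline_p_value(group_a, group_b)
        if p < 0.05 or p > 0.95:
            in_tails += 1
    return in_tails / n_trials


if __name__ == "__main__":
    print(f"share in tails: {simulate_tail_fraction():.3f}")
```

Both tails matter. A p-value near 0 flags groups that differ more than chance allows, and a p-value near 1 flags groups that are more alike than chance allows.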

Objectives: Apply Carlisle’s method to a sample of recent RCTs and compare the findings to Carlisle’s results.

Methods: A sample of 1,075 RCTs published in February 2022 and indexed with the MeSH term ‘Randomized Controlled Trial’ in MEDLINE was checked for eligibility. Inclusion criteria were primary or secondary analyses of RCTs that reported, for baseline variables, the number of participants, the mean, and the standard deviation or standard error. Carlisle’s method uses Monte Carlo simulation, ANOVA, and t-tests to obtain a p-value for each baseline variable. Stouffer’s method then combines these into one p-value per trial, and the combined p-values are compared with a uniform distribution. All analyses were performed in R. A smaller combined p-value indicates that the groups are similar; a larger one indicates that they are dissimilar.
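The per-variable and combination steps can be sketched from reported summary statistics alone. The Python sketch below is a simplification under assumptions. It handles two arms only and replaces the Monte Carlo simulation, ANOVA, and t-test with a normal approximation. It converts a reported standard error to a standard deviation via SD = SE × √n. It then combines the k per-variable p-values with Stouffer's method, Z = Σ Φ⁻¹(p_i) / √k. The combined value is reported as 1 − Φ(Z), oriented so that, as stated above, a small combined p-value indicates similar groups. This orientation, the clipping of p-values away from 0 and 1, and all function names are our own assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps p strictly inside (0, 1) so inv_cdf is defined


def sd_from_se(se, n):
    """Recover a standard deviation from a reported standard error of the mean."""
    return se * n ** 0.5


def p_from_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sided p-value for one baseline variable from reported n, mean and SD.

    Normal approximation; a stand-in for the t-test/ANOVA step (assumption)."""
    se_diff = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    z = (mean1 - mean2) / se_diff
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def stouffer_combined(p_values):
    """Stouffer's method, oriented so that a small result means similar groups.

    Each p is mapped to a z-score, the z-scores are summed and scaled by
    sqrt(k), and the upper-tail probability of the total is returned."""
    z_scores = []
    for p in p_values:
        p = min(max(p, EPS), 1 - EPS)
        z_scores.append(STD_NORMAL.inv_cdf(p))
    z_total = sum(z_scores) / len(z_scores) ** 0.5
    return 1 - STD_NORMAL.cdf(z_total)


if __name__ == "__main__":
    # Hypothetical trial with three baseline variables (n, mean, SD per arm);
    # the second variable is reported as mean and standard error.
    age = p_from_summary(40, 61.2, 9.8, 41, 61.0, 10.1)
    bmi = p_from_summary(40, 27.4, sd_from_se(0.6, 40), 41, 27.5, sd_from_se(0.6, 41))
    sbp = p_from_summary(40, 138.0, 15.0, 41, 137.6, 14.2)
    print(f"combined p-value: {stouffer_combined([age, bmi, sbp]):.3f}")
```

Per-variable p-values near 1 (very similar arms) push Z upwards and the combined value towards 0. Per-variable p-values near 0 push it towards 1.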

Results: 566 RCTs were included, and 13,085 means of 5,780 baseline variables (range 1-100) were extracted. The proportions of p-values > 0.95 or < 0.05, < 0.01, and < 0.00001 were 22.8%, 4.8%, and 0.05%, respectively, i.e., 2, 5, and 500 times larger than would be expected by chance (Table 1). Possible non-randomness was more common in this sample than in Carlisle’s at the arbitrary limits of p-value > 0.95 or p-value < 0.05, but less common at the extreme limit of p-value < 0.00001. The distribution of the combined p-values is presented in Figure 1.
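The observed-to-expected ratios for the first two categories can be reproduced with simple arithmetic. If p-values are uniformly distributed, the probability of falling below a threshold a is a. The two-tailed category p > 0.95 or p < 0.05 therefore has an expected share of 10%, p < 0.01 of 1%, and p < 0.00001 of 0.001%. The Python sketch below uses the percentages quoted above for the first two categories; the dictionary and function names are ours.

```python
# Expected share in each category if p-values are Uniform(0, 1): P(p < a) = a,
# and a two-tailed category adds the probability mass of both tails.
EXPECTED = {
    "p > 0.95 or p < 0.05": (1 - 0.95) + 0.05,
    "p < 0.01": 0.01,
    "p < 0.00001": 0.00001,
}

# Observed shares quoted in the Results for the first two categories.
OBSERVED = {
    "p > 0.95 or p < 0.05": 0.228,
    "p < 0.01": 0.048,
}


def excess_ratios(observed, expected):
    """Observed share divided by the share expected under uniformity."""
    return {label: share / expected[label] for label, share in observed.items()}


if __name__ == "__main__":
    for label, ratio in excess_ratios(OBSERVED, EXPECTED).items():
        print(f"{label}: {ratio:.1f} times the expected share")
```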

Conclusions: These preliminary findings indicate that, in this sample of recent RCTs, a larger proportion of trials than expected by chance are associated with non-randomness. The findings only partly agree with Carlisle’s results. Further analyses will be conducted: more baseline variables will be added, and subgroups, such as type of intervention, will be compared. Nevertheless, Carlisle’s method seems to be a promising statistical tool for systematic reviews and for the evaluation of RCTs.