Can we use results from a broader study population for conclusions about the treatment effect in a subpopulation of interest?

Authors
Sieben W¹, Beckmann L¹, Grouven U¹, Kieser M², Skipka G¹, Bender R¹
¹ Institute for Quality and Efficiency in Health Care
² Institute of Medical Biometry and Informatics, Ruprecht-Karls-Universität, Heidelberg
Abstract
Background:
When assessing the benefit of an intervention in systematic reviews, suitable studies may not exist, especially when the population of interest is narrowly defined. In such cases, a study considering a broader population may be available. For answering the specific research question of the systematic review, only a subpopulation of this broader study population is relevant. When this target population (TP) is small, the analysis may lack power and the results may be inconclusive. Given a statistically significant effect in the complete study population (SP) but none in the TP, the question arises whether the effect seen for all participants might also be present in the TP.

Objectives:
To determine which circumstances justify the use of results from a broader study population for conclusions about the effect of an intervention in the subpopulation of interest.

Methods:
We assessed three multistep statistical test procedures. The first procedure includes, as one of several steps, an increase in the level of significance (α) for the test in the TP (elevation rule). The other two are permutation-based and involve variations of a test for heterogeneity as used in meta-analyses.
The SP consists of the relevant TP and the non-target population (nTP). The main idea is to test whether the treatment effect in the TP differs from the treatment effect in the nTP. In a simulation study, we compared the empirical type 1 error and power of all three test procedures. We varied a set of parameters, such as the sample size in the nTP, the relation between TP and nTP, and the effect sizes (standardized mean differences) in TP and nTP.
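The building blocks of such a simulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedures assessed here: the elevation rule and the permutation-based procedures are not specified in this abstract and are not implemented. The sketch assumes normally distributed outcomes with unit variance, equal arm sizes, and large-sample z-tests; all function and parameter names (`simulate_once`, `rejection_rates`, `n_tp`, `d_tp`, and so on) are hypothetical. It estimates how often the test in the TP, the test in the SP, and a two-subgroup heterogeneity test (TP versus nTP) reject at level α.

```python
import math
import random
from statistics import NormalDist


def mean_and_se2(values):
    """Return the sample mean and the squared standard error of the mean."""
    n = len(values)
    m = math.fsum(values) / n
    var = math.fsum((v - m) ** 2 for v in values) / (n - 1)
    return m, var / n


def effect(treat, ctrl):
    """Mean difference (treatment minus control) and its standard error."""
    m_t, se2_t = mean_and_se2(treat)
    m_c, se2_c = mean_and_se2(ctrl)
    return m_t - m_c, math.sqrt(se2_t + se2_c)


def two_sided_p(z):
    """Two-sided p-value from a standard normal test statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_once(rng, n_tp, n_ntp, d_tp, d_ntp):
    """Simulate one study; n_* are per-arm sizes, d_* are true SMDs (assumed unit variance)."""
    tp_t = [rng.gauss(d_tp, 1.0) for _ in range(n_tp)]
    tp_c = [rng.gauss(0.0, 1.0) for _ in range(n_tp)]
    ntp_t = [rng.gauss(d_ntp, 1.0) for _ in range(n_ntp)]
    ntp_c = [rng.gauss(0.0, 1.0) for _ in range(n_ntp)]

    diff_tp, se_tp = effect(tp_t, tp_c)
    diff_ntp, se_ntp = effect(ntp_t, ntp_c)
    diff_sp, se_sp = effect(tp_t + ntp_t, tp_c + ntp_c)

    p_tp = two_sided_p(diff_tp / se_tp)
    p_sp = two_sided_p(diff_sp / se_sp)
    # With two subgroups, Cochran's Q reduces to the squared z-statistic
    # for the difference between the two subgroup effects.
    p_het = two_sided_p((diff_tp - diff_ntp) / math.sqrt(se_tp**2 + se_ntp**2))
    return p_tp, p_sp, p_het


def rejection_rates(n_sim, n_tp, n_ntp, d_tp, d_ntp, alpha=0.05, seed=1):
    """Empirical rejection rates of the TP, SP, and heterogeneity tests."""
    rng = random.Random(seed)
    hits = {"tp": 0, "sp": 0, "het": 0}
    for _ in range(n_sim):
        p_tp, p_sp, p_het = simulate_once(rng, n_tp, n_ntp, d_tp, d_ntp)
        hits["tp"] += p_tp < alpha
        hits["sp"] += p_sp < alpha
        hits["het"] += p_het < alpha
    return {name: count / n_sim for name, count in hits.items()}


if __name__ == "__main__":
    # Illustrative parameter constellation (assumed, not taken from the study).
    print(rejection_rates(n_sim=1000, n_tp=30, n_ntp=120, d_tp=0.3, d_ntp=0.3))
```

With `d_tp = d_ntp = 0`, the returned rates estimate the empirical type 1 error of each single test; with non-zero effects, they estimate power. A multistep procedure would combine these p-values according to its decision rule, and its own type 1 error and power would then be estimated in the same way.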

Results:
By definition, all procedures result in an increased empirical type 1 error. We identified the elevation rule as yielding the largest improvement in empirical power, while increasing the empirical type 1 error only slightly, to a median of 5.9%. Compared to testing in the TP alone at α = 0.05, it gains up to 8.7 percentage points of power, depending on the parameter constellation.

Conclusions:
Statistical inference in small subpopulations can suffer substantially from a lack of power. We evaluated different test procedures to address this problem and propose the elevation rule, which uses information from a broader population and gains power with only a slightly increased type 1 error.

Patient or healthcare consumer involvement:
None.