Empirical evaluation of ranking metrics in network meta-analysis

2020 Abstracts

Chiocchia V¹, Nikolakopoulou A¹, Papakonstantinou T¹, Egger M¹, Salanti G¹

¹Institute of Social and Preventive Medicine, University of Bern

Background:
Network meta-analysis (NMA) can produce ranking metrics that lead to a hierarchy of medical interventions from the most to the least preferable. Existing ranking metrics can be non-probabilistic, such as the estimated relative treatment effect, or probabilistic, where probabilities are derived using the distribution of the relative treatment effects. Probabilistic ranking metrics include the probability of each treatment ranking first, second, third, etc., the mean rank, the median rank, and the surface under the cumulative ranking curve (SUCRA) or its frequentist equivalent, the P-score. Each definition of the best treatment leads to a distinct treatment hierarchy problem, which can be addressed with a different ranking metric.
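As an illustration of how the probabilistic metrics above relate to one another, the following minimal sketch derives rank probabilities, the probability of being best, the mean rank, and SUCRA from a set of sampled treatment effects. The function names and the toy `draws` matrix are hypothetical and are not taken from the analysis; the sketch assumes that lower values are better and that samples contain no ties.

```python
def rank_probabilities(draws, higher_is_better=False):
    """Return p[i][k]: share of samples in which treatment i has rank k + 1.

    draws is a list of samples; each sample holds one effect per treatment.
    """
    n_treat = len(draws[0])
    counts = [[0] * n_treat for _ in range(n_treat)]
    for sample in draws:
        # Order treatment indices from best to worst within this sample.
        order = sorted(range(n_treat), key=lambda i: sample[i],
                       reverse=higher_is_better)
        for rank, treatment in enumerate(order):
            counts[treatment][rank] += 1
    return [[c / len(draws) for c in row] for row in counts]


def prob_best(p):
    """Probability that each treatment ranks first."""
    return [row[0] for row in p]


def mean_rank(p):
    """Expected rank of each treatment (1 = best)."""
    return [sum((k + 1) * pk for k, pk in enumerate(row)) for row in p]


def sucra(p):
    """Average of the cumulative rank probabilities over ranks 1..T-1."""
    n_treat = len(p)
    values = []
    for row in p:
        cumulative, total = 0.0, 0.0
        for pk in row[:-1]:
            cumulative += pk      # P(rank <= k)
            total += cumulative
        values.append(total / (n_treat - 1))
    return values


# Toy example (illustrative numbers only): 4 samples, 3 treatments.
draws = [
    [0.1, 0.5, 0.9],
    [0.2, 0.1, 0.8],
    [0.3, 0.6, 0.4],
    [0.0, 0.7, 0.5],
]
p = rank_probabilities(draws)
best = prob_best(p)
ranks = mean_rank(p)
s = sucra(p)
print(best, ranks, s)
```

In this sketch SUCRA is a linear transformation of the mean rank, SUCRA = (T - mean rank) / (T - 1) for T treatments, so the two metrics always give the same ordering, whereas the probability of being best uses only the first column of the rank probability matrix.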
Objectives:
To empirically evaluate the level of agreement between treatment hierarchies produced by different ranking metrics.
Methods:
We re-analysed 232 networks of four or more interventions from randomised controlled trials, published between 1999 and 2015. We produced treatment hierarchies using the following ranking metrics: the probability of producing the best value (pBV), the surface under the cumulative ranking curve (SUCRA) from both a frequentist and a Bayesian framework, and the relative treatment effects using an alternative parametrisation of the network meta-analysis model that estimates relative treatment effects against a fictional treatment of average performance.
To estimate the level of agreement between treatment hierarchies we used Spearman’s ρ and Kendall’s τ correlation coefficients, together with the Yilmaz τ and Average Overlap, which give more weight to agreement on higher ranks. Finally, we assessed how the amount of information present in a network affects the agreement between treatment hierarchies, using the average variance, the relative range of variance, and the total sample size over the number of interventions of a network.
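The agreement measures named above can be sketched for two hierarchies over the same treatments. This is a minimal illustration, assuming complete rankings without ties; the function names and the example hierarchies are hypothetical, and the Yilmaz τ is omitted for brevity.

```python
from itertools import combinations


def spearman_rho(a, b):
    """Spearman's rho for two tie-free rankings of the same items."""
    n = len(a)
    rank_b = {item: r for r, item in enumerate(b)}
    d2 = sum((r - rank_b[item]) ** 2 for r, item in enumerate(a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def kendall_tau(a, b):
    """Kendall's tau: (concordant - discordant) / number of pairs."""
    rank_b = {item: r for r, item in enumerate(b)}
    concordant = discordant = 0
    # combinations() yields pairs in the order of ranking a (x above y).
    for x, y in combinations(a, 2):
        if rank_b[x] < rank_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def average_overlap(a, b, depth=None):
    """Mean, over depths 1..depth, of the share of items common to both top-d lists."""
    depth = depth or len(a)
    total = 0.0
    for d in range(1, depth + 1):
        total += len(set(a[:d]) & set(b[:d])) / d
    return total / depth


# Two illustrative hierarchies, listed from best to worst.
hierarchy_a = ["A", "B", "C", "D"]
hierarchy_b = ["A", "C", "B", "D"]

rho = spearman_rho(hierarchy_a, hierarchy_b)
tau = kendall_tau(hierarchy_a, hierarchy_b)
ao = average_overlap(hierarchy_a, hierarchy_b)
print(rho, tau, ao)
```

Because Average Overlap averages the overlap of the top-d lists over increasing depth d, a disagreement near the top of the hierarchy affects more terms than one near the bottom, which is the top-weighting property the measure is used for here.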
Results:
Overall, the pairwise agreement was high for all treatment hierarchies obtained by the different ranking metrics (Table 1). The highest agreement was observed between SUCRA and the relative treatment effect, for which the medians of both the correlation and the top-weighted measures were all equal to one. The agreement between rankings decreased for networks with less precise estimates, and the hierarchies obtained from pBV appeared to be the most sensitive to large differences in the variance estimates. However, such large differences were rare in practice.
Conclusions:
Different ranking metrics address different treatment hierarchy problems, but they produced similar rankings in the published networks. Therefore, researchers reporting NMA results can use the ranking metric they prefer, unless there are imprecise estimates or large imbalances in the variance estimates. In such cases, treatment hierarchies based on both probabilistic and non-probabilistic ranking metrics should be presented.
This project is funded by the Swiss National Science Foundation under grant agreement No. 179158.
Patient or healthcare consumer involvement: Not relevant.