AMSTAR-2’s inter-rater reliability for quality assessment in an overview of interventions to prevent adverse events in the ICU

Authors
Pantoja P1, Suclupe S2, Parellada D3, Muñoz J3, CarreraNurse M4, Uya J5, Amparo N6, Parise J7, Salas-Gama K1, Requeijo C8, Merchan-Galvis A9, Torres M2, Robleda G10, Simancas D7, Vicuna J8, Salvador J2, Díaz Y11, Barajas L12, Martínez-Zapata M13
1Vall d’Hebron University Hospital, Barcelona, Spain; Autonomous University of Barcelona, Barcelona, Spain
2Vall d’Hebron University Hospital, Barcelona, Spain
3Autonomous University of Barcelona, Barcelona, Spain
4Autonomous University of Barcelona, Barcelona, Spain; Unidad de Paciente Crítico en Red de Salud UC Christus, Santiago de Chile, Chile
5Hospital Universitario de Bellvitge, Instituto Català de Salut. Nursing Research Group, Bellvitge Institute for Biomedical Research, Barcelona, Spain
6Public Health Unit, ACES Alentejo Central, Alentejo, Portugal
7Public Health and Clinical Epidemiology Research Center (CISPEC). Ecuador’s Associated Cochrane Center, Iberoamerican network. Faculty of Health Sciences Eugenio Espejo, UTE University, Quito, Ecuador
8Clinical Epidemiology Service HSCSP, IR Sant Pau, Barcelona, Spain
9Department of Social Medicine and Family Health, Universidad del Cauca, Popayán, Colombia
10Nursing School of Barcelona, Campus Docent Sant Joan de Déu-Private Foundation, University of Barcelona, Barcelona, Spain
11Epidemiology Resident, Health Programs Department, National Institute of Respiratory Diseases Dr. Emilio Coni, ANLIS Malbrán – Ministry of Health of Argentina, Buenos Aires, Argentina
12Research Unit in Evidence-based Medicine, Cochrane Center, Federico Gómez Children Hospital, Mexico, Mexico
13Cochrane Iberoamerica– Clinical Epidemiology Service HSCSP, IR Sant Pau, CIBERESP, Barcelona, Spain
Abstract
"Background
A MeaSurement Tool to Assess systematic Reviews (AMSTAR-2) is a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both (1). It comprises 16 items (seven critical and nine noncritical). Discordances between reviewers must be resolved through discussion and consensus or by a third reviewer, which demands time and effort. Identifying the items with the most disagreement can help improve the tool.
Objectives
To evaluate the inter-rater reliability of AMSTAR-2 using percentage agreement and weighted kappa statistics, and to identify the items with the least agreement.
Methods
We assessed methodological quality with the AMSTAR-2 tool in an overview of systematic reviews of interventions to prevent adverse events in the intensive care unit. The study team was divided into pairs to evaluate 139 systematic reviews (2,3). We measured inter-rater agreement and calculated the weighted kappa (kw) for agreement between pairs, overall and by AMSTAR-2 item.
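As an illustration of the agreement measures named above, the following is a minimal sketch of percentage agreement and weighted kappa for one pair of raters. It is not the analysis code used in this study. The abstract does not report the weighting scheme or the software used, so the linear weights, the ordered categories ("No", "Partial Yes", "Yes") and the function names `percent_agreement` and `weighted_kappa` are assumptions made for the example.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of reviews on which both raters gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * same / len(ratings_a)


def weighted_kappa(ratings_a, ratings_b, categories, weights="linear"):
    """Cohen's weighted kappa for two raters on an ordered scale.

    `categories` lists the ratings from lowest to highest (assumed ordering).
    `weights` is "linear" or "quadratic" (an assumption; not stated in the abstract).
    """
    k = len(categories)
    n = len(ratings_a)
    if k < 2 or n == 0 or n != len(ratings_b):
        raise ValueError("need at least 2 categories and equal, non-empty lists")
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (rater A, rater B) ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if exp == 0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return 1.0 - obs / exp


# Hypothetical ratings for four reviews on one AMSTAR-2 item.
SCALE = ["No", "Partial Yes", "Yes"]
rater_1 = ["Yes", "Yes", "No", "No"]
rater_2 = ["Yes", "No", "No", "Yes"]
agreement = percent_agreement(rater_1, rater_2)
kappa = weighted_kappa(rater_1, rater_2, SCALE)
```

Running the same functions once per AMSTAR-2 item, rather than over all items pooled, would give the item-level view used to find the items with the least agreement.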
Results
In a preliminary analysis, agreement between reviewers was high (94.1%), with good strength of agreement (kw=.727, p<.001). These results were consistent for critical and noncritical items (93.2%, kw=.742, p<.001; and 89.9%, kw=.682, p<.001, respectively). The critical items with the least agreement were those on risk of bias and on the assessment of heterogeneity in non-randomized studies (items 9.2 and 11.2, respectively). The noncritical items with the least agreement were the justification for the study designs included in the systematic review and the detailed description of the included studies (items 3 and 8).
Conclusions
Our results are in line with the AMSTAR-2 validation study (1).
The levels of agreement between the pairs of raters varied across items, but they were moderate to substantial for most items. Differences between raters reflect the demanding nature of some item-level judgments and should prompt group discussion of their causes and importance and, if needed, consultation with experts in subject matter and methods. Prior training of the reviewers in the AMSTAR-2 instrument is necessary to maximise consensus when they apply it individually.
Statement on the relevance and importance to patients: This work will improve overview methods. Therefore, evidence production will be more robust.
"