The Grading of Recommendations Assessment, Development and Evaluation Reliability Study (the GRADERS)

2012 Auckland

Mustafa R¹, Santesso N¹, Brozek J¹, Akl E², Schunemann H¹

¹McMaster University, Hamilton, Canada

²The State University of New York, University at Buffalo, NY, USA. McMaster University, Hamilton, Canada

Background: The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach has been widely adopted by systematic reviewers and guideline developers for summarizing, grading and presenting evidence.

Objectives: 1. Evaluate the inter-rater reliability of assessing quality of evidence (QoE) using the GRADE approach. 2. Evaluate the effect of assessing QoE in duplicate on the reliability of this approach.

Methods: All participants completed two training exercises prior to the study. Participants initially worked independently as single raters to assess the QoE of four outcomes from four systematic reviews. After recording their initial impression of the QoE as a global rating on a visual analog scale (VAS), raters graded the QoE following the GRADE approach. Subsequently, we randomly paired raters and asked each pair to submit a consensus QoE rating. Investigators, data abstractors and data analysts were unaware of raters’ identities. We used generalizability theory to calculate a reliability coefficient.
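
As an illustration of the kind of calculation generalizability theory involves, the following is a minimal sketch, assuming a fully crossed one-facet design (every rater rates every outcome) with variance components estimated from a two-way ANOVA without replication. The abstract does not report the design or which coefficient was used, so the function name `g_coefficients`, the choice to return both a relative and an absolute coefficient, and the example ratings are all hypothetical.

```python
import numpy as np


def g_coefficients(ratings):
    """Single-rater G coefficients for an outcomes x raters matrix.

    Assumes a fully crossed one-facet design; this is an illustrative
    sketch, not the analysis reported in the abstract.
    """
    x = np.asarray(ratings, dtype=float)
    n_p, n_r = x.shape  # outcomes (objects of measurement), raters
    grand = x.mean()

    # Sums of squares for a two-way layout without replication
    ss_p = n_r * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Variance components; negative estimates are truncated at zero
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    var_res = max(ms_res, 0.0)

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        # Relative coefficient: ignores systematic rater leniency/severity
        "relative": ratio(var_p, var_p + var_res),
        # Absolute coefficient: also counts rater main effects as error
        "absolute": ratio(var_p, var_p + var_r + var_res),
    }


# Hypothetical ratings: 4 outcomes (rows) scored by 2 raters (columns),
# with 1 = very low and 4 = high quality of evidence.
example = [[1, 2], [2, 3], [3, 4], [4, 4]]
result = g_coefficients(example)
print(result)
```

In this sketch the relative coefficient treats only the outcome-by-rater residual as error, while the absolute coefficient also penalizes raters who are consistently more lenient or more severe than others.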

Results: Fifteen volunteers from the GRADE working group and 10 from the Health Research Methodology (HRM) graduate program at McMaster University participated in the study. Members of the GRADE working group had more experience with the GRADE approach at baseline. When single raters evaluated the body of evidence, the inter-rater reliability of the GRADE approach in assessing the QoE was 0.66 among the HRM students and 0.72 among members of the GRADE working group. The inter-rater reliability of a global rating of QoE using the VAS (without using the GRADE approach) was lower: 0.31 and 0.27, respectively. The inter-rater reliability did not improve when QoE was assessed by pairs of raters reaching consensus on the final rating.

Discussion: Our findings support the presumption that use of the GRADE approach by trained individuals is more reliable than intuitive judgments about the QoE. The results also suggest that single raters are sufficient to reliably assess the QoE using the GRADE system.