Reliability and validity of the Newcastle Ottawa Scale

2012 Auckland

Hartling L¹, Hamm M¹, Milne A¹, Vandermeer B¹, Ansari M², Tsertsvadze A², Dryden DM¹

¹University of Alberta, Canada

²Ottawa Hospital Research Institute, Canada

Background: The Newcastle Ottawa Scale (NOS) is used to assess methodological quality of cohort and case-control studies in systematic reviews. There has been limited research examining inter-rater reliability and validity of the tool.

Objectives: (1) To assess the inter-rater reliability of the NOS; (2) to assess the validity of the NOS by examining whether treatment effect estimates vary by study quality.

Methods: Two reviewers independently assessed 131 cohort studies from 8 meta-analyses. Disagreements were resolved through discussion to produce a single assessment for each study. Inter-rater agreement was calculated using kappa statistics. For each meta-analysis, we calculated a ratio of odds ratios (ROR) comparing studies assessed as meeting each NOS item with studies assessed as not meeting it. The RORs from the individual meta-analyses were then combined using meta-analytic techniques to give an overall estimate of the difference in effect estimates.
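The two calculations described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes an unweighted Cohen's kappa for two raters, an ROR defined as the odds ratio in studies not meeting an item divided by the odds ratio in studies meeting it, and a fixed-effect inverse-variance method for pooling log RORs. The abstract does not specify the kappa variant, the direction of the ratio, or the pooling model. All function names are hypothetical.

```python
import math


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same studies."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Proportion of studies on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one category only
    return (observed - expected) / (1.0 - expected)


def log_ror(log_or_met, se_met, log_or_not_met, se_not_met):
    """Log ROR and its standard error within one meta-analysis.

    Assumed direction: OR(item not met) / OR(item met).
    Inputs are pooled log odds ratios for the two subgroups of studies.
    """
    diff = log_or_not_met - log_or_met
    se = math.sqrt(se_met ** 2 + se_not_met ** 2)
    return diff, se


def pool_fixed_effect(estimates):
    """Inverse-variance pooled estimate from a list of (log_ror, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)
```

Under these assumptions, a pooled ROR is obtained with `math.exp(pooled)`, and a value close to 1 indicates that effect estimates do not differ between studies that meet and do not meet the item.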

Results: Inter-rater agreement varied across NOS items: representativeness of exposed cohort (fair); selection of non-exposed cohort (poor); ascertainment of exposure (moderate); demonstration that outcome was not present at outset of the study (poor); comparability of study groups (slight); outcome assessment (moderate); length of follow-up (substantial); adequacy of participant follow-up (fair); and total NOS score (fair). Interviews with the reviewers identified challenges in using the tool, including difficulty interpreting some terminology (‘selected’ population) and unclear distinctions between some response options (‘truly’ vs. ‘somewhat’ representative population; ‘structured interview’ vs. ‘written self-report’). Reviewers commented that ‘unclear’ or ‘no description’ options were needed for some items. No associations were found between individual NOS items or overall NOS score and the magnitude of effect estimates.
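The agreement labels reported above (poor, slight, fair, moderate, substantial) match the descriptive bands commonly attributed to Landis and Koch. Assuming those bands were the ones used, which the abstract does not state, a kappa value could be mapped to a label as in the sketch below. The function name is hypothetical, and the kappa values underlying the reported labels are not given in the abstract.

```python
def agreement_label(kappa):
    """Map a kappa value to a descriptive band (assumed Landis and Koch cut-offs)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    # Upper bounds are inclusive: 0.00-0.20 slight, 0.21-0.40 fair, and so on.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

For example, a hypothetical kappa of 0.35 would be labeled "fair" under these cut-offs.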

Conclusions: This study provides new information on the reliability and validity of the NOS. Reviewers' feedback on applying the tool, together with the identification of particularly problematic items, can inform revisions to the tool and more detailed guidance for its use.