Assessing trial quality by a short and a detailed list

1995 Oslo

Zaat JOM, Assendelft WJJ, Mank TG

Introduction: Ranking trials according to their quality is a topic of discussion in the literature and within Collaborative Review Groups (CRGs). So far, only concealment of allocation has been shown to influence trial results. Some argue for simple lists that attend only to concealment and randomization, while others use more extensive lists. In earlier meta-analyses, our department used extensive lists covering topics such as description of the population, randomization, baseline comparability of study groups, dropouts, and loss to follow-up. Scoring is time-consuming, and reviewers, especially relatively inexperienced ones, find some topics difficult to score.

Objective: To investigate differences in ranking the quality of trials using short or detailed lists.

Methods: In a meta-analysis on the treatment of Giardia lamblia, we used the scoring system of the Parasitic Diseases Group as a short list (adequacy of concealment, allocation, blinding of outcome, and follow-up; range 4-12) and a self-designed detailed list (range 1-100). Twenty-one trials were scored by two reviewers: one trained and experienced in scoring the quality of trials but inexperienced in parasitology, and one inexperienced in scoring but experienced in parasitology. Trials were divided into four groups according to the quartiles of the scores, separately for the short and the detailed list and separately for each reviewer.
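The quartile comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how quartile cut points were computed or how tied scores were handled, so the sketch uses the "inclusive" method of the standard library's statistics.quantiles. The function names, trial labels, and scores are hypothetical and are not the study data.

```python
from collections import Counter
from statistics import quantiles


def quartile_groups(scores):
    """Map each trial to a quartile group, 1 (lowest scores) to 4 (highest).

    Assumption: cut points come from statistics.quantiles with the
    "inclusive" method; the abstract does not specify the method used.
    """
    q1, q2, q3 = quantiles(list(scores.values()), n=4, method="inclusive")
    # Each comparison is True (1) or False (0), so the sum lands in 1..4.
    return {trial: 1 + (s > q1) + (s > q2) + (s > q3)
            for trial, s in scores.items()}


def quartile_shifts(short_scores, detailed_scores):
    """Return detailed-list quartile minus short-list quartile per trial.

    A positive value means the trial ranks higher on the detailed list.
    """
    short_q = quartile_groups(short_scores)
    detailed_q = quartile_groups(detailed_scores)
    return {trial: detailed_q[trial] - short_q[trial] for trial in short_q}


def summarize_shifts(shifts):
    """Count trials by size and direction of shift, ignoring unshifted trials."""
    return Counter(v for v in shifts.values() if v != 0)


if __name__ == "__main__":
    # Hypothetical scores for eight trials from one reviewer (not study data).
    short = {"T1": 4, "T2": 5, "T3": 6, "T4": 7,
             "T5": 9, "T6": 10, "T7": 11, "T8": 12}
    detailed = {"T1": 80, "T2": 20, "T3": 35, "T4": 50,
                "T5": 55, "T6": 60, "T7": 90, "T8": 10}
    shifts = quartile_shifts(short, detailed)
    print(shifts)
    print(summarize_shifts(shifts))
```

Running the sketch once per reviewer would give one set of shifts for each, matching the separate analyses described above. With many tied scores, which is likely on a list with a range of only 4-12, a different tie-handling rule could move trials across a quartile boundary.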

Results: For the experienced reviewer, 12 trials shifted quartile between the two lists: 5 trials scored lower on the detailed list (one by 3 quartiles, one by 2) and 7 trials scored higher on the detailed list (3 by 2 quartiles, 4 by one quartile). For the inexperienced reviewer, 13 trials shifted: 5 trials scored lower on the detailed list (one by 3 quartiles) and 8 trials scored higher on the detailed list (one by 3 quartiles, three by 2 quartiles). Even when only the methodological topics from the extensive list were used, the ranking of trials differed considerably between the two lists. Most of the differences between the reviewers on the short list stemmed from a single systematic reading error by the inexperienced reviewer.

Discussion: In this simple experiment, an extensive list for assessing trials only added confusion, because the descriptive parts of the trial reports were difficult to interpret. Even the experienced reviewer ranked trials differently on the two lists. More research on the validity and applicability of the various checklists is needed. Previous training of the reviewers should be one of the determinants studied.