Abstract
Background: The Cochrane Collaboration is adopting the GRADE approach to rating the quality of evidence across studies for each important outcome. The approach includes the possibility of downgrading the quality of the evidence as a result of five considerations: study limitations (risk of bias); indirectness; inconsistency; reporting bias; and imprecision. Although the delineation of these five considerations is in itself extremely helpful, GRADE users require detailed guidance for assessment of each of these considerations. This abstract focuses on the issue of imprecision.
Objectives: To develop detailed guidance for when to downgrade a body of evidence because of imprecision.
Methods: In general, members of the GRADE working group generate initial ideas; extensively discuss these ideas at one or more GRADE meetings; consult individuals outside of the group; and test approaches, including testing at workshops and in the process of guideline development. So far as possible, the general approach and specific guidance are based on empirical methodological studies. The suggestions below are preliminary, and have been discussed at a single GRADE meeting thus far.
Results: GRADE defines the quality of evidence for a systematic review as a reflection of our confidence that an estimate of the effect is accurate. For any binary outcome, we suggest downgrading if the pooled estimate fails to meet one of two alternative criteria. The first criterion for adequate precision is that the total sample size is as great as or greater than the calculated optimal information size (OIS) (Pogue and Yusuf. Controlled Clinical Trials 1997;18(6):580-593). The OIS represents the number of patients generated by a conventional sample size calculation specifying a particular alpha and beta error, relative risk reduction, and baseline event rate. The second, simpler criterion is that the total number of events is greater than 300. A reviewer can choose between these criteria. If the confidence interval around the point estimate excludes no effect and one of the two criteria above is met, one concludes that precision is adequate and does not downgrade the quality of the evidence for imprecision. There is, however, a circumstance in which the chosen criterion is met and a reviewer will still downgrade for imprecision. When the 95% confidence interval (or alternative estimate of precision) around the pooled or best estimate of effect includes no effect, reviewers will downgrade for imprecision if that confidence interval also includes appreciable benefit or appreciable harm. Our suggested threshold for appreciable benefit and harm that warrants downgrading is a relative risk reduction or relative risk increase of greater than 25%. For continuous variables, we suggest not downgrading if the confidence interval excludes no difference (benefit or harm demonstrated). If the outcome is a continuous variable and the confidence interval includes no effect, we suggest downgrading if the upper or lower boundary of the confidence interval crosses the minimal important difference (MID), either for benefit or harm.
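The OIS described above is a conventional two-group sample-size calculation. As an illustration only (the function name and default values are not from the abstract; alpha = 0.05, power = 0.80, a 20% baseline event rate, and a 25% relative risk reduction are example inputs), it might be sketched as:

```python
from math import ceil
from statistics import NormalDist

def optimal_information_size(baseline_rate, rrr, alpha=0.05, power=0.80):
    """Sketch of a conventional two-proportion sample-size calculation,
    in the spirit of the OIS of Pogue and Yusuf (1997).

    baseline_rate : assumed control-group event rate (e.g. 0.20)
    rrr           : relative risk reduction to detect (e.g. 0.25)
    alpha, power  : two-sided type I error and 1 - beta
    Returns the total number of patients across both arms.
    """
    p1 = baseline_rate                 # control event rate
    p2 = baseline_rate * (1 - rrr)     # intervention event rate under the assumed RRR
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    per_group = ((z_alpha + z_beta) ** 2
                 * (p1 * (1 - p1) + p2 * (1 - p2))
                 / (p1 - p2) ** 2)
    return 2 * ceil(per_group)

# Example: 20% baseline event rate, 25% RRR, alpha 0.05, power 0.80
total = optimal_information_size(0.20, 0.25)  # about 1800 patients in total
```

A review whose pooled analysis includes fewer patients than this total would fail the first precision criterion.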
If the MID is unknown, or if use of different outcome measures requires calculation of an effect size, we suggest downgrading if the upper or lower boundary of the confidence interval crosses an effect size of 0.5 in either direction.
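The binary and continuous rules above can be sketched as a pair of boolean decision functions. This is an illustrative reading of the criteria, not an official GRADE implementation; the function and parameter names are invented here, and the caller is assumed to have already checked the OIS or 300-event criterion for the binary case:

```python
def downgrade_binary(ci_low, ci_high, precision_criterion_met):
    """Imprecision decision for a binary outcome on the relative risk scale.

    ci_low, ci_high : 95% CI around the pooled relative risk
    precision_criterion_met : True if total sample size >= OIS,
                              or total events > 300 (reviewer's choice)
    Returns True if the evidence should be downgraded for imprecision.
    """
    if not precision_criterion_met:
        return True
    includes_null = ci_low <= 1.0 <= ci_high
    # Appreciable benefit or harm: RRR or RRI greater than 25%
    appreciable = ci_low < 0.75 or ci_high > 1.25
    return includes_null and appreciable

def downgrade_continuous(ci_low, ci_high, mid=None):
    """Imprecision decision for a continuous outcome.

    ci_low, ci_high : 95% CI around the pooled difference (or effect size)
    mid : minimal important difference; if None, the CI is assumed to be on
          the standardised effect-size scale and 0.5 is used as the threshold.
    Returns True if the evidence should be downgraded for imprecision.
    """
    threshold = 0.5 if mid is None else mid
    includes_null = ci_low <= 0.0 <= ci_high
    if not includes_null:
        return False  # benefit or harm demonstrated: do not downgrade
    # CI spans both no effect and an important difference in either direction
    return ci_low < -threshold or ci_high > threshold
```

For example, a relative risk CI of 0.85 to 1.10 with the precision criterion met would not be downgraded (it includes no effect but excludes appreciable benefit and harm), whereas a CI of 0.60 to 1.10 would be.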
Conclusions: Developing criteria for downgrading quality of evidence for imprecision is complex and inevitably involves some degree of arbitrariness. Nevertheless, explicit criteria are an absolute requirement for achieving consistency across reviews. The criteria suggested here may be refined on the basis of feedback and further discussion.