Grades of Recommendation from Levels of Evidence: Are You Confused Yet?

2002 Stavanger

Falck-Ytter Y, Lelgemann M, Booker D, Antes G

Background: When original research is cited, levels of evidence are increasingly applied, especially in clinical practice guidelines, health technology assessment reports and other reviews. The intention is to convey that not all research study designs are created equal. Levels of evidence are often reported numerically, while an alphabetical scheme is used to express the strength of the ensuing recommendations. Journals and other official bodies increasingly make the use of levels of evidence mandatory. Government agencies may welcome such use as a way to improve the quality of health information. For example, the French Ministry of Health recommends levels of evidence for all health-related Internet sites. However, no international standard exists, and variations in implementation are increasingly observed.

Objective: To identify the various schemes of levels of evidence in use and to report problems arising from their non-standardised use in the literature, especially in clinical practice guidelines.

Methods: A pilot study was performed to identify the use of levels of evidence by journals, scientific and speciality organisations, universities, government health organisations, and authors of literature on evidence-based medicine.

Results: At least 5 major schemes for defining levels of evidence, and many minor variations, were encountered. Schemes ranged from 3 to 5 major levels, some with multiple sub-levels, labelled with Roman numerals or with a more general approach such as "high", "medium", "general." Major variations were seen in the way study designs were assigned to the levels of evidence. Grades of recommendation were usually expressed as letters ranging from "A" to "C", "D", or "E." However, because some schemes included an "against" recommendation (e.g., a "D"), the same letter could have opposite meanings in different scales, such as "fair evidence against the intervention" versus "expert opinion favouring the intervention." Thus, lettered recommendations were not used in any standard way. Very often, levels of evidence were used without adequately citing the source scheme, implying that an international standard exists. In some instances, levels of evidence were confused or grouped together with levels of importance (e.g., "In symptomatic cholelithiasis, cholecystectomy should be performed [Grade A]. The 'A'-grade recommendation was given because of its importance, even though no good controlled trial exists."). It is particularly troublesome that such errors are then perpetuated in secondary documents, such as health websites or consumer summaries, as occurred in this example.

Conclusions: Levels of evidence are used in many different and sometimes erroneous ways. A formal study is therefore needed to examine the relationship between the definitions of levels of evidence and the grades of recommendation derived from them. Relying solely on grades of recommendation may be problematic; the scheme used should therefore always be checked carefully. It appears inevitable that some form of standardisation, similar to the CONSORT statement (Consolidated Standards of Reporting Trials), will be needed to establish a common language for recommendations.