GRADE guidelines: 7. Rating the quality of evidence - Inconsistency
Journal of Clinical Epidemiology
  • Gordon H. Guyatt, McMaster University
  • Andrew D. Oxman, Norwegian Knowledge Centre for the Health Services
  • Regina Kunz, University Hospital Basel
  • James Woodcock, London School of Hygiene and Tropical Medicine
  • Jan Brozek, McMaster University
  • Mark Helfand, Oregon Health & Science University
  • Pablo Alonso-Coello, Universidad Autonoma de Barcelona
  • Paul P. Glasziou, Bond University
  • Roman Jaeschke, McMaster University
  • Elie A. Akl, State University of New York at Buffalo
  • Susan Norris, Oregon Health & Science University
  • Gunn Vist, Norwegian Knowledge Centre for the Health Services
  • Philipp Dahm, University of Florida
  • Vijay K. Shukla, Canadian Agency for Drugs and Technology in Health
  • Julian Higgins, MRC Biostatistics Unit
  • Yngve Falck-Ytter, Case Western Reserve University
  • Holger J. Schunemann, McMaster University
Date of this Version
Document Type
Journal Article
Publication Details

Citation only

Guyatt, G. H., Oxman, A. D., Kunz, R., Woodcock, J., et al. (2011). GRADE guidelines: 7. Rating the quality of evidence - Inconsistency. Journal of Clinical Epidemiology, 64(12), 1294-1302.

2011 HERDC submission. FoR code: 111700

© Copyright Elsevier Inc., 2011. All rights reserved.

This article deals with inconsistency of relative (rather than absolute) treatment effects in binary/dichotomous outcomes. A body of evidence is not rated up in quality if studies yield consistent results, but may be rated down in quality if results are inconsistent. Criteria for evaluating consistency include similarity of point estimates, extent of overlap of confidence intervals, and statistical criteria including tests of heterogeneity and I². To explore heterogeneity, systematic review authors should generate and test a small number of a priori hypotheses related to patients, interventions, outcomes, and methodology. When inconsistency is large and unexplained, rating down quality for inconsistency is appropriate, particularly if some studies suggest substantial benefit, and others no effect or harm (rather than only large vs. small effects). Apparent subgroup effects may be spurious. Credibility is increased if subgroup effects are based on a small number of a priori hypotheses with a specified direction; subgroup comparisons come from within rather than between studies; tests of interaction generate low P-values; and the effects have a biological rationale.
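The statistical criteria named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: Cochran's Q under a fixed-effect, inverse-variance model, and I² = max(0, (Q - df) / Q) × 100. The function name `heterogeneity` and the study values are hypothetical and are not taken from the article.

```python
import math


def heterogeneity(log_effects, std_errors):
    """Return (pooled log effect, Cochran's Q, degrees of freedom, I2 in percent).

    Assumes each study supplies a log relative effect (e.g. log relative risk)
    and its standard error; pooling is fixed-effect, inverse-variance.
    """
    if len(log_effects) != len(std_errors) or len(log_effects) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1

    # I2: share of total variability beyond what chance alone would explain.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, df, i2


if __name__ == "__main__":
    # Hypothetical relative risks and standard errors of their logs
    # (illustrative numbers only, not data from the article).
    relative_risks = [0.60, 0.75, 1.10, 0.95]
    std_errors = [0.15, 0.20, 0.18, 0.25]
    log_rr = [math.log(rr) for rr in relative_risks]

    pooled, q, df, i2 = heterogeneity(log_rr, std_errors)
    print(f"pooled RR = {math.exp(pooled):.2f}, Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

A statistic like this is only one input. The abstract's other criteria, similarity of point estimates and overlap of confidence intervals, still require judgment about whether the inconsistency warrants rating down.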
Citation Information
Gordon H. Guyatt, Andrew D. Oxman, Regina Kunz, James Woodcock, et al. "GRADE guidelines: 7. Rating the quality of evidence - Inconsistency" Journal of Clinical Epidemiology Vol. 64 Iss. 12 (2011) p. 1294 - 1302 ISSN: 0895-4356
Available at: