Family Medicine, Epidemiology and Community Health, Virginia Commonwealth University, Richmond, Virginia
Address reprint requests to: Steven H. Woolf, MD, MPH, Professor of Family Medicine, Preventive Medicine and Community Health, Virginia Commonwealth University, Fairfax, Virginia 22033. E-mail: swoolf{at}vcu.edu
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
For these reasons, especially the impact on health, it is important to base the content of guidelines on scientific evidence that the proposed dietary practice will improve health. Scientific evidence almost always figures prominently in the development of dietary guidelines, but in some settings policy guidelines are also shaped by other considerations (e.g., economic impact, feasibility) and a desire to accommodate the concerns of private industry and advocacy organizations. The circumstances that cultivate these ties and their effect on the quality of food policy are addressed elsewhere [1]. This article discusses how dietary guidelines are developed on the basis of scientific evidence. It also reviews the characteristics of evidence that are considered in judging its quality and the systems that guideline panels use for rating evidence. The economic and political considerations that enter into dietary guidelines, and the controversies they generate, are beyond the scope of this review.
| DEVELOPING EVIDENCE-BASED GUIDELINES |
|---|
The distinguishing feature of this approach is its emphasis on an examination of the evidence that is comprehensive, critical, and explicit. Comprehensiveness is important to ensure that all evidence, not just those studies that support a particular viewpoint or that reflect a selection bias, is considered. Critical appraisal is emphasized to examine the strengths and weaknesses of the study designs so that judgments about the evidence can be linked to quality. Explicitness gives transparency to the evaluation, allowing readers to understand the methods used in the analysis, the strengths of the evidence, where gaps exist, and the rationale for practice recommendations or policies, whether they be evidence- or opinion-based.
Evidence that is available is often of poor quality, either in terms of internal validity (the extent to which the data are reflective of the setting in which the study was conducted) or external validity (the extent to which the findings can be extrapolated to other populations, interventions, or settings). Even the best evidence can provide only "averages" for predicting outcomes in a given individual. A variety of effect modifiers (e.g., risk factors, past medical history, comorbidity, lifestyle) influence where an individual will fall in the bell curve that surrounds the mean.
The evidence-based approach to formulating guidelines does not gloss over these limitations in evidence but insists on making them explicit. It does not insist on evidence from randomized trials but does demand that the grade and quality of the evidence be carefully evaluated and stated clearly. It does not preclude the use of opinion or expert judgment in setting policy but does insist on acknowledging when this is done [7]. It advocates disclosure of gaps in the evidence to help clarify research agendas and calls attention to design features that future studies should incorporate to address deficiencies in current data. It encourages basing the determinant factors in practice and policy decisions on evidence, to maximize equity and effectiveness for all patients.
Steps in Developing Evidence-Based Practice Guidelines
Evidence-based guidelines feature an explicit methodology, include as their foundation a systematic review of the evidence, provide graded recommendations that are linked directly to the supporting evidence, and state explicitly when recommendations are based on opinion. In general, evidence-based guidelines emerge from six steps, which are conducted with varying intensity and in different sequences depending on the topic.
1. Specification of Topic and Methodology.
The first step is to give precision to the focus of the review, specifying the target condition, the interventions (e.g., dietary practice) to be reviewed, relevant populations and contextual circumstances, and outcome measures of significance. The boundaries for the search are also determined, such as bibliographic databases and exclusion criteria (e.g., studies published before a given date, foreign-language articles, editorials, uncontrolled studies, non-human studies). An evidence model often helps to clarify the linkages in the analytic framework for which evidence is sought [8].
2. Systematic Review.
The review of evidence follows procedures that have become standardized in recent years [9]. Three basic steps include: (a) a comprehensive literature search, using explicitly documented search terms and other techniques to assure the reviewers and readers that all relevant evidence has been gathered; (b) critical appraisal of individual studies, using explicit analytic criteria to judge internal and external validity and documenting the findings in abstraction forms and evidence tables; and (c) synthesis of results, summarizing the results in narrative text, evidence tables, or balance sheets. The last step may involve quantitative pooling of data in meta-analyses [10] to estimate overall effect sizes, especially when individual studies lack statistical power, or in decision analytic models that predict outcomes under varied assumptions about determinant variables.
Below we discuss the hierarchy of evidence on which guidelines are based. In many systematic reviews, studies are assigned a "grade," or evidence code, that reflects the position of the study in a hierarchy of evidence quality. A variety of coding schemes exists [11]. In grading studies of the effectiveness of treatments, a common feature is to place randomized trials at the top of the hierarchy, followed by observational and epidemiologic studies. Other coding schemes are appropriate for studies evaluating diagnostic tests, epidemiologic trends, and natural history [12].
Reviewers examine a variety of factors to assess internal validity (e.g., study population, allocation to groups, interventions, outcome measures, attrition rates, statistical measurements) [13]. The generalizability of the study population, intervention, and setting is considered to judge external validity. Over 20 instruments are available to grade the quality of randomized trials [14].
3. Expert Opinion.
Expert opinion plays a role in all practice guidelines. Even when evidence is available, subjective judgments are made in assessing the strength or generalizability of the evidence and in weighing the tradeoffs between benefits and harms. When evidence is lacking, groups differ on the extent to which they are willing to make recommendations based on opinion. A hallmark of evidence-based guidelines is to be explicit when opinion is used so that readers understand the basis for the recommendations and can make their own judgment about validity.
4. Public Policy Considerations.
Guideline developers must often consider the cost-effectiveness or cost-utility of the dietary practices they advocate and the impact on related industries. Other policy considerations, such as feasibility, support systems, insurance policies, and medicolegal implications, are considered to varying degrees depending on the topic and panel philosophy. It is in this context that conflicts of interest among panel members, and among the sponsors of the guideline project, become especially problematic [15]. Some groups rigidly avoid making opinion-based recommendations, instead offering the neutral conclusion that there is insufficient evidence to make a recommendation.
5. Drafting of Document.
The wording of recommendations receives great attention in practice guidelines, because even the slightest nuances of language can have serious policy implications [16]. A characteristic of evidence-based guidelines is the use of letter codes (e.g., "A" recommendation) or recommendation categories (e.g., "standards," "guidelines," "options") to reflect how strongly the intervention is recommended. Almost always, this grading scheme reflects the strength of supporting evidence. Current examples of such grading schemes are discussed below.
6. Peer Review.
As with any scholarly document, evidence-based guidelines are typically circulated in draft form to content experts to obtain feedback on the comprehensiveness of the review and the validity of the critical appraisal. The draft is also sent to stakeholders, such as relevant professional societies, advocacy organizations, and industrial representatives, for further feedback.
Limitations of Evidence-Based Practice Guidelines
Evidence-based practice guidelines, like all guidelines, can be flawed, advocating interventions that are not in the best interest of the public [17,18] or have become outdated [19]. Sometimes the errors stem from limitations in the science itself, such as lack of data or poor generalizability. Sometimes errors occur when panel members reach invalid conclusions in translating science into policy. Biases or conflicts of interest among panel members, often influenced by outspoken individuals or pressure from interest groups, can produce recommendations that differ from what the data support [20,21]. Recommendations that do not give guidance on individualization or that reduce complex decisions to simplistic algorithms may be overly rigid and may result in more harm than good.
| WEIGHING THE EVIDENCE |
|---|
Quality of Evidence
The quality of the evidence can be assessed at three levels: (a) the quality of an individual study, (b) the quality of a body of evidence (group of studies) regarding a putative effect of a dietary practice (e.g., whether intake affects the incidence of cancer), and (c) the complete body of evidence regarding the appropriateness of the dietary practice [22]. Stating that the evidence for a dietary practice is "good" or "fair" is not informative unless the level of analysis is clear.
Study Design Categories
The quality of individual studies, the first level of analysis, is a function of the study design category (its position in the hierarchy of evidence) and the methods used to conduct the study. Both are important; a well-designed observational study may be more persuasive than a poorly performed randomized trial.
In the conventional hierarchy of evidence (Table 1), uncontrolled epidemiologic data and case series rank lowest in proving effectiveness. One step closer to definitive evidence is provided by controlled observational studies, which compare outcomes among those who were or were not exposed to the intervention. Historical (before-after) studies, such as a comparison of outcomes within a community before and after a change in exposure, raise questions about the influence of temporal factors other than the exposure. Cross-sectional comparisons, such as when outcomes for subjects with one exposure are compared with those of unexposed subjects, also lack persuasiveness because of potential confounding variables: the characteristics of exposed subjects may have an independent effect on observed outcomes that is unrelated to exposure.
Prospective cohort studies overcome some of the limitations of retrospective analyses by establishing the variables of interest (e.g., dietary intake patterns) at the start of the study and collecting them systematically over time, often with long periods of follow-up, but the potential influence of confounding remains. Unless exposure to the dietary practice occurs randomly, it is possible that exposed subjects will differ in characteristics other than exposure that may account, at least in part, for observed outcomes.
It is this concern that accounts for the primacy of randomized controlled trials in demonstrating effectiveness [23]. The defining characteristic of such trials is that the assignment of patients to intervention groups is made randomly, creating comparison groups that are essentially the same in all respects other than exposure to the intervention. Unrecognized, as well as known, confounding variables are thereby distributed equally and, at least in theory, should not contribute to observed differences in outcomes.
A number of factors are considered in judging the quality of randomized controlled trials, including: concealment of allocation, procedures that prevent investigators from knowing at the time of recruitment to which group the next subject will be assigned [24]; blinding, preventing observers from knowing the subjects' exposure status; attrition, the degree to which enrolled subjects are lost to follow-up; crossover, in which subjects assigned to one group improperly receive the intervention prescribed for another, a frequent cause of contamination of controls, in which exposure occurs among subjects meant to be unexposed; and intention-to-treat analysis, in which the denominator for comparison groups consists of the persons originally assigned to the groups, regardless of whether their exposure status followed protocol. Statistical power is an important concern, especially for studies that report no statistically significant difference in outcomes. It is invalid to infer that an intervention is ineffective if, for example, the trial lacked sufficient sample size or follow-up to demonstrate an effect.
Individual studies may not provide definitive evidence of effectiveness or ineffectiveness, often because their sample sizes are too small or because they conflict with findings from other studies, and it is sometimes helpful to pool the results of studies to gain further insights. Classical meta-analysis uses these pooled data to obtain a point estimate (and confidence interval) of the likely effect of interventions based on the summed data [10]. The validity of such estimates depends on multiple factors, including the degree of heterogeneity of the studies from which the data are drawn and the statistical methods used to pool the results [14,25].
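As an illustration of how a pooled point estimate and confidence interval can be obtained, the sketch below applies inverse-variance fixed-effect pooling, one common statistical method among several, and reports Cochran's Q and I² as rough indicators of heterogeneity. The function name, the choice of a fixed-effect model, and the three study estimates (log relative risks with standard errors) are assumptions made for illustration; they are not taken from the studies cited here.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(estimates, std_errors, confidence=0.95):
    """Inverse-variance fixed-effect pooling (illustrative sketch).

    estimates:  per-study effects on an additive scale, e.g. log relative risks
    std_errors: standard error of each estimate, in the same order
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Normal-approximation confidence interval around the pooled estimate.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    interval = (pooled - z * pooled_se, pooled + z * pooled_se)

    # Cochran's Q and I^2 summarize how much the studies disagree.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    degrees_of_freedom = len(estimates) - 1
    i_squared = max(0.0, (q - degrees_of_freedom) / q) if q > 0 else 0.0

    return {"pooled": pooled, "se": pooled_se, "ci": interval,
            "q": q, "i_squared": i_squared}


# Hypothetical log relative risks from three studies (not real data).
hypothetical_log_rr = [-0.22, -0.10, -0.35]
hypothetical_se = [0.10, 0.20, 0.25]

summary = pool_fixed_effect(hypothetical_log_rr, hypothetical_se)
print("pooled relative risk:", math.exp(summary["pooled"]))
print("95% CI:", tuple(math.exp(bound) for bound in summary["ci"]))
print("I^2:", summary["i_squared"])
```

When the studies are markedly heterogeneous, a single pooled number of this kind can mislead, and a random-effects model or a purely narrative synthesis may be the more defensible choice.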
Although on methodologic grounds the results reported in many meta-analyses are invalid and should certainly not be accepted at face value [26], faith in numbers tempts many policymakers to place undue confidence in these pooled point estimates. Often, especially when study designs are dissimilar, a more useful function of meta-analyses is not to produce a pooled effect estimate but to explain potential reasons for dissimilar findings. By grouping studies that reach different conclusions it is often possible to identify patterns in the conduct of studies or analysis of results that help explain potential effect modifiers and determinants of the performance of interventions.
Data from multiple studies are often entered into mathematical models to predict outcomes that have not been or cannot be examined directly in studies [27]. One of the most useful outputs of modeling studies is sensitivity analysis, by which modelers examine the extent to which altering a particular assumption influences projected outcomes. It identifies the areas of uncertainty that matter most in determining true benefit and highlights important areas for future research.
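A one-way sensitivity analysis can be sketched in a few lines. The example below is a minimal sketch under stated assumptions: the toy model (deaths averted as the product of population size, baseline risk, relative risk reduction, and adherence), its parameter names, and all base-case values and ranges are hypothetical and chosen only to show the mechanics.

```python
def deaths_averted(population, baseline_risk, relative_risk_reduction, adherence):
    """Toy outcome model (hypothetical): expected deaths averted."""
    return population * baseline_risk * relative_risk_reduction * adherence


def one_way_sensitivity(model, base_case, ranges):
    """Vary one assumption at a time, holding the others at base-case values.

    Returns (parameter, (lowest output, highest output)) pairs sorted so that
    the assumption with the largest swing in projected outcome comes first.
    """
    swings = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for value in (low, high):
            params = dict(base_case)   # copy so the base case is untouched
            params[name] = value
            outputs.append(model(**params))
        swings[name] = (min(outputs), max(outputs))
    return sorted(swings.items(),
                  key=lambda item: item[1][1] - item[1][0],
                  reverse=True)


# Hypothetical base case and plausible ranges (not real data).
base_case = {
    "population": 100_000,
    "baseline_risk": 0.001,
    "relative_risk_reduction": 0.20,
    "adherence": 0.5,
}
ranges = {
    "baseline_risk": (0.0009, 0.0011),
    "relative_risk_reduction": (0.10, 0.30),
    "adherence": (0.4, 0.6),
}

for name, (lowest, highest) in one_way_sensitivity(deaths_averted, base_case, ranges):
    print(f"{name}: {lowest:.1f} to {highest:.1f} deaths averted")
```

In this toy setup the assumption listed first is the one whose uncertainty moves the projection most, which is the kind of finding that points to where better data are most needed.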
Measuring Magnitude
Results can be statistically significant without having clinical or public health significance. Proponents of a dietary practice, in making their case, may emphasize the relative, rather than the absolute, benefits of interventions. The absolute benefit of a 20% relative reduction in the risk of dying from cancer depends on the baseline probability of death. If that probability is 100/100,000, the intervention reduces the risk of death to 80/100,000, an absolute difference of 20/100,000, or an absolute risk reduction of 0.02%. This is a far less impressive figure than the relative risk reduction of 20%. Although both figures are true, the absolute risk reduction has important policy implications, because it tells us that the number of persons who must receive the intervention to save the life of one individual is 100/0.02 (the reciprocal of the absolute risk reduction when it is expressed as a percentage). The number-needed-to-treat (NNT) is therefore 5000.
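The arithmetic in this example can be reproduced with a short calculation. The sketch below uses exact fractions to avoid rounding noise; the function name is an assumption, and the second scenario (a tenfold higher baseline risk) is a hypothetical added only to show how strongly the NNT depends on baseline risk.

```python
from fractions import Fraction


def risk_reduction_summary(baseline_risk, relative_risk_reduction):
    """Return (risk with intervention, absolute risk reduction, NNT)."""
    treated_risk = baseline_risk * (1 - relative_risk_reduction)
    absolute_risk_reduction = baseline_risk - treated_risk
    if absolute_risk_reduction <= 0:
        raise ValueError("no absolute benefit, so the NNT is undefined")
    number_needed_to_treat = 1 / absolute_risk_reduction
    return treated_risk, absolute_risk_reduction, number_needed_to_treat


# Figures from the text: baseline risk 100/100,000 and a 20% relative reduction.
treated, arr, nnt = risk_reduction_summary(Fraction(100, 100_000), Fraction(20, 100))
print(f"risk with intervention: {treated * 100_000} per 100,000")  # 80 per 100,000
print(f"absolute risk reduction: {float(arr * 100)}%")             # 0.02%
print(f"number needed to treat: {nnt}")                            # 5000

# Hypothetical comparison: the same relative reduction at a tenfold higher baseline risk.
_, arr_high, nnt_high = risk_reduction_summary(Fraction(1_000, 100_000), Fraction(20, 100))
print(f"NNT at a baseline risk of 1,000/100,000: {nnt_high}")      # 500
```

The relative risk reduction is identical in both scenarios, yet the number of people who must receive the intervention differs by a factor of ten.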
The magnitude of benefit and its persuasiveness depend in large part on the outcome measures. What should matter most are health outcomes, which here refers to outcomes perceptible to people (e.g., angina, death). Because of the lengthy follow-up periods and methodologic challenges associated with measuring such outcomes, however, many studies infer effectiveness by measuring intermediate or surrogate outcomes. Intermediate outcomes are findings that are not health outcomes in themselves (e.g., cellular atypia) but that precede, or are thought to increase the risk of, such outcomes. Surrogate outcomes are indicators that correlate with, but are not themselves, health outcomes (e.g., length of hospital stay). Relying on such indicators to infer effectiveness must be done cautiously, however, because dietary practices can improve intermediate outcomes without necessarily improving health [28,29]. An intervention that reduces the incidence of adenomatous polyps by 20%, for example, will have a small impact on cancer morbidity or mortality if only 1% of such polyps are destined to progress to cancer.
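The polyp example can be worked through numerically. The sketch below is a simplification resting on a loudly stated assumption: that the polyps prevented by the intervention are as likely to progress to cancer as any other polyp. The function name and the cohort of 1,000 polyps are hypothetical.

```python
from fractions import Fraction


def cancers_averted(polyps, polyp_reduction, progression_probability):
    """Expected cancers averted, assuming prevented polyps progress at the average rate."""
    polyps_prevented = polyps * polyp_reduction
    return polyps_prevented * progression_probability


# Hypothetical cohort in which 1,000 adenomatous polyps would otherwise occur.
expected_polyps = 1_000
reduction = Fraction(20, 100)    # 20% fewer polyps, as in the text
progression = Fraction(1, 100)   # 1% of polyps progress to cancer, as in the text

baseline_cancers = expected_polyps * progression
averted = cancers_averted(expected_polyps, reduction, progression)

print(f"cancers expected without the intervention: {baseline_cancers}")  # 10
print(f"polyps prevented: {expected_polyps * reduction}")                # 200
print(f"cancers averted: {averted}")                                     # 2
```

Under these assumptions, preventing 200 polyps averts about 2 cancers, which is why a sizable effect on an intermediate outcome can translate into a small effect on the health outcome.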
The most definitive health outcome, both in importance to the public and in relative ease of measurement, is death. The customary endpoint is the disease-specific mortality rate, and not all-cause mortality. Some consider the latter the ideal endpoint. If a dietary practice reduces disease-specific and not overall mortality, they argue, it does people little good, presumably trading one cause of death for another. The flaw in this reasoning is that deaths from any single disease account for a relatively small proportion of all deaths in a population. While trials may have sufficient statistical power to show an effect on disease-specific deaths, demonstrating an effect on overall mortality would require an enormous sample size and follow-up periods that are usually well beyond the budgets and recruiting capacities of investigators.
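The sample-size argument can be made concrete with the standard normal-approximation formula for comparing two proportions. The sketch below is illustrative only: the function name, the 5% two-sided significance level, the 80% power, and the mortality figures (a disease causing 1% of participants to die during follow-up, out of 10% dying from all causes) are assumptions, not values from any trial.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects per group needed to detect p_control vs p_treated.

    Uses the normal approximation for two independent proportions with
    equal group sizes and a two-sided test.
    """
    if p_control == p_treated:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2
    return math.ceil(n)


# Hypothetical figures: the disease kills 1% of controls during follow-up,
# and the dietary practice lowers disease-specific mortality by 20% (to 0.8%).
n_disease_specific = sample_size_per_group(0.010, 0.008)

# The same 0.2 percentage-point reduction, diluted within a 10% all-cause mortality.
n_all_cause = sample_size_per_group(0.100, 0.098)

print("per group, disease-specific endpoint:", n_disease_specific)
print("per group, all-cause endpoint:", n_all_cause)
print("ratio:", n_all_cause / n_disease_specific)
```

With these hypothetical inputs, the all-cause endpoint requires roughly ten times as many subjects per group as the disease-specific endpoint, because the same absolute difference must be detected against a much larger background of deaths.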
Harms and Costs
For dietary practices (and other health measures addressed in guidelines) it is necessary to consider untoward effects and potential harms to assess the net benefit of the intervention. Dietary modifications intended to improve health can introduce risks and create complications that offset their benefits. Guideline panels vary in the thoroughness with which they address harms and the relative importance they assign to the potential downsides of the interventions they recommend. Further, even panels that weigh harms with as much rigor as they examine benefits find that direct evidence about the probability and magnitude of harms is often lacking or of poor quality, making it difficult to reach definitive conclusions about the balance of benefits and harms.
Developers of guidelines on dietary practices must also consider the economic implications to society, the food industry, and others. The simplest measures are direct costs, but the more pivotal question for most health services is their value (the ratio of expenditures to benefits). An intervention with low value, even if relatively affordable in up-front costs, represents a poor use of resources, whereas a highly costly service may be an excellent value if it is highly effective. In cost-benefit analysis, the benefits are measured in monetary units, whereas in cost-effectiveness analysis the benefit is measured in health gains (e.g., years of life saved). In cost-utility analysis, health benefits are adjusted to reflect the relative importance of the outcome to patients (e.g., QALY). The validity of such comparisons, and of economic calculations more generally, is highly dependent on the quality of available cost estimates, which is often poor, and on the sophistication of the analytic methods [30].
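The distinction between up-front cost and value can be shown with a simple ratio. The sketch below is a minimal illustration: the function names, the two programs, their costs and health gains, and the utility weight of 0.8 are all hypothetical, and a real economic evaluation would also deal with discounting, incremental comparisons, and uncertainty in the cost estimates.

```python
def cost_per_unit_benefit(total_cost, benefit_units):
    """Cost-effectiveness ratio: expenditure per unit of health gain."""
    if benefit_units <= 0:
        raise ValueError("benefit must be positive to form a ratio")
    return total_cost / benefit_units


def quality_adjusted_life_years(life_years, utility_weight):
    """Adjust life-years by a utility weight between 0 (death) and 1 (full health)."""
    if not 0 <= utility_weight <= 1:
        raise ValueError("utility weight must lie between 0 and 1")
    return life_years * utility_weight


# Hypothetical programs (not real data): one cheap, one costly.
cheap_program = {"cost": 100_000, "life_years_saved": 2}
costly_program = {"cost": 1_000_000, "life_years_saved": 100}

# Cost-effectiveness analysis: benefit measured in years of life saved.
cheap_ratio = cost_per_unit_benefit(cheap_program["cost"],
                                    cheap_program["life_years_saved"])
costly_ratio = cost_per_unit_benefit(costly_program["cost"],
                                     costly_program["life_years_saved"])
print("cheap program, cost per life-year:", cheap_ratio)
print("costly program, cost per life-year:", costly_ratio)

# Cost-utility analysis: the same gain weighted by a hypothetical utility of 0.8.
qalys = quality_adjusted_life_years(costly_program["life_years_saved"], 0.8)
print("costly program, cost per QALY:", cost_per_unit_benefit(costly_program["cost"], qalys))
```

In this hypothetical comparison the program with the larger budget delivers each life-year at a lower cost, which is the sense in which a highly costly service can still be an excellent value.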
| SYSTEMS FOR RATING EVIDENCE AND GRADING RECOMMENDATIONS |
|---|
Rating Evidence
Systems for rating the strength of evidence, such as that established by the U.S. Preventive Services Task Force (Table 1), reflect the hierarchy of evidence. Some schemes for rating evidence have devoted greater attention to incorporating issues of study quality and to assigning a place in the hierarchy for systematic reviews [32]. A good example is the approach taken by the Oxford Centre for Evidence Based Medicine (Table 2).
| REFERENCES |
|---|