What Works vs. What We Can Evaluate

By 06/19/2018


Policy researchers spend a lot of time lamenting how rarely the research we produce is applied in practice. The statutory push in ESSA (Every Student Succeeds Act) for schools to use evidence may seem an elegant solution to this dilemma, but it also poses a real risk: school leaders may feel pressure to choose approaches that have been easier to evaluate, rather than those that are most central to improving educational practice.

ESSA’s evidence requirements and the demand for approved lists

ESSA lays out four tiers of evidence for “activities, strategies, or interventions.” [1] The first three tiers are defined by study methodology: (1) strong, with an experimental design, (2) moderate, with a quasi-experimental design, and (3) promising, with a correlational design and use of control variables. The so-called “fourth tier” of evidence does not require a specific study supporting the practice, but rather that it “demonstrates a rationale” based on research or evaluation and that it will be subject to “ongoing efforts to examine the effects.”

School improvement funds authorized under ESSA section 1003 may be used only for practices supported by the first three tiers of evidence, whereas the vast majority of ESSA funds either may be used on activities that meet ESSA’s fourth tier of evidence or are not subject to any evidence standard at all. [2] Nonetheless, local administrators looking for evidence-based solutions may be drawn, or urged by their states, to choose from lists of “proven” interventions that qualify only under the first three tiers of evidence set forth in the law.

The Institute of Education Sciences’ What Works Clearinghouse (WWC) is, among other things, the grandfather of this sort of list. It evaluates and, importantly, synthesizes findings from evaluations. Users can search for results on a particular intervention (whether it has been evaluated rigorously and, if so, how well it worked), a specific study (again, rigor and findings), or an area of practice (for summaries of research-supported best practice).

More recently, a team led by Robert Slavin has established, with support from the Annie E. Casey Foundation, the Evidence for ESSA website. Slavin views it as a complement to the WWC, giving users access to a curated collection of interventions that (largely) correspond to what ESSA permits under its first three tiers of evidence. [3] Following ESSA’s language, Evidence for ESSA rates an intervention on the basis of at least one study establishing its effectiveness, no matter how many other studies show null effects.

Individual state education agencies may also choose to make their own lists. Long before evidence became a policy buzzword, some states provided districts with lists of permissible textbooks to choose from, absent any formal evidence base. In this light, informing such lists with evidence seems a step forward.

Yet the way these lists are assembled, from evaluations that meet specific evidence criteria, leads to what may be their greatest limitation: programs that are easy to evaluate will be overrepresented, while more effective approaches that are difficult to evaluate will be scarce.

What makes a program easier to evaluate

Consider what it takes to implement the methodological gold standard, a randomized controlled trial (RCT). Researchers need a sufficient sample at the unit of random assignment (whether the school, classroom, or student) to achieve adequate statistical power, and the cost of the evaluation often includes the cost of the intervention itself.

There are two basic ways to keep costs down: lower the cost per unit treated (e.g., “lighter touch” interventions) or treat fewer units (e.g., only “struggling” students receive the intervention). Interventions that are less disruptive to existing core instructional practice may also be less controversial, decreasing the political cost of participating in the trial.

In other words, a sufficiently powered evaluation of a substantial change to core instruction will be expensive. Consequently, there are many interventions districts might consider for which there is no “top tier” evidence—in some cases, not because they have been shown to be ineffective, but because they simply haven’t been evaluated with that degree of methodological rigor.
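To make the sample-size point concrete, here is a rough back-of-the-envelope sketch. It uses the common normal-approximation formula for comparing two group means, plus the standard design-effect adjustment for randomizing whole clusters, 1 + (m - 1) * rho, where m is the number of students per classroom and rho is the intraclass correlation. The effect size (0.2 standard deviations), class size (25), and intraclass correlation (0.2) are illustrative assumptions only, not figures drawn from any study or list discussed here.

```python
import math
from statistics import NormalDist


def students_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Students needed per arm when individual students are randomized.

    Normal-approximation formula for a two-sided test comparing two means,
    with the effect expressed in standard deviation units.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)  # quantile matching the desired power
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2


def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation when whole clusters (e.g., classrooms) are randomized."""
    return 1 + (cluster_size - 1) * icc


# Illustrative assumptions only: a 0.2 SD effect, 25 students per classroom,
# and an intraclass correlation of 0.2.
EFFECT, CLASS_SIZE, ICC = 0.2, 25, 0.2

n_individual = students_per_arm(EFFECT)
n_clustered = n_individual * design_effect(CLASS_SIZE, ICC)

print("Randomize students:  ", math.ceil(n_individual), "students per arm")
print("Randomize classrooms:", math.ceil(n_clustered / CLASS_SIZE), "classrooms per arm,",
      math.ceil(n_clustered), "students per arm")
```

Under these assumptions, the script reports about 393 students per arm when students are randomized, versus 92 classrooms (about 2,277 students) per arm when classrooms are randomized: a nearly six-fold increase driven entirely by the unit of assignment. A real power analysis would typically also account for covariate adjustment, attrition, and additional levels of clustering such as schools.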

The downsides to lists

It would be impossible for any clearinghouse to provide a comprehensive listing of strategies or products that would meet ESSA’s definition for the “fourth tier” of evidence. However, there is a real danger that state and local policymakers and practitioners will treat these lists as comprehensive and prohibit spending on anything that doesn’t appear on them, preventing districts from choosing the best approaches for their local context.

The U.S. Department of Education’s guidance attempts to reduce the likelihood of such poor choices, emphasizing the need for a nuanced view and use of evidence. [4] Advocates who wish to promote the use of evidence have drawn attention to these finer points as well. [5] Nonetheless, districts, especially those lacking the local or state research capacity to examine ongoing efforts under ESSA’s fourth tier of evidence, may find the lists reassuring for compliance purposes.

On a case-by-case basis, the risk is that reliance on lists could result in ill-informed and suboptimal program selection, which could have significant negative consequences for students. But there may be less obvious, and potentially more disturbing and far-reaching, impacts as well.

First, if districts see these lists contradicting their own experience, either by recommending products that haven’t worked for them or by excluding approaches they value, they may grow to distrust the entire concept of research-informed practice. This is a particularly bleak outcome to contemplate just as researchers are working to help practitioners generate their own evidence.

Another danger of “list-driven” instructional practice is that it could lead districts to focus more on interventions targeted at students who are behind, rather than those altering the core approach adopted for all students. If it is easier to evaluate programs that support a subset of struggling students after core instruction has failed them, districts may invest in band-aids rather than address the root problems in their core programs.

I reviewed the elementary reading programs rated “strong” (corresponding to ESSA’s first level of evidence) and “promising” (ESSA’s third level) in Evidence for ESSA, excluding those interventions that did not explicitly change reading instructional practice but used reading achievement as an outcome measure. Among the programs with the strong evidence rating, about half were delivered to the entire class, and half only to students already falling behind. In contrast, among the programs with the “promising” evidence rating, all were used with all students in a class.

For elementary math, the pattern was similar. Of the 11 programs Evidence for ESSA classified as having “strong” evidence, six were designed for struggling students and five for the whole class. Of those five whole-class interventions, one was a socioemotional program with no math content. Meanwhile, the website lists seven “promising” programs for elementary math, all of which are designed for the whole class.

This doesn’t mean we should ignore the quality of evaluations, but rather that we should continue to generate new evidence.

How to embrace the nuance of evidence

Rigorous reviews of academic evidence can serve a valuable role in advancing practice. States and districts should take advantage of the growing body of resources to help interpret research, while keeping these tips in mind:

• Don’t skimp on the needs assessment and root cause analysis. [6] If it points to problems with core instruction, address them.

• Turn to the WWC practice guides for research-informed practice recommendations that don’t require you to buy stuff from vendors.

• Review the full body of research for a given intervention, not just the study(ies) that meet ESSA’s specific bar for evidence.

• Partner with entities that know how to evaluate education approaches, such as academic institutions or high-capacity research shops in state education agencies. Such partnerships help local districts generate evidence on approaches tailored to their specific needs and take advantage of what Conaway describes as ESSA’s hidden gem: its fourth tier of evidence. [7] And even when interventions meet one of ESSA’s top three tiers of evidence, it is still useful to study their efficacy in a local context to inform ongoing practice.

— Nora Gordon

Nora Gordon is Associate Professor of Public Policy at Georgetown University.

This post originally appeared as part of Evidence Speaks, a weekly series of reports and notes by a standing panel of researchers under the editorship of Russ Whitehurst.

The author(s) were not paid by any entity outside of Brookings to write this particular article and did not receive financial support from or serve in a leadership position with any entity whose political or financial interests could be affected by this article.


1. See ESSA, Section 8101(21)(A) for the law’s general definition of “evidence-based”.

2. See ESSA, Section 8101(21)(B).

3. https://www.huffingtonpost.com/entry/evidence-for-essa-and-the-what-works-clearinghouse_us_589c7643e4b02bbb1816c369

4. https://www2.ed.gov/policy/elsec/leg/essa/guidanceuseseinvestment.pdf

5. http://results4america.org/wp-content/uploads/2017/05/R4A_LP_REV_May-2017.pdf

6. https://www2.ed.gov/policy/elsec/leg/essa/guidanceuseseinvestment.pdf

7. http://www.kappanonline.org/conaway-tier-4-evidence-essas-hidden-gem/
