When Fancy New Teacher-Evaluation Systems Don't Make a Difference

In a not-so-shocking turn of events, it turns out that all the energy devoted to building high-profile teacher evaluation systems has had remarkably little impact on teacher evaluation results. This is a problem. After all, the idea that we should do a better job of evaluating teachers is a no-brainer. There’s universal agreement that we need to help struggling teachers improve and find better ways to recognize terrific teachers. So things should’ve gone swimmingly, right?

Not so fast. In a new paper, Matt Kraft and Allison Gilmour look at teacher evaluation results in 19 states that have adopted new evaluation systems since 2009. These new systems were occasioned because reformers and policymakers were horrified by the fact that more than 99% of teachers were routinely deemed effective, even in struggling systems. The push for new teacher evaluation systems was central to Obama administration efforts, along with the Race to the Top and then ESEA waivers. As U.S. Secretary of Education Arne Duncan put it in 2013, “For too long, in too many places, schools systems have hurt students by treating every teacher the same—failing to identify those who need support and those whose work deserves particular recognition.”

Unfortunately, all that time, money, and passion haven’t delivered much. Kraft and Gilmour note that, after all is said and done, the share of teachers identified as effective in those 19 states inched down from more than 99% to a little over 97% in 2015. (This was the case even though teachers themselves, when surveyed by Education Next, suggest that about five percent of their district peers deserve an “F”—and another eight percent deserve a “D.”)

What’s going on? Kraft and Gilmour note that principals acknowledge inflating grades to avoid documentation headaches, discouraging teachers, or being mean. But let’s focus on the forest rather than the trees. As was observed in 2013, efforts to reform teacher evaluation have exhibited a worrisome faith in prescriptive policy:

These efforts have paid short shrift to the simple and frustrating fact that, while public policy can make people do things, it cannot make people do those things well . . . First, state and federal policymakers do not run schools; they merely write laws and regulations telling school districts what principals and teachers ought to do. And second, schooling is a complex, highly personal endeavor, which means that what happens at the individual level—the level of the teacher and the student—is the most crucial factor in separating failure from success. In education, there is often a vast distance between policy and practice. But reformers have greeted with a surprising lack of interest the seemingly self-evident fact that the fruits of policy innovation depend as much on how policies are carried out as on whether they’re carried out. Advocates, foundation officials, and education-policy experts show less interest in implementing the reforms they have enacted than in tackling the next big project.

In talking specifically about the vaunted teacher evaluation reforms, that piece observed:

After all of the effort and political capital expended to enact [Florida’s teacher evaluation] program, tens of thousands of hours spent observing and documenting teachers, and tens of millions of dollars spent developing the requisite tests . . . 97% of teachers were rated effective or better. In Tennessee, a Race to the Top grant winner and another state regarded as an exemplar of teacher-evaluation reform, 98% of teachers were rated at or above expectations. In Michigan, the figure was again 98%. Obviously, no one thinks these results reflect a true measure of teacher quality. Rather, they mean that the enormous effort and expense invested in these teacher-evaluation reforms have thus far achieved next to nothing. The reason is a straightforward failure of follow through. Legislators can change evaluation policies but cannot force principals to apply them rigorously. And it turns out that, even after policies were changed, principals still were not sure what poor teaching looked like, still did not want to upset their staffs, and still did not think giving a negative evaluation was worth the ensuing tension and hassle—especially given contractual complications and doubts that superintendents would back up personnel actions against low-rated teachers.

Emboldened by a remarkable confidence in noble intentions and technocratic expertise, advocates have tended to act as if these policies would be self-fulfilling. They can protest this characterization all they want, but one reason we’ve heard so much about pre-K in the past few years is that, as far as many reformers were concerned, the big and interesting fights on teacher evaluation had already been won. They had moved on.

There’s a telling irony here. Back in the 1990s, there was a sense that reforms failed when advocates got bogged down in efforts to change “professional practice” while ignoring the role of policy. Reformers learned the lesson, but they may have learned it too well. While past reformers tried to change educational culture without changing policy, today’s frequently seem intent on changing policy without changing culture. The resulting policies are overmatched by the incentives embedded in professional and political culture, and by the fact that most school leaders and district officials are neither inclined nor equipped to translate these policy dictates into practice.

And it’s not like policymakers have helped with any of this by reducing the paper burden associated with harsh evaluations or giving principals tools for dealing with now-embittered teachers. If anything, these evaluation systems have ramped up the paperwork and procedural burdens on school leaders, ultimately encouraging them to go through the motions and undercutting the whole point of these systems.

What could be done about all this? Any number of things, actually. There’s a need for training that actually helps school leaders dig into the whys and hows of contemporary personnel management—rather than fly-bys that help them complete the new forms. States and districts could work to cut the headaches, impediments, and organizational routines that discourage serious evaluation. In some locales where the preconditions are in place, teachers could be dealt into the work via peer assistance and review models. Of course, any of this requires more humility about how much and how fast all this will matter.

What’s happened isn’t surprising. It’s pretty much par for the course: hurried efforts to adopt a new enthusiasm wind up disappointing in practice, breeding cynicism and adding fuel to the ceaseless search for the next big thing. We’ll see if this time is any different.

– Rick Hess

This first appeared on Rick Hess Straight Up.
