What Have We Learned from the Gates-Funded Teacher Evaluation Reforms?

Nine years, $575 million, and 500-plus pages later, what have we learned about the Gates Foundation’s ambitious efforts to improve teacher effectiveness through evaluation and human capital reforms? The headlines about the RAND Corporation’s recently released final report have focused on the lack of any consistent effects on student outcomes, but the real story here is the many insights about implementation (what actually happened on the ground) based on rich qualitative and survey data. Here are some of my key takeaways from the report.

The study evaluated the Gates Foundation’s Intensive Partnership initiative, which ran from 2009 to 2015 with three school districts and four charter management organizations (CMOs) and provided $575 million in total funding ($800 to $3,500 per pupil). In exchange, participating districts/CMOs committed to implementing major reforms to their teacher recruitment, screening, evaluation, and compensation systems.

In many ways this initiative should be viewed as a proof of concept. The participating districts/CMOs were specifically selected because of their strong commitment to the reforms, and they had unprecedented financial support. Their efforts provide a rare window into whether evaluation and human capital reforms work under very favorable circumstances—a truer test of the reforms themselves.

Despite the strong initial buy-in and generous funding, capacity constraints and the tension between using evaluation for both formative and summative purposes proved to be major implementation challenges. The evaluations were a huge burden on principals, and the new system resulted in few teachers rated below Proficient. The consistent upward shift in ratings over time was surprising to me. I originally thought the opposite would happen when the first new evaluation ratings came out a few years ago. I assumed there would be an implementation learning curve, but I underappreciated the interpersonal elements of the evaluation process. Principals consistently reported their preferences for helping teachers improve as opposed to dismissing them. This included recognizing and rewarding effort/growth in evaluations. However, rewarding marginal growth with an additional point on a four- or five-category scale quickly led to inflated ratings. It’s unsurprising, then, that teachers were rarely rated below Proficient in the new system.

The report also found that districts/CMOs struggled to use the evaluations as an engine for professional development. Attempts to differentiate professional development based on individual performance by offering teachers access to online clearinghouses of PD resources were seen as ineffective by teachers. The observation and feedback process was limited due to capacity and funding constraints. Frequent individualized feedback is time-intensive, and time is a major driver of personnel costs, which constrained the use of feedback as a development tool. Yes, believe it or not, it appears that system-wide improvement via observation and feedback requires even more investment (or at least better use of existing resources).

Twenty years from now, one positive legacy of these reform initiatives will likely be the widespread use of rigorous classroom observation rubrics. This was one of the few elements that was viewed favorably by both teachers and school leaders well into the reform. These next-generation rubrics provide a clear benchmark and common language about what high-quality instruction looks like.

Another key goal was to attract quality teachers and bolster retention through bonuses and career ladder opportunities connected to evaluations. While the logic is clear, the limited implementation of this lever was a real missed opportunity. Teachers saw the career ladder opportunities as more akin to “add-ons” than to distinguished job promotions. They also repeatedly expressed that the modest bonuses, on the order of a few thousand dollars, weren’t enough to motivate a change in practice.

Perhaps the greatest lost opportunity of the initiative was failing to provide sufficient incentives to teachers given the available funding. The underpowered efforts failed to change teacher practices or to attract higher-quality novice teachers. One school leader is quoted as writing, “Our teaching staff will only be as good as what we are allowed to select from. We have to strengthen our pool of eligible candidates . . .” Supplemental bonuses and secondary add-on roles are unlikely to be a very visible or effective way to recruit new talent into teaching. A complete redesign of teacher compensation with higher top-end base salaries would have provided much greater insights about the potential of these reforms for attracting and retaining quality talent. Concerns about the sustainability of such a reform likely played a role here, but there are still many ways to envision compensation reforms that are cost-neutral.

It is no wonder, then, that RAND found no evidence that the districts/CMOs were able to attract higher-quality novice teachers. Evaluation reforms introduce new risks into the teaching profession and, in some cases, can make teachers feel as though they have more limited professional autonomy. Meaningful career growth opportunities and higher potential earnings might have offset these potential negative consequences.

The report also highlighted a rarely discussed headwind in the evaluation reform era: the challenge posed by the substantial turnover in district/CMO leadership. By contrast, the success of evaluation and human capital reforms in D.C. cannot be understood outside the context of the consistent leadership they have benefitted from over the last decade.

On a more positive note, two key takeaways regarding placement and school turnover came out of the report. The initiative resulted in a meaningful reduction of both forced teacher placements and seniority transfers. This means that tenured teachers unable to secure a position were less likely to be foisted onto schools and that less-experienced teachers were “bumped” from their positions by more senior teachers less often.

In terms of the impact analyses, the moderate-to-large positive effects reform advocates had envisioned can be ruled out. It is difficult to say much more with any precision given several important limitations, which the authors recognize. Three issues stand out as particularly concerning:

• An effective sample size of only seven districts/CMOs
• A short pre-treatment period which limits the ability to project counterfactual trends in the participating districts/CMOs
• The confounding effect of high-stakes teacher evaluation reforms occurring across the nation concurrently with the initiative

It is likely that the results of the Gates Foundation’s Intensive Partnership initiative will be seen simply as the final nail in the coffin of the teacher evaluation reform era, even though they offer far more nuanced insights. Despite underwhelming impact findings and real implementation concerns, the initiative produced an incredibly rich set of information about the organizational and cultural barriers to enacting major human capital reforms in public schools. I hope this study instead informs how districts and CMOs invest their future resources and build new systems to attract, develop, and retain effective teachers in the modern ESSA era.

— Matthew Kraft

Alex Bolves and Alvin Christian provided excellent assistance in writing this post. Matthew Kraft is an assistant professor of education and economics at Brown University. You can read about his research at www.matthewakraft.com and follow him on Twitter @MatthewAKraft.
