Evidence-based Debates on Teacher Quality
The following is an excerpt from What Lies Ahead for America’s Children and Their Schools, a new book edited by Chester E. Finn, Jr., and Richard Sousa for Hoover Institution Press. This excerpt comes from a chapter called “Boosting Teacher Effectiveness” by Eric A. Hanushek.
Policy debates have changed swiftly to incorporate the research evidence on teachers.  It is difficult to enter into any school policy discussion that does not touch on the issue of teacher quality.
Moreover, the character of the discussions has become much more sophisticated and knowledgeable. The naïve calls for “highly qualified teachers” in the No Child Left Behind Act have been replaced by recognition that credentials and qualifications, the objects of past policies, are not closely related to teacher effectiveness in the classroom. While there has been no rush to eliminate salary differentials based on advanced degrees (about 10 percent of all teacher salary payments), they have become a greater part of the discussion.
Similarly, a teacher’s classroom experience after the first few years has been shown to have no effect on teacher performance. There has been little discussion of eliminating the longevity portion of teachers’ salaries, even though over one-quarter of the total wage bill goes to bonuses for teachers with greater than two years of experience (around the cutoff in the evidence about the returns to experience). But there has been intense discussion of LIFO (last in, first out) provisions in the laws and contracts that govern separations during force reductions. These policies are closely related to the evidence on effectiveness and experience. The use of LIFO rules instead of rules based on teacher effectiveness has been shown to increase the number of teachers who must be dismissed and to dramatically alter the quality of those dismissed.
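The arithmetic behind the LIFO comparison can be illustrated with a small sketch. It assumes that salaries rise with experience, so dismissing the most junior teachers saves less per dismissal. The teacher names, salaries, effectiveness scores, and the savings target are all hypothetical values chosen only for illustration, and the function `layoffs_needed` is not drawn from any of the cited studies.

```python
def layoffs_needed(teachers, target_savings, key):
    """Dismiss teachers in ascending order of `key` until the
    combined salary savings reach `target_savings`."""
    dismissed, saved = [], 0
    for teacher in sorted(teachers, key=key):
        if saved >= target_savings:
            break
        dismissed.append(teacher["name"])
        saved += teacher["salary"]
    return dismissed


# Hypothetical staff: salary rises with years of experience, and
# effectiveness is unrelated to experience after the first few years.
teachers = [
    {"name": "A", "years": 1, "salary": 40000, "effectiveness": 0.3},
    {"name": "B", "years": 2, "salary": 42000, "effectiveness": -0.1},
    {"name": "C", "years": 10, "salary": 60000, "effectiveness": -0.4},
    {"name": "D", "years": 20, "salary": 75000, "effectiveness": -0.5},
    {"name": "E", "years": 15, "salary": 68000, "effectiveness": 0.2},
]
target = 120000  # hypothetical budget cut

# LIFO: least experienced first.
lifo = layoffs_needed(teachers, target, key=lambda t: t["years"])
# Effectiveness-based: least effective first.
by_effectiveness = layoffs_needed(
    teachers, target, key=lambda t: t["effectiveness"]
)

print(len(lifo), lifo)
print(len(by_effectiveness), by_effectiveness)
```

In this toy case the LIFO rule dismisses three teachers, including the most effective one, while the effectiveness-based rule reaches the same savings with two dismissals.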
While considerable discussion exists on how we might want to change schools of education, little of this is directly related to the performance of students. Indeed, we have just rudimentary evidence on whether some schools of education do a better job than others. One suggestive finding is that average effectiveness differs little by teachers’ routes into their careers (certified vs. non-certified).
Similarly, many would like to use improved professional development to upgrade the teaching force, but many questions about the efficacy of this remain.  Further (scientific) research on the issues surrounding professional development could prove helpful in deciding the overall thrust of teacher improvement policies.
Importantly, with the recognition of the importance of teacher quality has come a new interest in how labor laws and teacher contracts affect student outcomes.  The turmoil in Wisconsin got the most attention as the state limited bargaining to just wages and benefits and removed larger issues such as class sizes and teacher assignment policies. Partly because of Wisconsin and partly on their own, a number of other states entered into active discussions of state restrictions on teaching.
A central part of much of the teacher quality discussion has been the use of value-added measures of quality. The value-added measures are designed to provide estimates of the independent effect of the teacher on the growth in a student’s learning and to separate this from other influences on achievement such as families, peers, and neighborhoods. The validity and reliability of these measures have been widely debated and are the subject of considerable current research. 
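As a rough illustration of the idea, the sketch below computes a simple gain-score version of value added: each student's growth is the current score minus the prior score, and a teacher's estimate is the average growth of that teacher's students relative to the overall average. This is a deliberately simplified assumption. The models debated in the research typically use regression to adjust for families, peers, and other influences, which this sketch does not attempt. The function name and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def gain_score_value_added(records):
    """records: iterable of (teacher, prior_score, current_score).

    Returns each teacher's mean student gain minus the overall mean gain.
    """
    gains_by_teacher = defaultdict(list)
    for teacher, prior, current in records:
        gains_by_teacher[teacher].append(current - prior)

    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall_mean_gain = mean(all_gains)

    return {
        teacher: mean(gains) - overall_mean_gain
        for teacher, gains in gains_by_teacher.items()
    }


# Hypothetical test scores for four students of two teachers.
records = [
    ("teacher_a", 50, 60),
    ("teacher_a", 70, 78),
    ("teacher_b", 55, 58),
    ("teacher_b", 80, 85),
]

estimates = gain_score_value_added(records)
print(estimates)
```

Using growth rather than the level of achievement is what keeps a teacher from being credited or blamed for where students started. With so few students per teacher, estimates like these would be very noisy, which is one reason the reliability of the measures is debated.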
The discussions range across a number of statistical and policy issues. But the discussion was accelerated when the Los Angeles Times and the New York Times (among others) published the names and value-added rankings of thousands of local teachers. The public attention to variations in teacher effectiveness led to an uproar—an uproar that helped focus the policy discussion and local bargaining.
Attention to test scores in the value-added estimation raises issues of the narrowness of the tests, of the limited numbers of teachers in tested subjects and grades, of the accuracy of linking teachers and students, and of the measurement errors in the achievement tests. Each is an important issue that has fueled continuing research efforts. This subsequent research is helping to define how best to use the statistical evidence on teacher quality.
The value-added discussions have also opened new consideration of alternative ways of valuing teachers. While teachers have always been evaluated in some manner, it is clear that until recently the evaluations provided little information, particularly for making any personnel decisions.  Thus, efforts have been made to develop and use observational protocols that more accurately indicate classroom effectiveness. 
A closely related discussion has revolved around the use of performance pay. Teachers are currently paid according to experience and to possession of an advanced degree, neither of which is closely related to classroom effectiveness. The argument has long been made that at least a portion of pay should reflect merit in order to provide incentives for teachers to do better. This idea led to a somewhat ill-conceived experiment by Vanderbilt researchers in which a randomly selected group of teachers received bonuses based on the performance of their students.  When compared to students of teachers not receiving any bonuses, students of those with the possibility of performance pay did no better. This study demonstrated that offering a bonus for better performance to existing teachers has very little influence on what they do. This is exactly what has been shown by the multiple studies of merit pay that focus on the impact of relatively small bonuses for current teachers on their performance in the classroom. The simplest interpretation is that almost all current teachers are indeed working to do the best that they can.
At the same time, this is not a demonstration that salaries have no effect. Both the level of salaries and the pattern of salaries across teachers affect who enters and who stays in teaching. Higher salaries and a greater relationship to performance would attract a different group of people into teaching. Indeed, the impact of salaries on selection into teaching is the key issue for those who think that performance pay is important.
Nonetheless, the availability of this “gold standard” study has allowed the unions and the schools to argue that performance pay has been tried and simply has not worked. This situation actually demonstrates a further issue in making evidence-based policy. It is often possible to find or to interpret evidence in order to support very different positions. This in fact has made moving to rational policy positions more difficult, particularly in areas of personnel policy where vested interests are especially important.
The possibility of evidence being hijacked for the use of special interest groups serves to reinforce the need for continued research and evaluation. Only superior and more reliable evidence can overcome the biased use of evidence.
The Prospects for Further Improvement
The world of education is moving steadily toward reliance on evidence, even with the possibility for misinterpretation. Moreover, the evidence on teacher quality issues is beginning to win the day.
The movement toward better overall policy is seen directly in state actions. For example, all states except California had unique student identifiers in 2011; thirty-five had unique teacher identifiers that allowed linking teachers to students.  Between 2009 and 2011, twenty-six states moved to include evidence of student learning in teacher evaluations, and ten states mandated that student learning would be the preponderant criterion in local evaluations.
In teacher tenure decisions, there has been considerable recent progress. More and more states are moving to require evidence of teacher effectiveness and to extend the minimum number of years for tenure. About a third of states also support differential pay in shortage subject areas and do not have regulatory language blocking differential pay. Similarly, about a third of states support differentially rewarding effective teachers.
While there is a considerable way to go in expanding and refining these changes, the pattern of state policies toward effective teachers has changed dramatically in recent years.
And there is a new sense of forward movement at the local level. Perhaps the best story comes from Washington, DC. This district, by far the worst in the nation, went through agonizing battles between Michelle Rhee (the former chancellor of Washington public schools) and the unions. But four years ago the unions agreed to a new contract that introduced both value-added and observational evaluations and that used them in personnel decisions. At this time about one thousand teachers have received substantial increases in their base salaries because of continued top performance. But close to five hundred teachers have been dismissed because of continued poor performance. The whole evaluation system is continually being developed and improved, but it has reached a level of acceptance that bodes well for the future.
Importantly, there is now direct evidence that the Washington, DC, personnel policies are paying off. Thomas Dee and James Wyckoff found that dismissal threats increased the voluntary attrition of low-performing teachers by more than 50 percent.  Additionally, low-performing teachers who stayed improved their performance significantly, as did high-performing teachers who were in the range to get bonuses.
Similarly, the Los Angeles Unified School District has moved to remove around one hundred poorly performing teachers. While this remains small compared to the total number of teachers in Los Angeles, it is orders of magnitude larger than what was seen just a couple of years ago.
Many states and localities are developing what must be thought of as experimental programs for ensuring teacher quality. The key to the future is validating and replicating the ones that prove successful and eliminating the ones that do not. Doing this requires a strong research and evaluation activity to match the policy experimentation.
Eric Hanushek is the Paul and Jean Hanna Senior Fellow at the Hoover Institution of Stanford University.
Notes:
1. For example, John E. Chubb, The Best Teachers in the World: Why We Don’t Have Them and How We Could (Stanford, CA: Hoover Institution Press, 2012).
2. For example, Donald Boyd, Hamilton Lankford, Susanna Loeb, and James Wyckoff, “Teacher Layoffs: An Empirical Illustration of Seniority versus Measures of Effectiveness,” Education Finance and Policy 6, no. 3 (Summer 2011): 439–454.
3. For example, Thomas J. Kane, Jonah E. Rockoff, and Douglas O. Staiger, “What Does Certification Tell Us About Teacher Effectiveness? Evidence from New York City,” Economics of Education Review 27, no. 6 (December 2008): 615–631; and Donald J. Boyd, Pamela L. Grossman, Hamilton Lankford, Susanna Loeb, and James Wyckoff, “Teacher Preparation and Student Achievement,” Educational Evaluation and Policy Analysis 31, no. 4 (December 2009): 416–440.
4. For example, Michael S. Garet, Stephanie Cronen, Marian Eaton, Anja Kurki, Meredith Ludwig, Wehmah Jones, Kazuaki Uekawa, Audrey Falk, Howard S. Bloom, Fred Doolittle, Pei Zhu, and Laura Sztejnberg, “The Impact of Two Professional Development Interventions on Early Reading Instruction and Achievement,” National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences (Washington, DC: US Department of Education, September 2009); and Michael S. Garet, Andrew J. Wayne, Fran Stancavage, James Taylor, Marian Eaton, Kirk Walters, Mengli Song, Seth Brown, Steven Hurlburt, Pei Zhu, Susan Sepanik, and Fred Doolittle, “Middle School Mathematics Professional Development Impact Study: Findings After the Second Year of Implementation,” NCEE 2011-4024 (Washington, DC: Institute of Education Sciences, April 2011).
5. Terry M. Moe, Special Interest: Teachers Unions and America’s Public Schools (Washington, DC: Brookings Institution Press, 2011).
6. Steven Glazerman, Susanna Loeb, Dan Goldhaber, Douglas Staiger, Stephen Raudenbush, and Grover Whitehurst, “Evaluating Teachers: The Important Role of Value-Added,” The Brookings Brown Center Task Group on Teacher Quality (Washington, DC: Brookings Institution Press, November 17, 2010).
7. For example, Daniel Weisberg, Susan Sexton, Jennifer Mulhern, and David Keeling, “The Widget Effect: Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness,” 2nd ed. (New York: The New Teacher Project, 2009).
8. Thomas J. Kane, Daniel F. McCaffrey, Trey Miller, and Douglas O. Staiger, “Have We Identified Effective Teachers? Validating Measures of Effective Teaching Using Random Assignment,” Measures of Effective Teaching project, Bill and Melinda Gates Foundation (January 2013).
9. Matthew G. Springer, Dale Ballou, Laura Hamilton, Vi-Nhuan Le, J.R. Lockwood, Daniel F. McCaffrey, Matthew Pepper, and Brian M. Stecher, “Teacher Pay for Performance: Experimental Evidence from the Project on Incentives in Teaching” (Nashville, TN: National Center on Performance Incentives, Vanderbilt University, 2010).
10. National Council on Teacher Quality, “State Teacher Policy Yearbook, 2011” (Washington, DC: National Council on Teacher Quality, 2012).
11. Thomas Dee and James Wyckoff, “Incentives, Selection, and Teacher Performance: Evidence from IMPACT” (NBER Working Paper WP19529, Cambridge, MA: National Bureau of Economic Research, October 2013).
Reprinted from What Lies Ahead for America’s Children and Their Schools, edited by Chester E. Finn Jr. and Richard Sousa, with the permission of the publisher, Hoover Institution Press. Copyright © 2014 by the Board of Trustees of the Leland Stanford Junior University.