Ranking teacher-prep programs on value-added is usually futile
New analysis finds program rankings based on graduates’ value-added scores are largely random
May 8, 2018—Last year Congress repealed a federal rule that would have required states to rank teacher-preparation programs according to their graduates’ impact on student test scores. Yet twenty-one states and D.C. still choose to rank programs in this way. Can student test performance reliably identify more and less effective teacher-preparation programs? In a new article for Education Next, Paul T. von Hippel of the Lyndon B. Johnson School of Public Affairs at the University of Texas at Austin and Laura Bellows of Duke University find that the answer is usually no.
• Differences between programs too small to matter. Von Hippel and Bellows find that the differences between teachers from different preparation programs are typically too small to matter. Having a teacher from a good program rather than an average program will, on average, raise a student’s test scores by 1 percentile point or less.
• Program rankings largely random. The errors that states make in estimating differences between programs are often larger than the differences they are trying to estimate. Program rankings are so noisy and error-prone that in many cases states might as well rank programs at random.
• High chance of false positives. Even when a program appears to stand out from the pack, in most cases it will be a “false positive”—an ordinary program whose ranking is much higher (or lower) than it deserves. Some states do have one or two programs that are truly extraordinary, but published rankings do a poor job of distinguishing these “true positives” from the false ones.
• Consistent results across six locations. Using statistical best practices, von Hippel and Bellows found consistent results across six different locations—Texas, Florida, Louisiana, Missouri, Washington State, and New York City. In every location the true differences between most programs were minuscule, and program rankings consisted mostly of noise. This was true even in states where previous evaluations had suggested larger differences.
When measured in terms of teacher value-added, “the differences between [teacher-preparation] programs are typically too small to matter. And they’re practically impossible to estimate with any reliability,” say von Hippel and Bellows. They consider other ways to monitor program quality and conclude that most are not ready for prime time. But they do endorse reporting the share of a program’s graduates who become teachers and persist in the profession—especially in high-need subjects and high-need schools.
To receive an embargoed copy of “Rating Teacher-Preparation Programs: Can value-added make useful distinctions?” or to speak with the authors, please contact Jackie Kerstetter at firstname.lastname@example.org. The article will be available Tuesday, May 8 on www.educationnext.org and will appear in the Summer 2018 issue of Education Next, available in print on May 24, 2018.
About the Authors: Paul T. von Hippel is an associate professor at the University of Texas at Austin and Laura Bellows is a doctoral student in public policy at Duke University.
About Education Next: Education Next is a scholarly journal committed to careful examination of evidence relating to school reform, published by the Education Next Institute and the Harvard Program on Education Policy and Governance at the Harvard Kennedy School. For more information, please visit www.educationnext.org.