Are Test Scores Good Proxies for School Quality?

In a series of blog posts (I, II, III, IV), Jay Greene argues against the “high-regulation approach” to school choice. I’m going to focus on the final two posts, in which Greene argues that student achievement tests are poor proxies for school quality and that they’re not correlated with other measures of quality.

I think Greene is right to a large extent. But he undersells the value of tests.

It’s pretty clear that the ability of a school or teacher to increase students’ standardized test scores is associated with long-run outcomes. Let’s dig into some evidence:

• The well-known Chetty study used a rigorous quasi-experiment to show that teachers with high value-added scores (which are based on standardized tests) produced higher income, greater college attendance, and lower teen pregnancy among students. (In the comments of his post, Greene acknowledges this study but describes the effects as small. I disagree, considering we are describing the effects of a single teacher at a single grade level.)

• A different Chetty study reports that “students who were randomly assigned to higher-quality classrooms in grades K–3—as measured by classmates’ end-of-class test scores—have higher earnings, college attendance rates, and other outcomes.”

• Hanushek finds that international academic achievement (as measured by PISA scores) was predictive of economic growth, while academic attainment was not. Specifically: “The level of cognitive skills of a nation’s students has a large effect on its subsequent economic growth rate. Increasing the average number of years of schooling attained by the labor force boosts the economy only when increased levels of school attainment also boost cognitive skills.” This appears to contradict Greene’s claim that “attainment is a more meaningful indicator of long-term benefits than achievement test results.”

• Dynarski shows that class size reduction in the Tennessee Project STAR experiment led to short-term achievement gains and long-run attainment gains. Notably, the study finds that “the short-term effect of a small class on test scores is an excellent predictor of its effect on adult educational attainment. In fact, the effect of small classes on college attendance is completely captured by their positive effect on contemporaneous test scores.”

• Dobbie and Fryer find that attending the Harlem Children’s Zone leads to increased achievement on both high-stakes and low-stakes exams, as well as lower (self-reported) rates of teen pregnancy among girls and incarceration among boys. Greene mentions this study, pointing to the non-significant attainment findings, but does not mention the pregnancy and incarceration effects.

• Deming examines Texas’s test-based accountability system and shows that for students at low-performing schools, it led to increased achievement, college attendance, degree attainment, and earnings. Another interesting finding from Deming: “Low-scoring students in schools that were close to achieving a [higher school accountability] rating were actually more likely to graduate from high school despite experiencing significant declines in earnings.” This suggests that, in some cases, attainment may not be a good proxy for school performance.

The Deming finding is a particularly important result because, in contrast to most previous studies, the achievement was measured on a high-stakes exam. It’s clear that we know a lot more about low-stakes assessments than high-stakes ones, though. The evidence suggests that attaching stakes to tests can produce some unintended consequences, but there is reason to believe that it can also lead to gains on low-stakes exams. We could do with a lot more evidence on the relationship between high-stakes tests and long-run outcomes.

Let’s now turn to a second point that Greene makes. “The evidence is increasingly clear,” he writes, “that test scores are only weakly correlated with all of these other desirable outcomes from schools.”

I agree. He points to some examples in school choice studies. The same pattern appears in the teacher quality literature. Jackson shows that teachers’ value-added scores are only weakly correlated with their impacts on non-cognitive outcomes (absences, suspensions, and grades). A study from Jennings and DiPrete produces similar results. Gershenson finds no correlation between teachers’ value-added scores and their impact on student attendance. And to turn back to school choice for a moment, Imberman finds that charters in an unnamed urban district had no effect on student test scores but had large positive effects on discipline and attendance.

I would finally note that achievement tests are not uniform in quality, which may help explain some of the disparate results. As the new generation of supposedly more rigorous Common Core–aligned tests takes hold, it will be important to do additional research on these questions.

To sum up: 1) low-stakes tests appear to measure something meaningful that shows up in long-run outcomes; 2) we don’t know nearly as much about high-stakes exams and long-run outcomes; and 3) there doesn’t seem to be a strong correlation between test-score gain and other measures of quality at either the teacher or school level.

The implications of all this for school choice regulation, however, are a question for another day.

– Matt Barnum

Matt Barnum is the policy and research editor at the Seventy Four. This blog entry first appeared on Flypaper.
