Test Score Gains Predict Long-Term Outcomes, So We Shouldn’t Be Too Shy About Using Them
Editor’s note: This post is the sixth and final entry in an ongoing discussion between Fordham’s Michael Petrilli and the University of Arkansas’s Jay Greene that seeks to answer this question: Are math and reading test results strong enough indicators of school quality that regulators can rely on them to determine which schools should be closed and which should be expanded—even if parental demand is inconsistent with test results? Prior entries can be found here, here, here, here and here.
Shoot, Jay, maybe I should have quit while we were ahead—or at least while we were closer to rapprochement.
Let me admit to being perplexed by your latest post, which has an Alice in Wonderland aspect to it—a suggestion that down is up and up is down. “Short-term changes in test scores are not very good predictors of success,” you write. But that’s not at all what the research I’ve pointed to shows.
Start with the David Deming study of Texas’s 1990s-era accountability system. Low-performing Lone Star State schools faced low ratings and responded by doing something to boost the achievement of their low-performing students. That yielded short-term test-score gains, which were related to positive long-term outcomes. This is the sort of thing we’d like to see much more of in education, wouldn’t we?
Yet you focus on the negative finding: that higher-performing Texas schools were more likely to keep their low-performing kids from taking the test, and those kids did worse over the long term. Supposing that’s so, it merely indicates a flaw in Texas’s school accountability system, which should have required schools to test virtually all of their students (as No Child Left Behind later did). This group of low-performers most likely did worse because their schools failed even to try to raise achievement. If they had, those kids’ long-term outcomes would have likely been better too.
As for your points on test score “fade-out,” you are right that we see this phenomenon both in the pre-K studies you mentioned and in Project STAR. Why it happens is an interesting question for which nobody has a great answer, as far as I know, other than the obvious point that the schools and classrooms those kids enter into don’t know how (or don’t try very hard) to sustain earlier gains. But it doesn’t really matter. For our purposes, what it shows is that short-term test score gains don’t lead to long-term test score gains, but they do lead to long-term success. Which is the Holy Grail!
Let’s take it out of the abstract. Let’s say we want to evaluate preschools on whether their students make progress on cognitive assessments, or judge elementary schools based on student-level gains during grades K–3. The evidence indicates that preschools or elementary schools that knock it out of the park in terms of test score gains will see those impacts fade over time, as gauged by test scores. But the kids enrolled in those preschools and elementary schools will benefit in the long term. Whatever the schools are doing to raise short-term test scores is also helping lead to later success; we can measure the scores, but we can’t measure the other stuff. But remind me again: Why wouldn’t we want to use short-term test scores as one gauge of school or program quality?
You end much as you begin:
Rather than relying on test results anyway and making potentially disastrous decisions to close schools or shutter programs on bad information, we should recognize that local actors—including parents—are in a better position to judge school quality. Their preferences deserve strong deference from more distant authorities.
And as I’ve written previously, we’re of one mind in being “anti-bad-information.” We should absolutely stop using performance levels alone (i.e., proficiency rates) to judge school quality. We should be concerned about accountability systems or authorizing practices that might encourage counterproductive practices—like excluding kids from testing or focusing narrowly on reading and math skills instead of a broad curriculum. And we also agree that parents deserve much deference.
But I don’t agree that short-term achievement gains should be put in the “bad information” bucket. And I think you’re being a tad naïve about the quality of “information” that parents themselves have about their schools, which is often extremely limited or hard to interpret. Most parents (myself included) have only a hazy picture of “school quality” and how to know whether it’s present at our own kids’ schools. You know if your child is happy, if the teacher is welcoming, and if the place is safe. It’s a lot harder to know how much learning is taking place, especially in an age when grade inflation is rampant. (Why else would 90 percent of the nation’s parents think that their own children were on grade level?) The government has a role to play in making sure that all school choices meet a basic threshold for quality, just as it has a role in making sure that all of our choices at the grocery store are safe.
So I return to my proposition: Let’s not make high-stakes decisions about schools or programs based on test scores alone. But let’s not ignore those scores, either, or trivialize their influence to such an extent that we allow chronically low-performing schools to persist, zombie-like, in perpetuity.
Charter school authorizers and other quality monitors should react swiftly when schools post mediocre or worse value-added scores. They should give those schools—and their parents—the chance to demonstrate their quality through other means. They should do what they can to turn failure into success, hard though that is. But for the good of the kids, the public, and the sector, they shouldn’t hesitate to shutter schools that aren’t helping children progress.
And with that, let me thank Jay for a great debate. We may not agree on what test scores can tell us, but I’m heartened that we concur that there are times when officials must act to address low performance. Parental choice is necessary, but not sufficient. Q.E.D.
– Mike Petrilli
This first appeared on Flypaper.