(This post also appears on Rick Hess Straight Up.)
Last week, Columbia University sociologist Aaron Pallas savaged the DC Public Schools IMPACT teacher evaluation system in the Washington Post's "The Answer Sheet" blog, attacking it as "idiotic" and based on "preposterous" assumptions. Pallas asked, "Did DCPS completely botch the calculation of value-added scores for teachers, and then use these erroneous scores to justify firing 26 teachers and lay the groundwork for firing hundreds more next year?" He asserted, "According to the only published account of how these scores were calculated, the answer, shockingly, is yes."
At the same time, however, Pallas was forced to concede, “Value-added scores may have been misused in the termination of 26 teachers… [but] I cannot be sure that this is what happened.” Indeed, Pallas acknowledged that he never even tried to ask DCPS how the system works and based his attack on a bizarre distortion of what DCPS does. As one academic observed, “If you read Pallas’s critique, he says, ‘I don’t really know what DCPS did but, if they did this, it would be bad.’ He’d be right, it would be bad. But they did nothing like what he said.”
Pallas’s WaPo piece extended the confused analysis that he first offered in his “Sociological Eye on Education” blog for the Hechinger Institute’s Hechinger Report. (Full disclosure: I’m on the advisory board for the Hechinger Report and contribute occasionally to “Answer Sheet”–although these are rather ironic affiliations at the moment).
The background: A week ago Friday, DCPS fired 165 teachers for performance reasons. Another 700-odd teachers were deemed "minimally effective" and given probation. These teachers were evaluated using DC's pioneering IMPACT system. IMPACT is the fruit of a longtime collaboration between DCPS and the ubergeeks at Mathematica Policy Research. Of the 165 teachers dismissed for "ineffective" performance, value-added student achievement data factored into the evaluations of 26 (the other 139 teachers taught in subjects and grades for which value-added data weren't available). Pallas's attack on DCPS focuses on these 26. For those 26, fifty percent of their evaluation was based on the value-added metrics, forty percent on five structured classroom observations (three by their principal and two by floating "master educators"), five percent on contribution to the school community, and five percent on school-level value-added. Folks in DCPS tell me that faring abysmally on the value-added alone wouldn't suffice to get anyone fired; teachers also had to scrape bottom on the classroom observations.
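For readers who want to see how those published weights combine, here is a bare-bones sketch of the composite as a weighted sum. The component names, the 0-100 scoring scale, and the example numbers are my assumptions for illustration; this is not DCPS's actual scoring code.

```python
# Hypothetical sketch of the IMPACT composite for the 26 teachers with
# value-added data, using the weights reported above. Component names and
# the 0-100 scale are assumptions for illustration only.

WEIGHTS = {
    "individual_value_added": 0.50,  # value-added student achievement
    "classroom_observations": 0.40,  # five structured observations
    "school_community": 0.05,        # contribution to school community
    "school_value_added": 0.05,      # school-level value-added
}

def composite_score(components: dict) -> float:
    """Weighted sum of component scores (each assumed to run 0-100)."""
    return sum(WEIGHTS[name] * score for name, score in components.items())

# Invented example: a teacher who scores low on value-added but
# middling-to-solid everywhere else.
example = {
    "individual_value_added": 60.0,
    "classroom_observations": 75.0,
    "school_community": 80.0,
    "school_value_added": 70.0,
}
print(composite_score(example))  # 0.5*60 + 0.4*75 + 0.05*80 + 0.05*70 = 67.5
```

The point the sketch makes visible is the one DCPS staffers stress: at fifty percent, value-added alone can drag a composite down, but a teacher who performs respectably on the observations (forty percent) can't be pushed to the bottom by test scores alone.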
There are three egregious problems with Pallas’s critique. Whether they are due to ineptitude or malice is tough to say.
First, Pallas never tried to find out how IMPACT worked before blasting it. According to his account, he simply found the layman’s description intended for DC teachers on the web and treated it as a technical explanation of the system that the value-added specialists at Mathematica and in DCPS’s data shop had designed. I’m puzzled why Pallas was in such a hurry to attack that he didn’t even try to be sure he understood his target. I contacted Pallas to ask whether he had ever even attempted to get in touch with DCPS or Mathematica. He responded, “I relied on materials published on the DCPS website, Mathematica’s website, and other relevant materials I found on the Internet.” Pallas had time to include eight links in his follow-up WaPo post but couldn’t be bothered to call or email DCPS.
Second, Pallas baldly mischaracterizes how the IMPACT system uses value-added data. Pallas wrote, “The procedures described in the DCPS IMPACT Guidebook for producing a value-added score are idiotic.” He asserts, “According to the DCPS IMPACT Guidebook, the actual growth is a student’s scaled score at the end of a given year minus his or her scaled score at the end of the prior year. If a fifth-grader received a scaled score of 535 in math and a score of 448 on the fourth-grade test the previous year, his actual gain would be calculated as 87 points. Subtracting one score from another only makes sense if the two scores are on the same scale. We wouldn’t, for example, subtract 448 apples from 535 oranges and expect an interpretable result. But that’s exactly what the DC value-added approach is doing: Subtracting values from scales that aren’t comparable.”
The problem: This isn’t what DCPS does. In fact, this isn’t what anybody does. Pallas is critiquing an illustration intended to help classroom teachers understand the concept of value-added. The straw man Pallas so earnestly assaults has nothing to do with the IMPACT system that DCPS uses–or with value-added models used by any credible organization. Stanford University economist Rick Hanushek, who’s been a technical advisor on IMPACT, said, “DCPS is very concerned about fairly attributing learning gains to teachers…[and] has developed a sophisticated statistical model to account for a variety of non-teacher factors that could affect gains, including student mobility, test error, family background, English language proficiency, and special education status. These models are not the sole determinant of teacher performance, but are part of an elaborate evaluation system that DCPS has pioneered.” Hanushek observed, “Some academics are so eager to step out on policy issues that they don’t bother to find out what the reality is.”
Third, Pallas points out the importance of “vertical scaling”–that test scores be comparable from one grade to the next if gains over time are to be fairly calculated. He wrote, describing the DC Comprehensive Assessment System (DC CAS), “In grade four, the minimum possible scaled score is 400, and the maximum possible scaled score is 499. In grade five, however, the minimum possible scaled score is 500, and the maximum possible scaled score is 599…A fourth-grade student who got every question on the fourth-grade math assessment correct would receive a lower scaled score than a fifth-grade student who got every question wrong…That sounds ridiculous, but it’s not problematic if the scale for fourth-grade performance is acknowledged to be different from the scale for fifth-grade performance.” But, Pallas goes on to assert, this means those results can’t be used to calculate value-added.
Pallas is seemingly unaware that every value-added effort is premised on making the requisite “apples-to-apples” adjustments in reliable and valid ways, and that there’s a growing industry that specializes in doing precisely this. For Pallas to imagine he’s flagging an overlooked challenge is bizarre, as is his assumption that no one in DCPS or Mathematica has acquainted themselves with the psychometric properties of the DC CAS. As one DCPS analyst said, “The idea that a firm like Mathematica wouldn’t do its homework on the properties of the test and wouldn’t understand the model is just crazy. All I can figure is that Pallas doesn’t understand anything about the value-added space.” (Now, if Pallas wants to argue that meaningful vertical scaling is, for all practical purposes, a pipe dream, and that attempting to calculate value-added learning is a fool’s errand, he ought to say so. Of course, that means going toe-to-toe with the experts working on value-added models. I imagine it’s more fun to just kneecap DCPS from the sidelines).
Even with a vertically scaled score, you would never try the simple subtraction exercise that Pallas imagines. Any competent analyst would seek to adjust for summer fall-off, fade-out, and a slew of other factors that could systematically bias results. For Pallas to imagine that Mathematica would not recognize or not address these challenges is hard to fathom.
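To make the apples-to-oranges point concrete, consider the simplest possible adjustment: converting each year's score to a z-score within its own grade-level distribution before computing any gain, so the two tests are on a common footing. This is a deliberately simplified illustration of the general idea, not DCPS's or Mathematica's model, and the cohort distributions below are invented.

```python
# Deliberately simplified illustration: standardize scores within each
# grade before comparing them, so gains aren't computed across
# incomparable scales. Invented cohorts; not the DCPS/Mathematica model.
from statistics import mean, stdev

def z_score(score: float, cohort: list) -> float:
    """Standardize a score against its own grade-level distribution."""
    return (score - mean(cohort)) / stdev(cohort)

# Invented grade-level distributions. (On the DC CAS, grade 4 scales run
# 400-499 and grade 5 runs 500-599, so raw subtraction is meaningless.)
grade4_cohort = [420, 435, 448, 455, 470, 482, 490]
grade5_cohort = [515, 528, 535, 548, 560, 574, 588]

# The fifth-grader from Pallas's example: 448 in grade 4, 535 in grade 5.
gain = z_score(535, grade5_cohort) - z_score(448, grade4_cohort)
print(round(gain, 2))  # a modest change in relative standing, not "87 points"
```

Even this toy version shows why nobody subtracts a 448 from a 535: what matters is the student's standing relative to each year's distribution. Real models layer on the adjustments Hanushek describes, for mobility, test error, family background, and the rest.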
Let’s be clear. Is it possible that individuals will be treated unfairly when judged by five observations by two different people over the course of a year and on the basis of the adjusted performance of their students? Of course it is. No evaluation system is ever perfect, or perfectly fair. And I’ve urged sensible caution when adopting value-added systems for just that reason. If Pallas were flagging real problems with the design or implementation of IMPACT, that would be a useful contribution. But for him to wildly mischaracterize a careful attempt to engineer a thoughtful evaluation system is nothing short of scurrilous.
As for Professor Pallas, he wants it both ways. He wants the credibility afforded an academic expert without having to uphold academic norms of research, reason, or evidence. If he had contacted DCPS and was ignored or stonewalled, that’d be one thing. But Pallas didn’t even try. Make a choice, pal. If you want the perks and the podium that come with being an academic, then act the part.