(This post also appears on Rick Hess Straight Up.)
On Sunday, the L.A. Times ran its controversial analysis of teacher value-added scores in L.A. Unified School District (LAUSD). The paper used seven years of reading and math scores to calculate performance for individual teachers who’ve taught grades three through five, and plans to publish the effectiveness ratings with the teachers’ names. The actual analysis was handled for the paper by RAND analyst Richard Buddin. If you want to get quickly up to speed on this, check out Joanne Jacobs’ stellar summary here and Stephen Sawchuk’s take here. The story has triggered an avalanche of comment, including cheers from our earnest Secretary of Education and scathing responses from the likes of Diane Ravitch and Alexander Russo.
Given my taste for mean-spirited measures, and the impressive journalistic moxie the paper showed, I really wanted to endorse the LAT’s effort. But I can’t. Now, don’t get me wrong. I’m all for using student achievement to evaluate and reward teachers and for using transparency to recognize excellence and shame mediocrity. But I have three serious problems with what the LAT did.
First, as I’ve noted here before, I’m increasingly nervous at how casually reading and math value-added calculations are being treated as de facto determinants of “good” teaching. As I wrote back in April, “There are all kinds of problems with this unqualified presumption. At the most technical level, there are a dozen or more recognized ways to specify value-added calculations. These various models can generate substantially different results, with a third of each result varying with the specifications used. When used for a teacher in a single classroom, we frequently only have 20 or 25 observations (if that). The problem is that the correlation of such results year after year is somewhere in the .25 to .35 range.”
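To make the small-sample point concrete, here is a minimal sketch of a toy model, not the LAT's or Buddin's actual method. It assumes each teacher has a fixed "true" effect and each student's measured gain is that effect plus independent noise. The function names and the parameter values (a teacher effect spread of 0.15 and student noise of 1.0, in arbitrary score units) are hypothetical and chosen only for illustration.

```python
import math
import random


def expected_correlation(teacher_sd, student_sd, class_size):
    """Year-to-year correlation of class-average gains implied by the toy model.

    Signal is the variance of true teacher effects; noise is the variance of
    a class average of independent student-level errors.
    """
    signal = teacher_sd ** 2
    noise = student_sd ** 2 / class_size
    return signal / (signal + noise)


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_correlation(teacher_sd, student_sd, class_size, n_teachers, seed=0):
    """Simulate two years of class-average gains and correlate them."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_teachers):
        # Hypothetical fixed effect for this teacher, identical in both years.
        effect = rng.gauss(0.0, teacher_sd)
        for results in (year1, year2):
            # A fresh class of students each year, each with independent noise.
            gains = [effect + rng.gauss(0.0, student_sd) for _ in range(class_size)]
            results.append(sum(gains) / class_size)
    return pearson(year1, year2)


if __name__ == "__main__":
    for size in (20, 25, 100):
        print(size, round(expected_correlation(0.15, 1.0, size), 2))
```

Under these made-up parameters, classes of 20 or 25 students give an implied year-to-year correlation of roughly 0.31 and 0.36, and the figure climbs only as the number of observations per teacher grows. The real numbers depend on the data and the model specification; the sketch shows only why a couple dozen observations leave a lot of room for noise.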
Second, beyond these kinds of technical considerations, there are structural problems. For instance, in those cases where students receive substantial pull-out instruction or work with a designated reading instructor, LAT-style value-added calculations are going to conflate the impact of the teacher and this other instruction. How much of this takes place varies by school and district, but I’m certainly familiar with locales where these kinds of “nontraditional” (something other than one teacher instructing 20-odd students) arrangements account for a hefty share of daily instruction. This means that teachers who are producing substantial gains might be pulled down by inept colleagues, or that teachers who are not producing gains might look better than they should. Currently, there is nothing in the design of data systems that can correct for these kinds of common challenges. At a minimum, in the case of LAUSD, I would like to see data on how much of the relevant instruction is provided by the teachers in question–rather than by colleagues.
Third, there’s a profound failure to recognize the difference between responsible management and public transparency. Transparency for public agencies entails knowing how their money is spent, how they’re faring, and expecting organizational leaders to report on organizational performance. It typically doesn’t entail reporting on how many traffic citations individual LAPD officers issued or what kind of performance review a National Guardsman was given by his commanding officer. Why? Because we recognize that these data are inevitably imperfect, limited measures and that using them sensibly requires judgment. Sensible judgment becomes much more difficult when decisions are made in the glare of the public eye.
So, where do I come out? I’m for the smart use of value-added by districts or schools. I’m all for building and refining these systems and using them to evaluate, reward, and remove teachers. But I think it’s a mistake to get in the business of publicly identifying individual teachers in this fashion. I think it confuses as much as it clarifies, puts more stress on primitive systems than they can bear, and promises to unnecessarily entangle a useful management tool in personalities and public reputations.
Sadly, this little drama is par for the course in K-12. In other sectors, folks develop useful tools to handle money, data, or personnel, and then they just use them. In education, reformers taken with their own virtue aren’t satisfied by such mundane steps. So, we get the kind of overcaffeinated enthusiasm that turns value-added from a smart tool into a public crusade. (Just as we got NCLB’s ludicrously bloated accountability apparatus rather than something smart, lean, and a bit more humble.) When the shortcomings become clear, when reanalysis shows that some teachers were unfairly dinged, or when it becomes apparent that some teachers were scored using sample sizes too small to generate robust estimates, value-added will suffer a heated backlash. And, if any states get into this public I.D. game (as some are contemplating), we’ll be able to add litigation to the list. This will be unfortunate, but not an unreasonable response–and not surprising. After all, this is a movie we’ve seen too many times.
Would it have really been such a compromise to have kept teacher names anonymous and to have reported scores by school, or community, or in terms of citywide distribution?