Great Ideas For Designing Accountability Systems for Schools

On Tuesday afternoon, we at the Fordham Institute will host a competition to present compelling designs for state accountability systems under the Every Student Succeeds Act. (Event details here.) The process has already achieved its objective, with more than two dozen teams submitting proposals that are chock-full of suggestions for states and commonsense recommendations for the U.S. Department of Education. They came from all quarters, including academics (such as Ron Ferguson, Morgan Polikoff, and Sherman Dorn); educators (including the Teach Plus Teaching Policy Fellows); policy wonks from D.C. think tanks (including the Center for American Progress, American Enterprise Institute, and Bellwether Education Partners); and even a group of Kentucky high school students. Selecting just ten to spotlight in Tuesday’s live event was incredibly difficult.

I’ve pulled out some of the best nuggets from across the twenty-six submissions.

Indicators of Academic Achievement

ESSA requires state accountability systems to include an indicator of academic achievement “as measured by proficiency on the annual assessments.”

Yet not a single one of our proposals suggests using simple proficiency rates as an indicator here. That’s because everyone is aware of NCLB’s unintended consequence: encouraging schools to pay attention only to the “bubble kids” whose performance is close to the proficiency line. So what to use instead? Ideas include:

An achievement index. Under these plans, schools would receive partial credit for getting students to basic levels of achievement, full credit for getting them to proficiency, and extra credit for getting them to advanced. (That’s how Ohio’s “performance index” works today.) Here’s how Kristin Blagg of the Urban Institute puts it: “There is evidence that accountability pressure, particularly when schools or districts are on the margin of adequate yearly progress (AYP), is associated with neutral-to-positive achievement gains….I would use a five-tier system to assess the levels of student achievement, banking on the idea that four rather than three cut-offs would spur more schools to coach ‘bubble’ students into the next achievement level.”

Scale scores. Other plans skip the use of performance levels entirely, at least for the purpose of calculating achievement scores. Morgan Polikoff and colleagues write about their design, “Performance is measured by conversion of students’ raw scale scores at each grade level to a 0–100 scale. This is superior to an approach based on performance levels or proficiency rates in that it rewards increases in performance all along the distribution (rather than just around the cut points).”

Cross-sectional achievement. Sherman Dorn proposes a mix of many measures, all derived (and transformed) from proficiency rates and scale scores. As he puts it in a blog entry, it is “a deliberate jury-rigged construction; that is a feature, not a bug.”
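To make the contrast with simple proficiency rates concrete, here is a minimal sketch of the first two ideas. It is not any team's actual formula. The point values in LEVEL_POINTS, the function names, and the linear 0-100 rescaling between a test's lowest and highest scale scores are all my assumptions for illustration; the proposals describe only partial, full, and extra credit and a conversion to a 0–100 scale.

```python
# Hypothetical points per performance level (out of 100). The proposals describe
# partial, full, and extra credit; these exact numbers are an assumption.
LEVEL_POINTS = {"below_basic": 0, "basic": 50, "proficient": 100, "advanced": 120}


def proficiency_rate(level_counts):
    """Percent of students at proficient or advanced (the NCLB-style measure)."""
    total = sum(level_counts.values())
    passing = level_counts.get("proficient", 0) + level_counts.get("advanced", 0)
    return 100.0 * passing / total


def achievement_index(level_counts):
    """Average points per student, so movement between any two levels counts."""
    total = sum(level_counts.values())
    earned = sum(LEVEL_POINTS[level] * n for level, n in level_counts.items())
    return earned / total


def rescaled_mean(scale_scores, lowest, highest):
    """Mean scale score after a linear 0-100 rescaling (an assumed method)."""
    span = highest - lowest
    return sum(100.0 * (s - lowest) / span for s in scale_scores) / len(scale_scores)


before = {"below_basic": 10, "basic": 20, "proficient": 50, "advanced": 20}
# Five students move from below basic to basic; nobody crosses the proficiency line.
after = {"below_basic": 5, "basic": 25, "proficient": 50, "advanced": 20}

print(proficiency_rate(before), proficiency_rate(after))    # rate does not move
print(achievement_index(before), achievement_index(after))  # index rises
print(rescaled_mean([400, 650, 900], lowest=400, highest=900))
```

In this made-up school, the proficiency rate is identical before and after, while the index registers the improvement among the lowest performers. The rescaled mean goes a step further and moves with every student's score, not just with level changes.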

Here we pose a question for the U.S. Department of Education: Will your regulations allow these alternatives to straight-up proficiency rates?

Indicators of Student Growth or an Alternative

ESSA also requires state accountability systems to include “a measure of student growth, if determined appropriate by the State; or another valid and reliable statewide academic indicator that allows for meaningful differentiation in school performance.” Yet nobody in our competition went for an alternative: everyone went for growth, and usually in a big way.

Richard Wenning, the designer of the Colorado Growth Model, explains why: “Disaggregation and high weighting of growth and its gaps are essential because too often, poverty and growth are negatively correlated….If an accountability system places greatest weight on growth, it creates an incentive to maximize the rate and amount of learning for all students and supports an ethos of effort and improvement.” I would add that true growth models—rather than “growth to proficiency” ones—encourage schools to focus on all students instead of just their low-performers.

So which growth models were proposed?

Student growth percentile (a.k.a. Colorado Growth Model). Back to Wenning: “Student growth percentiles based on annual statewide assessments of reading, mathematics, other core subjects…comprise the first layer of evidence reported and employed for school ratings. The corresponding metrics are median growth percentiles, with fiftieth-percentile growth reflecting the normative concept of a year’s growth in a year’s time; and adequate growth percentiles, which provide a student-level growth target constituting ‘good enough’ growth, and which yield the percentage of students on track to proficiency or on track to college and career readiness.”

Two-step value-added model. Polikoff et al. write, “This model is designed to eliminate any relationship between student characteristics and schools’ growth scores. In that sense, it is maximally fair to schools.” More information on the two-step model is available in this Education Next article by Mark Ehlert, Cory Koedel, Eric Parsons, and Michael Podgursky.

Transition matrix. Bellwether Education Partners’ Chad Aldeman suggests this approach for its simplicity and cost effectiveness. “It gives students points based on whether they advance through various performance thresholds. Unlike under NCLB, where districts focused on students right at the cusp of proficiency—the ‘bubble’ kids—this sort of method creates several, more frequent cutpoints that make it harder to focus on just a small subset of students. This approach offers several advantages over more complex growth models. Any state could implement a transition matrix without external support, and the calculations could be implemented on any state test. Most importantly, and in contrast to more complex models, the transition matrix provides a clear, predetermined goal for all students. School leaders and teachers would know exactly where students are and where they need to be to receive growth points.”
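Two of these growth measures are simple enough to sketch in a few lines. What follows is illustrative only. The point values in TRANSITION_POINTS are hypothetical and not taken from Aldeman's proposal, and the growth summary assumes each student's growth percentile and adequate growth percentile have already been produced by the state's model (the sketch does not compute them). I also assume a student counts as on track when the observed percentile meets or exceeds the adequate one.

```python
from statistics import median

# Hypothetical points: outer key is last year's level, inner key is this year's.
# A real state would set its own values and could split levels more finely.
TRANSITION_POINTS = {
    "below_basic": {"below_basic": 0, "basic": 100, "proficient": 150, "advanced": 200},
    "basic":       {"below_basic": 0, "basic": 50,  "proficient": 100, "advanced": 150},
    "proficient":  {"below_basic": 0, "basic": 0,   "proficient": 100, "advanced": 150},
    "advanced":    {"below_basic": 0, "basic": 0,   "proficient": 50,  "advanced": 100},
}


def transition_score(transitions):
    """Average growth points over (prior_level, current_level) pairs."""
    points = [TRANSITION_POINTS[prior][current] for prior, current in transitions]
    return sum(points) / len(points)


def growth_summary(students):
    """Return (median growth percentile, percent of students on track).

    students: list of (growth_percentile, adequate_growth_percentile) pairs,
    both assumed to come from the state's growth model.
    """
    percentiles = [sgp for sgp, _ in students]
    on_track = sum(1 for sgp, agp in students if sgp >= agp)
    return median(percentiles), 100.0 * on_track / len(students)


school_transitions = [
    ("below_basic", "basic"),
    ("basic", "basic"),
    ("proficient", "advanced"),
    ("advanced", "proficient"),
]
school_percentiles = [(35, 50), (50, 45), (62, 60), (80, 85)]

print(transition_score(school_transitions))
print(growth_summary(school_percentiles))
```

The lookup table is the whole model in the transition matrix, which is why a state could run it without outside help. The growth percentile summary is just as short here, but only because the hard statistical work happens upstream.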

Indicator of Progress toward English Language Proficiency

ESSA also requires state accountability systems to measure “progress in achieving English language proficiency, as defined by the State.” This one’s way outside my area of expertise, but my sense is that the feasibility of implementing this part of the law rests on the quality and nature of English language proficiency assessments. Can they accurately produce “growth measures” over time? Or are states better off just using their regular English language arts assessments here, as some proposals suggest? Several proposals also suggest weighting the ELL indicator in proportion to the concentration of ELLs at a given school.

Another question for the Department of Education: Will the department allow that?
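For readers wondering what weighting "in proportion to the concentration of ELLs" might look like in practice, here is one possible sketch. Everything in it is an assumption on my part: the function name, the 10 percent cap on the ELL indicator, and the choice to shrink the other weights evenly so that the total stays at 1. The proposals do not spell out a formula.

```python
def indicator_weights(base_weights, ell_share, max_ell_weight=0.10):
    """Scale the ELL indicator's weight by the school's share of ELL students.

    base_weights: weights for the non-ELL indicators (hypothetical names).
    ell_share: fraction of enrolled students who are English language learners.
    max_ell_weight: assumed weight the ELL indicator gets at 100 percent ELLs.
    """
    if not 0.0 <= ell_share <= 1.0:
        raise ValueError("ell_share must be between 0 and 1")
    ell_weight = max_ell_weight * ell_share
    base_total = sum(base_weights.values())
    # Shrink the other indicators proportionally so all weights sum to 1.
    weights = {
        name: w * (1.0 - ell_weight) / base_total
        for name, w in base_weights.items()
    }
    weights["ell_progress"] = ell_weight
    return weights


base = {"achievement": 0.4, "growth": 0.4, "school_quality": 0.2}
print(indicator_weights(base, ell_share=0.0))  # ELL indicator drops out entirely
print(indicator_weights(base, ell_share=0.5))  # ELL indicator gets half the cap
```

Under this approach, a school with no English language learners is rated on the other indicators alone, while a school where half the students are ELLs gives the indicator half of its maximum weight.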

Indicator(s) of Student Success or School Quality

Finally, ESSA requires state accountability systems to include “not less than one indicator of school quality or student success that allows for meaningful differentiation in school performance” and is “valid, reliable, comparable, and statewide.” With this final set of indicators, Congress opened the door to more holistic and creative approaches to measuring effectiveness—and our contenders did not disappoint. Here are some of my favorite ideas:

School inspections. Chad Aldeman writes, “Under this system, no school’s rating is final until they complete a formal inspection. The inspections would be based off the school inspectorate model used as part of the accountability and school improvement process in England. As Craig Jerald described in a 2012 report, ‘inspectors observe classroom lessons, analyze student work, speak with students and staff members, examine school records, and scrutinize the results of surveys administered to parents and students.’ Although the interviews provide context, the main focus is on observations of classroom teaching, school leadership, and the school’s capacity to improve.” Kristin Blagg also endorses an “observation system,” which she analogizes to the Quality Rating and Improvement System (QRIS) used to differentiate among early childcare and education providers. Matthew Petersen and his fellow Harvard Graduate School of Education students similarly called for “peer visits.”

Surveys. Ron Ferguson et al. write, “An accountability system should do more than simply measure and reward tested outcomes. Educators need tools and incentives to monitor and manage multiple processes for achieving intended results. Therefore, the state should require the use of valid and reliable observational and survey-based assessment tools. These can provide feedback from students to teachers, and from teachers to administrators, on school climate, teaching quality, and student engagement in learning, as well as the development of agency-related skills and mindsets. For these observational and survey-based metrics, schools should not be graded on the measured scores. Instead, they should be rated on the quality of their efforts to use such measures formatively for the improvement of teaching and learning. Ratings should be provided by officials who supervise principals, contributing 10 percent of a school’s composite accountability score.” Several other proposals, including ones from Jim Dueck and Alex Hernandez, are big on surveys too. Hernandez even turns the results into a “Love” score—as in, “Will my child enjoy and thrive in this school?” Specific tools mentioned include Tripod (developed in 2001 by Ferguson and selected in 2008 for the MET project) and the University of Chicago 5 Essentials Survey, which David Stewart and Joe Siedlecki were high on.

A well-rounded curriculum. Polikoff and colleagues propose a measure that “captures the proportion of students who receive a rich, full curriculum. We define a full curriculum as access to the four core subjects plus the arts and physical education each for a minimum amount of time per week. Our goal with this measure is to ensure that schools do not excessively narrow the curriculum at the cost of non-tested subjects and opportunities for enrichment. This indicator will be verified through random audits.”

Other indicators mentioned here included teacher absenteeism; chronic student absenteeism; and student retention rates (particularly important for communities with lots of school choice). And several proposals (such as the one from the Teach Plus Teaching Fellows, and another from Samantha Semrow and her fellow Harvard classmates) suggest including additional data on schools’ report cards, but not using them to determine school grades. (Melany Stowe’s “measure up” dashboard is a particularly engaging model.) That seems like a smart approach, especially for indicators that are new and perhaps not yet “valid and reliable.” Furthermore, as the Teach Plus Fellows explain, these additional data can be used to “examine whether certain factors have predictive power” for improving student achievement. “In this way, states will have the opportunity to not only identify struggling schools or subgroups but form actionable hypotheses for improving outcomes, using disciplined inquiry to drive improvement.”

Calculating Summative School Grades

ESSA requires states to “establish a system of meaningfully differentiating, on an annual basis, all public schools in the State, which shall be based on all indicators in the State’s accountability system…for all students and for each subgroup of students.” Most of our contenders proposed indices with various weights for their indicators, and typically at least some consideration for the performance of subgroups. But a few offered some outside-the-box ideas:

Mix and match. Dale Chu and Eric Lerum of America Succeeds suggest that states offer schools and districts a menu of indicators to be used to generate their grades. They explain, “This design aspires to create the conditions for flexibility and entrepreneurship at the local level. One of the problems that arose with the previous accountability regime was that it funneled schools toward one model. By allowing schools and districts to develop their own performance goals aligned with their programs, existing performance, and needs of students, ownership of school improvement will lie with the stakeholders closest to the students.”

Inclusion of locally designed indicators. In a similar vein, Jennifer Vranek and her colleagues at Education First write, “Past accountability systems were the darlings of policy makers, think tanks, foundations, editorial boards, and advocates; they rarely had the support of educators, school communities, and the public writ large. They were too often equated with excessive testing that many parents believe ‘takes time away from learning.’ Our design provides school communities the opportunity to select additional indicators and measures in every component through discussion of what matters most to them, to share that publicly, and to commit to work that addresses goals the community develops.”

And a final question for the U.S. Department of Education: Will your regulations allow for the “mix and match” and “locally developed indicators” approaches? Or will you read ESSA as requiring a “single accountability system,” meaning one-size-fits-all for all schools and districts in the state?


Believe me when I say that’s just a sampling of the smorgasbord of sound ideas and fresh thinking available in the two-dozen-plus accountability designs submitted for our competition. I hope others mine the proposals and highlight great suggestions that I missed. And don’t forget to tune in Tuesday to watch our ten finalists make the case for their own unique approach to ESSA accountability.

—Mike Petrilli

This post originally appeared on the Fordham Institute’s Flypaper.
