How Should States Design School Rating Systems? A Conversation with an Expert

Under the newly enacted Every Student Succeeds Act, all states will be responsible for designing their own statewide accountability systems. Although federal parameters govern which measures must be included in those systems and how, states have considerable latitude in creating accountability systems that work for them.

In order to learn more about what states should think about in this process, I reached out to Christy Hovanetz, a Senior Policy Fellow for the Foundation for Excellence in Education. Dr. Hovanetz served as the Assistant Commissioner at the Minnesota Department of Education and Assistant Deputy Commissioner at the Florida Department of Education. Since leaving public service, Dr. Hovanetz has worked with a number of states on their accountability systems, and has established herself as one of the nation’s leading experts on school rating systems. What follows is a lightly edited transcript of our conversation.

Aldeman: Why do you do this work? Why should states spend time, money, and political capital creating accountability systems for schools?

Hovanetz: It’s very important that parents can access information about whether or not their child’s school is effective, and whether or not it’s the best fit for their child. We also spend a considerable amount of taxpayer money on our schools, and we should be sure we’re getting a good return on that investment and that our schools are actually effective in teaching students the skills they need to be successful members of society.

This is important work, because what gets measured gets done, and we have to be sure all students have opportunities to reach their full potential.

Aldeman: You’ve worked with a number of states designing their accountability systems. Can you talk about the main lessons you’ve learned along the way?

Hovanetz: The reason we’ve done a lot of work with states in developing their accountability systems is to be sure there’s a transparent way to report information that people understand and can use to improve student outcomes. The whole goal of accountability systems is to make sure that students are learning.

As we work with states in developing these systems, one of the key components is making sure the information is easy for parents to interpret: that they can understand what percentage of students in that school are mastering standards and achieving grade-level expectations, and whether or not those students are going to be ready to graduate from high school and be successful in college.

As we work with states, we want to make sure they are providing information to parents and the public as to whether or not students will be successful once they leave the K-12 system.

Aldeman: I’ve written about this as well, but from your perspective, what’s wrong with “dashboard” type systems that simply describe performance on a range of metrics rather than trying to categorize schools into some summative rating?

Hovanetz: Dashboard systems are very important and valuable because they provide a lot of information about a school. However, a dashboard full of information makes it hard to summarize all the data so parents know whether a school is effective or not. An initial summative rating, particularly school letter grades of A, B, C, D, and F, which are used by 16 states across the country, really draws attention to the overall effectiveness of a school. Once that overall impression is made, digging into the data and looking at a dashboard of measures will reveal what happened in that school to earn that particular letter grade.

To draw in parents, the public, policymakers, and others who are interested in education, we need something that can say, “This particular school is or is not high-performing,” and then provide additional information that supports that letter grade.

Aldeman: How would a state know if its accountability system is working well? What would it look for?

Hovanetz: We encourage states, when they’re developing and adjusting their systems, to monitor them against external measures. One of the leading indicators we use is whether a state’s scores on the National Assessment of Educational Progress (NAEP) are improving; NAEP is the only assessment administered in common across the country that also allows comparisons over time. We’d also encourage states to look at how successful students are in taking and passing college placement exams. States that track information beyond the high school level should also look at college retention rates, degree completion, and whether or not graduates are earning living wages. In evaluating accountability systems, we focus on outcomes that are external to the accountability system itself.

Aldeman: Are there any key lessons about what not to do? What should states avoid when they design accountability systems?

Hovanetz: First and foremost, when you’re designing an accountability system, you need to identify what’s important to you. A lot of states have determined that what’s most important in an accountability system is whether students can read and compute, and whether students are making progress from year to year. Once you have the fundamentals of the accountability system, you can build from there.

Some states might be inclined to try to accommodate every single wish or desire of all stakeholders in a state, including things that may not be as important as whether or not kids are learning to do math or learning to read. Including those extra measures can dilute the focus on the really important things that students need to learn in school. States need to think about what’s really important and limit the measures to ones that determine whether those important things are being achieved. Too many states try to include too many measures in their accountability system, and then none of the individual measures carries real weight or guides schools on what their learning outcomes need to be.

Aldeman: Can you talk more about this balance between the desire for multiple measures versus the desire for simplicity? How should states think about this tension?

Hovanetz: States should start by acknowledging there’s a limit to what they can put in their accountability system. If your state really values whether or not students are proficient in reading and math, there should be a really strong focus on reading and math in your accountability system.

There is a strong desire to expand beyond academic indicators. Including a measure of growth is very important, but including things that are not direct learning outcomes, and that focus more on environment and other input measures, blurs the vision of what we want students to know and be able to do. All of those things support a strong learning environment and will indirectly lead to success, but they do not in themselves measure success. It’s a matter of balancing the student outcomes we want against what it takes to put the conditions for them in place. Including too many things in the system complicates it and reduces the importance of the student outcomes we’re really looking for.

Aldeman: You mentioned the concern that too many measures dilute the system. Can you also talk about this in terms of subgroups? How can we ensure subgroups of students are a meaningful part of state rating systems?

Hovanetz: We encourage states to focus on the lowest-performing students, but the lowest-performing students aren’t always part of a particular racial or economic group, or even a particular curricular subgroup. By focusing on the lowest-performing students, we want to create a system that truly focuses on students who need the most help and is equitable across all schools. We strongly support the focus on the lowest-performing students no matter what group they come from.

That does a number of things. It reduces the number of components that have to be tracked within the accountability system, and it places the focus on students who truly need the most help, rather than on students in particular subgroups, whose membership they can’t necessarily control. It also eases the problem of small n-sizes: the group of lowest-performing students in any given school is larger than many of the race or curricular subgroups.

We still strongly support reporting out scores by particular subgroups to identify large achievement gaps, but focusing on the lowest-performing students ensures that students who need the most help are getting it.

Aldeman: I also want to get your perspective on the Every Student Succeeds Act. It holds a lot of promise for states, but it also carries a lot of risks. What are your biggest concerns about the law going forward and how it might be implemented?

Hovanetz: There are differing perspectives on how to operationalize the law, but I’m mainly concerned that we may end up with very complex systems that parents, the public, and policymakers can’t easily understand or use.

We’ve looked at some of the different interpretations of how states might be required to implement the law, and one of them goes very much back to the Adequate Yearly Progress, or “AYP,” measure from NCLB. What that looks like is schools might have 10 different subgroups that all have to meet the minimum n-size for each of the components in the system. That creates a very complex system very quickly, because you’re talking about 10 subgroups for each indicator that’s included in the system. This interpretation of the law requires a minimum of 8 different indicators (math achievement scores, reading achievement scores, another academic indicator, and a school quality or student success indicator, plus participation rate for each of these four measures). Eight indicators multiplied by 10 subgroups would create a system with 80 unique cells for each school.
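The cell count in this interpretation is a simple product, and a short Python sketch makes it concrete. The four measures come from the interpretation described here; the ten subgroup labels are hypothetical placeholders, since the actual subgroup list and each state’s n-size rules are not spelled out in this conversation.

```python
from itertools import product

# The four measures in the interpretation described above.
MEASURES = [
    "math achievement",
    "reading achievement",
    "other academic indicator",
    "school quality or student success",
]

# Each measure also gets its own participation-rate indicator: 4 + 4 = 8.
INDICATORS = MEASURES + [f"participation rate: {m}" for m in MEASURES]

# Hypothetical placeholder labels standing in for 10 student subgroups.
SUBGROUPS = [f"subgroup_{i}" for i in range(1, 11)]

# One accountability "cell" per (indicator, subgroup) pair.
CELLS = list(product(INDICATORS, SUBGROUPS))

print(len(INDICATORS), "indicators x", len(SUBGROUPS), "subgroups =", len(CELLS), "cells")
# prints: 8 indicators x 10 subgroups = 80 cells
```

Because the count is a product, every additional indicator adds a full row of 10 more cells, so doubling the indicator list doubles the grid.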

An 80-cell grid would really dilute the focus on any particular component or any particular low-performing student. It gets very complex very quickly, and that’s just the minimum number. If states add components such as extended graduation rates, other school quality or student success indicators, or additional subjects or assessments, they might quickly double the number of cells. They’d go back to the conjunctive model of AYP, where schools are rated on a series of yes-or-no answers as to whether or not they met each criterion. It also re-creates a fairly inequitable system, because each school will have a different set of student groups that meet the minimum n-size and are therefore held accountable under each component.
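The conjunctive, n-size-dependent logic can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not any state’s actual AYP rule: the threshold `MIN_N = 30`, the function `conjunctive_rating`, and both schools’ numbers are assumptions chosen only to show the mechanism.

```python
# Hypothetical minimum n-size; each state sets its own threshold.
MIN_N = 30


def conjunctive_rating(cells):
    """AYP-style conjunctive rating (illustrative sketch, not any state's rule).

    `cells` maps (indicator, subgroup) -> (number_of_students, met_target).
    Cells below MIN_N students are dropped; the school passes only if every
    remaining cell met its target. Returns (passed, number_of_cells_counted).
    """
    counted = [met for (n, met) in cells.values() if n >= MIN_N]
    return all(counted), len(counted)


# Two hypothetical schools with identical results; only subgroup sizes differ.
school_a = {
    ("math achievement", "subgroup_1"): (120, True),
    ("math achievement", "subgroup_2"): (35, False),
}
school_b = {
    ("math achievement", "subgroup_1"): (120, True),
    ("math achievement", "subgroup_2"): (12, False),
}

print(conjunctive_rating(school_a))  # prints: (False, 2)
print(conjunctive_rating(school_b))  # prints: (True, 1)
```

School B passes only because its struggling subgroup is too small to count, so the set of cells each school answers for depends on its enrollment mix. In this sketch a school with no cell above the threshold also passes by default, since `all()` of an empty list is `True`.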

We hope that states are given flexibility to meet the requirements of the law in different ways. For example, states have shown progress in supporting student achievement by focusing on the lowest-performing students. We’ve seen some interpretations of ESSA that would allow that going forward, and we hope those states don’t have to backtrack and move away from systems that are producing positive results.

Aldeman: You mentioned the 80 cells that schools will have to look at once you factor in each subgroup on each indicator. Are there other ways that states might implement this? Are there other ways to accomplish the same goals?

Hovanetz: One example is participation rate. We strongly agree with the law’s requirement that 95 percent of students be tested. However, we would implement it not as a component within the system; instead, if a school did not test at least 95 percent of all students and of each subgroup, its school grade would be lowered by one letter. That would potentially take 40 cells out of the accountability system.
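The participation rule described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `apply_participation_rule`, the rule that an F cannot be lowered further, and the sample rates are all hypothetical, and the sketch ignores minimum n-size for participation. The 40 removed cells correspond to the 4 participation indicators times 10 subgroups in the 80-cell interpretation.

```python
GRADES = ["A", "B", "C", "D", "F"]
PARTICIPATION_TARGET = 0.95  # the law's 95 percent testing requirement


def apply_participation_rule(grade, participation_rates):
    """Lower a letter grade one step if any group tested below 95 percent.

    `participation_rates` maps "all students" and each subgroup to the share
    of students tested (0.0 to 1.0). Assumption: an F stays an F.
    """
    if all(rate >= PARTICIPATION_TARGET for rate in participation_rates.values()):
        return grade
    next_index = min(GRADES.index(grade) + 1, len(GRADES) - 1)
    return GRADES[next_index]


# Participation leaves the 80-cell grid: 4 participation indicators x 10 subgroups.
cells_removed = 4 * 10
print(80 - cells_removed)  # prints: 40

# Hypothetical participation rates for one school.
rates = {"all students": 0.97, "subgroup_1": 0.96, "subgroup_2": 0.93}
print(apply_participation_rule("B", rates))  # prints: C
print(apply_participation_rule("F", rates))  # prints: F
```

Treating participation as a single gate on the final grade keeps the 95 percent requirement enforceable for every group while removing it from the grid of indicators.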

With respect to growth, we expect that states can continue focusing on the lowest-performing students. Rather than demonstrating by each individual subgroup what percent of students are making growth, they’d be demonstrating whether the lowest-performing students are making growth, in addition to all students in that school.
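Reporting growth for the lowest-performing students alongside all students might look like the following Python sketch. The interview does not define either term, so both definitions here are hypothetical: “growth” is simplified to a score increase over the prior year, and “lowest-performing” is taken as the bottom 25 percent by prior-year score. The student scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Student:
    prior_score: float
    current_score: float


def percent_making_growth(students):
    """Percent of students whose score rose year over year (hypothetical growth rule)."""
    if not students:
        return 0.0
    grew = sum(1 for s in students if s.current_score > s.prior_score)
    return 100.0 * grew / len(students)


def lowest_performing(students, share=0.25):
    """Hypothetical definition: the bottom `share` of students by prior-year score."""
    ranked = sorted(students, key=lambda s: s.prior_score)
    count = max(1, round(len(ranked) * share))
    return ranked[:count]


# Hypothetical scores for a small school.
students = [
    Student(p, c)
    for p, c in [(300, 320), (310, 305), (350, 360), (400, 395),
                 (420, 440), (450, 455), (480, 470), (500, 510)]
]

print(percent_making_growth(students))                     # prints: 62.5
print(percent_making_growth(lowest_performing(students)))  # prints: 50.0
```

The school reports two growth figures instead of one per subgroup, and the lowest-performing group is defined by achievement rather than by demographic membership.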

Being able to focus in on the lowest-performing students and also fold in participation more seamlessly would reduce the complexity and also achieve the overall intent and spirit of the law in terms of making sure all students are achieving and growing.

Aldeman: The last question I have is for states that are still using accountability systems that look a lot like NCLB. What advice would you give those states as they think about this work going forward for the next few years?

Hovanetz: States should start by focusing on what their priorities are, where their student scores have been improving or not, and using those as the basis for their new accountability system. They should make sure to create a system that is equitable and levels the playing field across all schools. They should not create a situation where some schools are accountable for 25 things and other schools are only accountable for five things.

Most importantly, is the accountability system designed in the best interest of student learning? As states consider a particular indicator or component, will student outcomes actually improve because of that component? We see a lot of components with perverse incentives, where states might think they’re doing the right thing but could actually cause harm. For example, including a school safety measure that looks at the percentage of expulsions at a school might push educators to tolerate a dangerous school environment, because expelling or suspending students would hurt the school’s accountability designation.

For all indicators in their accountability system, state leaders should ask, “what do these components measure and how will they improve student outcomes?” That requires states to step back and look at identifying the most important things for their education systems to accomplish. They should create systems without perverse incentives so schools and districts are actually focusing on the things that are important.

—Chad Aldeman

This post originally appeared on Ahead of the Heard.