
Measuring school performance has been an important component of state and federal policy for two decades. Measures based exclusively on reading and math proficiency have given way to more sophisticated approaches that incorporate growth or value-added data as well as indicators of chronic absenteeism, college readiness, or school climate. Pundits and researchers continue to debate whether high-stakes accountability policies have led to improved school performance, to unintended negative side effects, or to both.

As I’ve argued before, accountability with a small “a” doesn’t require a formal, rules-based system with explicit consequences tied to results. In particular, transparency can create a form of accountability simply by shining a light on performance. Whether a measure is used for “small a” or “big A” accountability, though, its success in promoting improved outcomes for students depends on whether it provides good diagnostic information: information that is valid and reliable for assessing school performance.

My colleagues and I at the Mid-Atlantic Regional Educational Lab have worked with state and local decision makers to develop, refine, and stress-test measures of school performance that provide useful diagnostic information. Some have been intended as big-A accountability measures, incorporated in state ESSA plans. Others are used for informational purposes, giving decision makers a richer understanding of school performance. Collectively, these projects have highlighted that decision makers need a clear understanding of what a measure is diagnostic for.

A measure can be diagnostic for one purpose and non-diagnostic for another. For example, a low rate of proficiency in grade 3 reading suggests that students need additional support to read proficiently. It does not necessarily mean the school is underperforming in serving its students, because they might be learning rapidly from a very low starting point. Conversely, a high rate of proficiency does not necessarily mean a school is enhancing students’ learning, if they started out as high performing. Assessing whether a school is underperforming requires isolating its contribution from factors outside its control, thereby assessing whether students would do better if they were at a different school.

More broadly, our work has informed a conceptual framework of school performance measures that are potentially diagnostic for addressing three distinct questions:

  1. How are students doing? Measures include not only traditional reading and math proficiency indicators, but also social-emotional learning (such as the “loved, challenged, and prepared” measures we developed with the DC Public Schools) and chronic absenteeism. They also include longer-term student outcomes such as graduation, college enrollment, degree completion, workforce participation, and even civic participation—the original public purpose of public education. Broad, rich, and robust measures of how students are doing are important for helping decision makers identify differences in student needs across schools. For diagnostic purposes, decision makers—including parents choosing among schools—must recognize that student outcomes are affected by both schools and factors outside the school’s control. If valid and reliable, these measures diagnose student need but may not reveal a school’s effectiveness in promoting the outcomes.
  2. What does the school contribute to student outcomes? Every school serves a unique set of students with different supports and experiences outside of the classroom. Identifying a school’s contribution to its students’ outcomes requires accounting for out-of-school factors. For test scores, value-added measures or schoolwide aggregates of students’ achievement growth over time (such as median student growth percentiles) accomplish this task. In most states, these measures involve only a narrow range of student outcomes, typically reading and math test scores in grades 4–8. But statistical adjustments to measure school contributions can also be applied to other student outcomes. For example, we partnered with Maryland to develop the nation’s first school-level measures of student growth encompassing grades K–3 (and to examine their validity and reliability). And our newest report describes the measures of “promotion power” we developed in partnership with the District of Columbia, assessing the contribution of each DC high school to the probability that its students will graduate, demonstrate college readiness, and enroll in college. Similarly, Mathematica has worked with Louisiana to develop promotion power measures that extend even further, to include each high school’s impact on the employment and earnings of its former students in their 20s. These efforts demonstrate that it is possible to apply adjustments to almost any measure of how students are doing (including, for example, social-emotional learning) to better assess what a school contributes to that outcome.[1]
  3. What happens inside the school? Decision makers equipped with rich, broad, and valid ways to measure student outcomes and school contributions can identify high- and low-performing schools, but even the best measures of student outcomes and school contributions provide no information about how or why a school is high or low performing. To provide data on the internal processes that might drive school performance, REL Mid-Atlantic partnered with Pennsylvania, Maryland, and DC Public Schools to develop, analyze, and interpret measures of school climate based on staff and student surveys. Our analyses have confirmed that school climate survey measures can be sufficiently reliable to distinguish schools from one another—and even that they can provide a useful window on the performance of school leaders. Beyond climate surveys, other indicators of what is happening in a school might include measures of exclusionary discipline (such as suspensions), measures of teaching quality, and even reports from observational inspections like those that are often used in schools in Europe.
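To make the contrast between the first two questions concrete, here is a minimal sketch in Python on hypothetical data. It compares a raw proficiency rate (a measure of how students are doing) with a toy value-added estimate (a measure of what the school contributes) that regresses current scores on prior scores with school fixed effects. The student records, the proficiency cutoff of 50, and the function names are illustrative assumptions. Operational value-added models typically use more prior scores, additional controls, and adjustments for measurement error, and this sketch is not the model used in any of the projects described here.

```python
from statistics import mean

# Hypothetical toy data: (school, prior-year score, current-year score).
# School A enrolls low scorers who gain a lot; school B enrolls high
# scorers who gain little.
STUDENTS = [
    ("A", 20.0, 35.0), ("A", 25.0, 40.0), ("A", 30.0, 44.0),
    ("B", 70.0, 72.0), ("B", 75.0, 76.0), ("B", 80.0, 81.0),
]


def group_by_school(records):
    """Map each school to its list of (prior, current) score pairs."""
    groups = {}
    for school, prior, current in records:
        groups.setdefault(school, []).append((prior, current))
    return groups


def proficiency_rates(records, cutoff):
    """Share of each school's students whose current score meets the cutoff."""
    return {
        school: sum(cur >= cutoff for _, cur in pairs) / len(pairs)
        for school, pairs in group_by_school(records).items()
    }


def value_added(records):
    """Toy value-added: regress current on prior score with school fixed
    effects, then express each school's effect relative to the unweighted
    average of the school effects. Simplified for illustration only.
    """
    groups = group_by_school(records)
    # Pooled within-school slope on prior score (the fixed-effects estimator).
    sxy = sxx = 0.0
    for pairs in groups.values():
        mp = mean(p for p, _ in pairs)
        mc = mean(c for _, c in pairs)
        sxy += sum((p - mp) * (c - mc) for p, c in pairs)
        sxx += sum((p - mp) ** 2 for p, _ in pairs)
    slope = sxy / sxx
    # School effect: average current score not explained by prior scores.
    effects = {
        school: mean(c for _, c in pairs) - slope * mean(p for p, _ in pairs)
        for school, pairs in groups.items()
    }
    center = mean(effects.values())
    return {school: e - center for school, e in effects.items()}


print(proficiency_rates(STUDENTS, cutoff=50.0))  # {'A': 0.0, 'B': 1.0}
print(value_added(STUDENTS))  # A: about +4.17, B: about -4.17
```

On these toy records, school A has a proficiency rate of 0 but a positive value-added estimate, while school B has a proficiency rate of 1.0 but a negative one. The school with the higher proficiency rate is not the one adding more to its students’ learning, which is exactly why a measure of how students are doing should not be read as a measure of what the school contributes.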

Deciding whether any of these measures is appropriate for use in high-stakes accountability systems requires serious consideration of potential trade-offs and unintended consequences. But even if they are not used for high-stakes purposes, all of these measures can usefully inform policymakers and parents, as long as those users understand what each measure is diagnostic for. Our framework can help stakeholders who develop and use measures understand both their value for specific purposes and their limitations.

1. The resulting measures may not be perfect indicators of school contributions, but they will be much closer to identifying a school’s contribution than would a raw measure of the outcome such as a proficiency rate or graduation rate. Studies have confirmed that value-added measures can provide valid information about performance (see Chetty et al. 2014 and Bacher-Hicks et al. 2019).

Brian P. Gill is a senior fellow at Mathematica Policy Research, Inc.

This post originally appeared on REL Mid-Atlantic’s RELevant blog.

Last updated September 30, 2021