Teachers and parents have struggled to keep children on course over the past year, but the extended school closures have clearly taken their toll on learning. In light of these extraordinary circumstances, we have four recommendations for the U.S. Department of Education regarding state testing: 1) provide states with as much flexibility as lawfully permissible to reduce or eliminate school-accountability determinations in the current school year; 2) do not require states to administer tests remotely, because these approaches threaten the valid interpretation of testing data; 3) do not require states to administer any statewide tests unless almost all students have been learning in school for at least a month prior to testing; 4) even if states are able to administer tests in something close to a typical manner, we urge federal flexibility in requiring testing because of likely unintended negative consequences and because desired information on student learning needs can be gathered in other, less costly and intrusive ways. Here we present the reasoning behind our recommendations.
Among those who support testing in spring 2021, a limited number continue to push for the results to be linked to school accountability. As Jessica Baghian notes in her companion essay, though, most pro-testing advocates understand that it makes no sense to hold schools accountable for outcomes beyond the schools’ control. Student performance in the spring will hinge as much upon digital access and home learning environments as it will on the efforts of educators. No amount of statistical adjustment can disentangle school performance from the cumulative and uneven effects of the pandemic on instruction and learning. Even if a state decides to forgo accountability consequences this year, it is likely that, at least in some quarters, schools and teachers will be blamed for poor test results.
Statewide Summative Assessment
Many advocacy groups and policy leaders are urging that statewide testing take place this spring because “the data are critically needed.” There are competing ideas, however, about how the data will be used and whether these uses are technically defensible. There are also conflicting ideas about how tests should be administered, given that the Covid-19 crisis is likely to persist through the spring. These dilemmas come down to three main questions:
- Can statewide testing produce results that are trustworthy and useful this year?
- Even if states are able to administer tests to essentially all students, is testing the best use of resources to gauge learning progress during the pandemic?
- What should state leaders do to understand how best to address the major learning challenges and inequities exacerbated by the pandemic?
Test Validity and Usefulness
If state tests are to serve public-reporting or accountability purposes, they must be administered under standardized conditions that will allow officials to make valid inferences from the results. It is unlikely that all students will have returned to in-person schooling by the time of the usual testing windows in March through May. Therefore, states are left with two unsatisfying options: they can either require students to come into school buildings to take the tests on a schedule that supports social distancing, or they can administer the assessments remotely.
State and district leaders would have to tie themselves in moral knots to require students to come into schools to take the tests when they are not permitting students to enter buildings for learning. Any testing protocols that call for special cooperation from families will likely invite parents to revolt, with many keeping their children at home out of concern for their health. The recent controversy over requiring English learners to come into school to take their English language proficiency exams is evidence of this backlash. A recent survey by researchers at the University of Southern California found that 64 percent of parents support cancelling standardized tests for spring 2021.
Many certification exams for adults are administered under strict remote proctoring conditions, but this controlled situation is quite different from simply administering tests remotely. In the case of K–12 education, the remote-proctoring requirements necessary to ensure secure test administration would violate most states’ student-privacy rules. Early results from the major interim assessment providers (for example, Curriculum Associates, Northwest Evaluation Association, and Renaissance Learning) also suggest some questionable performance patterns under remote learning and assessment conditions, such as higher-than-usual scores for early-elementary and middle-school students. This phenomenon is likely attributable to parents helping younger students and to older students using resources typically not permitted on the tests. Posing even greater concerns, though, are the significant equity issues associated with remote testing, such as bandwidth capacity, device availability, and the varied settings in which students will test (for example, in a private, quiet space versus sharing a kitchen table with siblings engaged in their own tests or lessons). Finally, most technical experts doubt that test scores from in-person testing and remote testing can be combined as if they were equivalent. Essentially all state assessment directors recently indicated their states are not planning to administer tests remotely. That would appear to leave in-person administration as the only option if testing is to proceed.
Some have suggested administering the state summative test in the fall, when we hope essentially all students will be back in school. However, tests are designed and validated for specific purposes and uses. End-of-year state summative tests are designed to evaluate the degree to which students have learned the knowledge and skills for the grade or subject they just completed. While the tests from the previous year could be administered in the fall, it would make no sense to do so. Such testing would take time and would not confer any instructional benefit. State tests cover an entire year’s worth of content, but at best teachers would only be able to respond to one or two curricular units, which would not be the same for all students. Further, because tests would be administered at a different time of year, it would be difficult to compare the results to prior years’ scores for the district as a whole.
Assessing Pandemic-Related Learning Needs
On balance, we believe the challenges of testing in 2021 outweigh any potential benefits. Even in the unlikely event that essentially all students are back in school early this spring, we do not think states should be required to administer the statewide assessment as if this were a typical school year.
State assessments cost a lot of money and time, costs that are generally worth the benefit in normal years. However, the challenges associated with appropriately interpreting test results this year shift the equation. Interpretation of individual test scores will be challenging enough, but interpreting aggregate scores (for example, by school or subgroup), with participation rates that have shifted since 2019, will be almost impossible. An even more serious concern is that devoting time to standardized testing will mean a loss of precious instructional time, leading to a considerable opportunity cost if students have only returned to in-school learning a few weeks prior to testing.
Addressing Learning Challenges and Inequities
We recognize that documenting the impact of Covid-19 on student learning is one of the main arguments for testing this year. But a generic claim about “equity” being the reason for testing does not ensure that the best data will be gathered and used to support that stated purpose. Because 2021 state test data will only be an approximation (owing to a reduced pool of test-takers and non-comparable administration conditions), other data sources could be just as good or better, depending on the intended use.
For instance, federal and state policymakers may want large-scale test data so they can estimate how much children have learned, which they think will in turn motivate investment in structural interventions such as summer school and one-on-one tutoring (a tenuous assumption). If this is the intended purpose, then aggregation of already administered interim assessment data—exemplified by recent Renaissance Learning and NWEA studies—would meet that goal. Policymakers should capitalize on the interim tests already being administered this year, because we doubt that state testing will yield incrementally more-useful information, given the obstacles to obtaining valid results. (This does not mean we support replacing state tests with multiple-choice interim assessments after the pandemic has passed.)
If the goal is to allocate additional resources to the students and schools that suffered the greatest inequities during the school shutdowns, then education leaders could obtain more-direct information through measures of “opportunity-to-learn.” While opportunity-to-learn usually refers to high-quality indicators such as challenging curriculum, well-prepared teachers, and the like, in Covid-19 circumstances, students with the gravest learning needs are those who lacked device and Internet access, who experienced the greatest proportion of remote-learning time, or who suffered extensive absences due to family circumstances. Districts already have data on most of these factors.
If policymakers are intent on gathering data on as broad a sample of the state’s students as possible, they might consider using a reduced-testing design, such as using a sample (subset) of test questions for each student (as the National Assessment of Educational Progress does), testing a sample of students from all grades, or testing as many students as possible from selected grades (for example, grades 4, 8, and 11). Several of these alternatives would require flexibility from the U.S. Department of Education. Employing any of them is practical only if the state and its assessment provider have already engaged in substantial redesign work and only if essentially all students can test in schools in somewhat normal conditions.
End-of-year state tests have never provided instructionally useful information for individual students. Knowing that a student is performing below proficiency does not provide any substantive information about what a student does or does not understand. Assessments embedded in a school’s current high-quality curriculum are the best tools for teachers in planning instruction and sharing information with parents. Districts that do not have such assessments in place could identify key assignments that reflect grade-level expectations and could share examples of student work to help parents understand how their students are performing relative to standards. Some might wonder if the same concerns about opportunity-to-learn and equity apply to these curriculum-embedded assessments. They do not. Curriculum-embedded assessments can be given under non-standardized conditions on a unit-by-unit basis so teachers can respond instructionally to individual student needs before moving on to the next unit. This is very different from state tests that cover a year’s worth of curriculum all at once. Further, teachers are close enough to their students to be able to understand the nuances and context of the assessment results.
In sum, given the uncertainty around vaccine distribution and the current explosion of Covid cases, we recommend that the U.S. Department of Education provide considerable flexibility to states regarding summative-assessment requirements this year unless essentially all students are able to test in school and have been learning in school for some time prior to testing. We are not against testing; in fact, quite the opposite. But we already have enough evidence that the pandemic interruptions have taken a huge toll on learning, especially for poor children and children of color. Rather than arguing about testing, we urge devoting energy and money to substantial instructional opportunities during the summer, such as extended summer-school offerings and other significant interventions. The learning shortfalls already being reported are too serious to address via the usual tinkering around the edges.