## Simpson’s Paradox Hides NAEP Gains (Again)

Is the education space mature enough to handle NAEP tests every two or four years? I’m not so sure.

NAEP is the “Nation’s Report Card.” It takes a representative sample of United States students and tests them in reading, math, social studies, science, the arts, etc. There are several versions of NAEP intended to sample different groups of students (the nation as a whole, individual states, or large cities), but its overall goal is to provide citizens with a snapshot of how we’re doing as a country.

The United States is a big country, and it takes a long time to move the needle on student achievement scores. Depending on the subject and sample, NAEP releases test results every two or every four years. When those scores come out, they almost always look flat. When the same “flat” result is repeated over and over, it starts to seep into our collective consciousness about how American students are doing.

But that’s the wrong way to look at it. From a long-term perspective, the achievement levels of American students are at or near all-time highs. Some groups of students are doing particularly well. The achievement scores of black, Hispanic, and low-income students have increased dramatically.

Because NAEP takes a representative sample, it’s also vulnerable to something called Simpson’s Paradox, a statistical paradox in which a shift in the composition of a group can create a misleading overall trend. As the United States population has become more diverse, a representative sample picks up more and more minority students, who tend to score lower on average than white students. That tends to make our overall scores appear flat, even as all of the groups that make up the overall score improve markedly.
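To see how a composition shift can do this, here is a minimal sketch in Python. The group names, sample sizes, and scores are hypothetical, chosen only to make the arithmetic easy to follow; they are not NAEP data.

```python
# Hypothetical numbers chosen only to illustrate the mechanism; not NAEP data.
# Each entry maps a group to (number of students sampled, average score).
early = {"group_a": (80, 270), "group_b": (20, 240)}
later = {"group_a": (60, 275), "group_b": (40, 250)}


def overall_average(sample):
    """Average across all students, weighting each group by its size."""
    total_students = sum(n for n, _ in sample.values())
    total_points = sum(n * score for n, score in sample.values())
    return total_points / total_students


# Gain within each group, and gain in the combined average.
group_gains = {g: later[g][1] - early[g][1] for g in early}
overall_gain = overall_average(later) - overall_average(early)

print(group_gains)   # {'group_a': 5, 'group_b': 10}
print(overall_gain)  # 1.0
```

In this made-up sample, one group gains 5 points and the other gains 10, yet the combined average rises by only 1 point, because the lower-scoring group grew from 20 percent of the sample to 40 percent.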

Recent NAEP results in history, geography, and civics illustrate this trend once again. Education Week reported that scores were “flat” from 2010 to 2014. That’s mostly true: the scores were all higher than in 2010, but the increases didn’t meet the standard for statistical significance. But scores are up over longer periods of time. Here are the gains since 2001 on geography (* signifies statistically significant):

• All students: +1
• White students: +4*
• Black students: +7*
• Hispanic students: +9*
• Students with disabilities: +8*
• English Language Learners: +7

Here are the gains since 2001 on history:

• All students: +7*
• White students: +9*
• Black students: +11*
• Hispanic students: +17*
• Students with disabilities: +15*
• English Language Learners: +12*

And here are the gains since 1998 on civics (civics has a slightly longer time period of comparable data):

• All students: +3*
• White students: +6*
• Black students: +6
• Hispanic students: +14*
• Students with disabilities: +13*
• English Language Learners: +14*

A few things jump out from these longer-term results. First, overall scores are up a little bit, but particular groups of students are making big gains. One rule of thumb suggests that 10-15 points on the NAEP translates into one grade level. Applying that here, scores for most groups of students have improved by roughly a full grade level over the last 15 years or so. Second, achievement gaps are closing as lower-performing groups are catching up to higher-performing ones. Third, Simpson’s Paradox makes the overall scores look relatively “flat.” Don’t let that mislead you. Although we might wish for faster progress, American achievement scores are rising.
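For readers who want to check the arithmetic, here is a rough sketch that applies the 10-15 point rule of thumb to the gains listed above. The gains are copied from the lists in this post. The function names, the linear scaling of the rule of thumb, and the choice to measure gaps against white students' gains are assumptions made for illustration, and the sketch ignores which gains were statistically significant.

```python
# Gains in NAEP scale points, copied from the lists in this post.
GAINS = {
    "geography, since 2001": {"All": 1, "White": 4, "Black": 7, "Hispanic": 9,
                              "Students with disabilities": 8, "English Language Learners": 7},
    "history, since 2001": {"All": 7, "White": 9, "Black": 11, "Hispanic": 17,
                            "Students with disabilities": 15, "English Language Learners": 12},
    "civics, since 1998": {"All": 3, "White": 6, "Black": 6, "Hispanic": 14,
                           "Students with disabilities": 13, "English Language Learners": 14},
}

# Rule of thumb quoted in the post: 10-15 NAEP points is about one grade level.
POINTS_PER_GRADE = (10, 15)

# Assumption: gaps are measured against white students' gains for these groups only.
GAP_GROUPS = ("Black", "Hispanic")


def grade_level_range(points):
    """Return (low, high) grade-level equivalents, assuming the rule scales linearly."""
    fewest, most = POINTS_PER_GRADE
    return points / most, points / fewest


def gap_narrowing(subject_gains, group, reference="White"):
    """Points by which `group` out-gained `reference` (positive means the gap narrowed)."""
    return subject_gains[group] - subject_gains[reference]


for subject, by_group in GAINS.items():
    for group, points in by_group.items():
        low, high = grade_level_range(points)
        line = f"{subject} | {group}: +{points} points, about {low:.1f} to {high:.1f} grade levels"
        if group in GAP_GROUPS:
            line += f", gap vs. White narrowed by {gap_narrowing(by_group, group)} points"
        print(line)
```

Under these assumptions, the 17-point history gain for Hispanic students works out to roughly 1.1 to 1.7 grade levels, while the 1-point overall gain in geography is at most a tenth of a grade level.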

This first appeared on Ahead of the Heard.

